
A supervised machine learning approach for the decision-making process on data-based culling in dairy farms

Published online by Cambridge University Press: 18 September 2025

Oscar R. Espinoza Sandoval
Affiliation:
Facultad de Zootecnia y Ecología, Universidad Autónoma de Chihuahua, Chihuahua, México
Juan C. Angeles-Hernandez
Affiliation:
Departamento de Medicina y Zootecnia de Rumiantes, Facultad de Medicina Veterinaria y Zootecnia, Universidad Nacional Autónoma de México, Ciudad de México, México
Agustín Corral-Luna
Affiliation:
Facultad de Zootecnia y Ecología, Universidad Autónoma de Chihuahua, Chihuahua, México
Felipe A. Rodríguez-Almeida
Affiliation:
Facultad de Zootecnia y Ecología, Universidad Autónoma de Chihuahua, Chihuahua, México
Pablo Pinedo
Affiliation:
Department of Animal Sciences, Colorado State University, Fort Collins, CO, USA
Albert De Vries
Affiliation:
Department of Animal Sciences, University of Florida, Gainesville, FL, USA
Santiago A. Utsumi
Affiliation:
Department of Animal and Range Sciences, New Mexico State University, Las Cruces, NM, USA
Einar Vargas-Bello-Pérez*
Affiliation:
Facultad de Zootecnia y Ecología, Universidad Autónoma de Chihuahua, Chihuahua, México
*Corresponding author: Einar Vargas-Bello-Pérez; Email: einar.vargasbelloperez@reading.ac.uk

Abstract

This research paper aimed to develop a supervised machine learning (ML) approach that learns and predicts data-based culling from farm information reflecting the criteria a farm manager uses when deciding to cull a cow. Data comprising the features milk yield, days in milk, lactation number, pregnancy status, days open and days pregnant were obtained from January to December 2020 from dairy cows on a large dairy farm in northern Mexico. Cows were labelled as either data-based culled (Cull) or not culled (Stay). Six supervised ML algorithms were evaluated in a binary classification task: logistic regression (LR), Gaussian naïve Bayes (GNB), k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF) and multilayer perceptron (MLP). Each model was subjected to hyperparameter optimization using a grid search combined with tenfold stratified cross-validation, which ensured that the class imbalance (Cull vs. Stay) was accounted for during model evaluation. The best-performing model for each algorithm was selected on the basis of cross-validated accuracy. To evaluate the prediction performance of the ML algorithms on both labels from the learned data, the metrics accuracy, precision, recall, F1-score and the Matthews correlation coefficient (MCC) were employed. Accuracy was >0.90 for all classifiers. The poorest prediction performance was observed for GNB (MCC = 0.50) and LR (MCC = 0.72). Conversely, the remaining classifiers achieved superior prediction performance in learning the specific culling criteria, reaching an MCC >0.91. Overall, culling criteria can be learned and predicted by ML algorithms, and performance varies among classifiers. This study identified RF as the best-performing algorithm, but k-NN, SVM and MLP are possible candidates for use under on-farm conditions. To increase their reliability, these approaches need to be tested on several farms, under different scenarios and with varied sets of features.
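The abstract reports accuracy above 0.90 for every classifier while MCC ranges from 0.50 to above 0.91. The sketch below illustrates how that gap can arise when the Cull class is a minority. It is a minimal illustration only: the confusion-matrix counts are hypothetical and are not taken from the study, the function name `binary_metrics` is invented for this example, and treating Cull as the positive class is an assumption.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, F1 and MCC from confusion-matrix counts.

    The positive class is assumed to be "Cull" (an assumption for this sketch).
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts (NOT from the paper): 1000 cows, 100 of them culled.
weak = binary_metrics(tp=50, fp=30, fn=50, tn=870)    # misses half of the culls
strong = binary_metrics(tp=92, fp=6, fn=8, tn=894)    # recovers most culls

for name, metrics in (("weak", weak), ("strong", strong)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

With these made-up counts both classifiers exceed 0.90 accuracy, because the many correctly predicted Stay cows dominate the total, yet the first one has an MCC of roughly 0.52 and the second roughly 0.92. MCC uses all four cells of the confusion matrix, so it penalises a model that performs poorly on the minority class.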

Information

Type
Research Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Hannah Dairy Research Foundation.

Supplementary material: File

Espinoza Sandoval et al. supplementary material
