No CrossRef data available.
Published online by Cambridge University Press: 18 September 2025
This research paper aimed to develop a supervised machine learning (ML) approach that learns and predicts data-based culling from farm information that reflects the criteria of the decisions taken to cull a cow by a farm manager. Data containing the features of milk yield, days in milk, lactation number, pregnancy status, days open and days pregnant were obtained from January to December 2020 from dairy cows on a large dairy farm in northern Mexico. The cows were labelled as those that were data-based culled (Cull) and those that were not culled (Stay). Six supervised ML algorithms were evaluated in a binary classification including logistic regression (LR), Gaussian naïve Bayes (GNB), k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF) and multilayer perceptron (MLP). Each model was subjected to hyperparameter optimization using a grid search approach combined with tenfold stratified cross-validation. This ensured that the class imbalance (Cull vs. Stay) was accounted during model evaluation. The best-performing model for each algorithm was selected on cross-validated accuracy. To evaluate the prediction performance of the ML algorithms on both labels from learned data, the metrics accuracy, precision, recall, F1-score and the Matthews correlation coefficient (MCC) were employed. Accuracy among all classifiers was >0.90. The poorest prediction performance was observed in GNB (MCC = 0.50) and LR (MCC = 0.72). Conversely, the rest of the classifiers achieved superior prediction performance in learning the specific culling criteria, reaching an MCC score >0.91. Overall, culling criteria can be learned and predicted by ML algorithms and their performance varies among classifiers. This study identified RF as the best performing algorithm, but k-NN, SVM and MLP are possible candidates to be used in on-farm conditions. To increase their reliability, these approaches need to be tested in several farms, under different scenarios and varieties of features.