Introduction
As a fundamental pillar of efficient animal production, aligned with specific environmental conditions and welfare standards, the development of selection and improvement index protocols with minimal bias and maximum precision is a dynamic and continuously evolving process (Hazel, 1943; Wilton et al., 1968; Wellmann, 2023; Pereira et al., 2025; Rao et al., 2025). Advances in computational tools and artificial intelligence, together with a growing understanding of environmental characteristics and animal physiology, are constantly reshaping these protocols (Mouloodi et al., 2021; Kojima et al., 2022; Pereira et al., 2025). The integration of such tools to modernise and refine existing protocols, tailoring them to each specific genotype–environment interaction scenario, is at the core of ongoing scientific, academic and applied research efforts (Passamonti et al., 2021).
Artificial intelligence (AI) combines mathematical modelling, inferential statistics and computational procedures to simplify the analysis of complex data, enhance the accuracy of results, and generate predictions with minimal error (Ali et al., 2024; Pereira et al., 2025). Machine learning (ML) aims to develop strategic and intelligent processing alternatives by simulating, replacing, or improving human decision-making, offering considerable advantages over classical statistical tools (Mouloodi et al., 2021). Applying ML models to animal data banks that are difficult to address with conventional analyses is a promising path in animal husbandry, as ML provides appropriate methodologies for generating knowledge from the complex dynamics of these data (Valletta et al., 2017). By integrating numerical information from edaphoclimatic, genotypic and phenotypic parameters, ML contributes to balancing productive objectives with environmental harmony (Pereira et al., 2025).
This subfield of AI has proven to be a viable approach for characterising behaviours, performances, yields, needs and interactions within the productive environment (Valletta et al., 2017; Ghaffari et al., 2019, 2020; Mota et al., 2021; Passamonti et al., 2021; Kojima et al., 2022; Ruchay et al., 2022; Siachos et al., 2024; Rao et al., 2025; Wang et al., 2025), and for identifying and ranking critical variables, including for specific environments like freshwater wetlands (Pereira et al., 2025). These studies also support the development of new animal selection and improvement protocols aimed at generating indices to select individuals with traits adapted to their environment.
In this context, the environment in question is the Ñeembucú wetlands, technically and historically recognised as special and specific freshwater ecosystems (RAMSAR, 2023), which have served for decades as calf production centres sustaining the traditional cattle-rearing cycle. Within this framework, a triangular scheme emerges as a structure of substantial importance for the sustainability of the livestock system, based on three key components: the harmonious management of the environment, the safeguarding of animal welfare and the consolidation of livestock productive efficiency (Martínez-López et al., 2022).
The genotypes typically raised in the Ñeembucú wetlands include Brahman, Brangus and Nelore, along with two small groups of locally adapted genetic resources, Criollo Pilcomayo and Criollo Ñeembucú, both currently at risk of extinction. Identifying which of these genetic groups copes best with the humid heat characteristic of these ecosystems remains an ongoing challenge (Martínez-López et al., 2022), and determining this adaptive capacity effectively requires further scientific inquiry. In this regard, Pereira et al. (2025) emphasise the need to incorporate ML techniques into the process of identifying potentially important variables and predictive strategies for animal breeding programmes, and they recommend including body condition score (CC) as a reference parameter.
This study experiments with various ML models to: (1) investigate and determine substantial associations among relevant variables within the complex framework of genotype–environment interaction in calf-rearing livestock systems, with emphasis on the parameter CC in grazing cows within freshwater wetlands; (2) establish mathematical weightings to prioritise the influential parameters within this biological system; and (3) based on the resulting ranking, develop a tool referred to as the ‘Animal Selection Index for Adaptability (ISA)’ to identify and select breeding cows as part of a livestock improvement strategy in specific environments such as freshwater wetlands. Thus, the stages defined as objectives – developed and subsequently ordered in sequence – converge in the core outcome of this work, which constitutes the novel, complete protocol intended for implementation within the aforementioned ecosystem, aiming to improve reproductive efficiency in harmony with both the environment and animal welfare.
Materials and methods
Ethical approval
The study did not require specific ethical approval as no experimental manipulation of animals occurred. All data on adaptive parameters of different bovine genotypes raised in the Ñeembucú wetlands and surrounding areas (RAMSAR, 2023) were obtained from previous works (Martínez-López, 2020; Martínez-López et al., 2022; Pereira et al., 2025).
Data
Records were obtained from the above studies for 80 cows, aged four and five years, used for calf production. The cows belonged to two Creole genotypes from Paraguay, Criollo Ñeembucú and Criollo Pilcomayo, and to three exotic breeds, Brangus, Nelore and Brahman, with 16 animals in each of the five groups. At the time of sampling, both Creole groups were under a tangible threat of extinction and in marked decline (Martínez-López et al., 2022). The cows belonged to four farms located in the Ñeembucú wetlands and surrounding areas, namely Isla Umbú (27°05′ S, 58°25′ W, 52 m a.s.l.), Nueva Italia (25°31′ S, 57°29′ W, 113 m a.s.l.), San Miguel (26°38′ S, 57°03′ W, 113 m a.s.l.) and Caapucú (26°17′ S, 57°10′ W, 65 m a.s.l.). The measurements corresponded to four time points in 2016, coinciding with the spring, summer, autumn and winter seasons. A total of 80 data points (16 animals from each of the five genotypes) were recorded for each season, totalling 320 observations.
Thirty-two variables concerning animal welfare, health and metabolic profile were considered, including ruminal frequency, respiratory frequency, heart rate, body temperature (°C), hair length (cm) and density, CC on a 1 to 5 scale (Edmonson et al., 1989), tick count in different body regions (flank, head, neck, armpit and groin) and in total, blood concentrations of cortisol (nmol/l), haemoglobin (g/l), haematocrit (%), cholesterol (mmol/l), triglyceride (mmol/l), glutamic oxaloacetic transaminase (GOT, IU/ml), phosphatase (IU/ml), gamma-glutamyl transferase (GGT, IU/ml), creatine phosphokinase (CPK, IU/ml), calcium (mmol/l), phosphorus (mmol/l), magnesium (mmol/l), sodium (mmol/l), urea (mmol/l), creatinine (µmol/l), total proteins (g/l), albumin (g/l) and globulin (g/l), as well as the presence of endoparasites (yes/no) and the number of endoparasites identified. The combination of the five bovine genotypes and the four seasons was also considered as a factor of interest, constituting the 33rd variable. Variable selection followed the methodology employed in a previous study (Pereira et al., 2025), prioritising variables according to their calculated quantitative preponderance within the ML procedure outlined in the statistical methods section.
Statistical methods
All numerical variables, except body condition score, were normalised to the min-max scale to range from 0 to 1 (Islam et al., 2024). Normalisation ensures unbiased model training and prevents variables with larger magnitudes from dominating the model (Pereira et al., 2025). The data were split into 60 % training and 40 % testing sets to provide a larger test set for more rigorous evaluation and understanding of model performance, helping to identify overfitting issues early (Sivakumar et al., 2024).
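The two preprocessing steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function names, the random seed and the cortisol readings are hypothetical, and the study's actual split procedure may have differed.

```python
import random


def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical cortisol readings (nmol/l), for illustration only.
cortisol = [20.0, 35.0, 50.0, 80.0]
scaled = min_max_scale(cortisol)

# 320 observation indices, matching the study design (80 cows x 4 seasons).
train, test = train_test_split(range(320))
```

With 320 observations, a 60/40 split leaves 192 rows for training and 128 for testing.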
The H2O AutoML framework was executed with max_runtime_secs = 1200 and max_models = 100, using root-mean-square error (RMSE) as the stopping metric. AutoML internally performs random grid searches over hundreds of hyper-parameter combinations; therefore, no manual hyper-parameter selection was required. Five-fold, stratified cross-validation (nfolds = 5) together with early stopping (stopping_rounds = 5, stopping_tolerance = 0.001) mitigated over-fitting during the search. Model selection was based on RMSE and mean absolute error (MAE) values:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|S_i - O_i\right|$$
where $S_i$ = estimated values, $O_i$ = observed values and $n$ = number of samples.
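As a small worked illustration, RMSE and MAE in their standard definitions can be computed as below. The observed and estimated CC scores are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def rmse(estimated, observed):
    """Root-mean-square error between estimated (S_i) and observed (O_i) values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(estimated, observed)) / n)


def mae(estimated, observed):
    """Mean absolute error between estimated (S_i) and observed (O_i) values."""
    n = len(observed)
    return sum(abs(s - o) for s, o in zip(estimated, observed)) / n


# Hypothetical CC scores: every estimate is off by exactly half a point.
observed = [3.0, 2.5, 4.0, 3.5]
estimated = [2.5, 3.0, 3.5, 4.0]
rmse_value = rmse(estimated, observed)
mae_value = mae(estimated, observed)
```

Because every error here has the same magnitude, both metrics equal 0.5; RMSE exceeds MAE only when the errors vary in size.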
SHapley Additive exPlanations (SHAP) plots revealed the most important variables in predicting CC. SHAP plots provide a visual representation of a variable’s importance in the prediction. The h2o package (Fryda et al., 2023) for R (R Core Team, 2024) was used to fit the ML models.
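For readers who want to reproduce a comparable search, the settings quoted above could be passed to H2O AutoML roughly as follows. The study used the R interface; this sketch uses the Python interface of the same library instead. The seed and the helper name `run_automl` are assumptions, and the stratified fold assignment mentioned in the text is not configured here.

```python
# Settings quoted in the text; the seed is an added assumption for reproducibility.
AUTOML_SETTINGS = {
    "max_runtime_secs": 1200,
    "max_models": 100,
    "stopping_metric": "RMSE",
    "stopping_rounds": 5,
    "stopping_tolerance": 0.001,
    "nfolds": 5,
    "seed": 1,
}


def run_automl(train_frame, response, predictors):
    """Train H2O AutoML on an H2OFrame and return the fitted AutoML object."""
    # Imported here so the settings can be inspected without h2o installed.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(**AUTOML_SETTINGS)
    aml.train(x=predictors, y=response, training_frame=train_frame)
    return aml
```

After training, the ranked models are available through the `leaderboard` attribute of the returned object.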
Proposed protocol for the animal selection index for adaptability
Based on the ML model and the animal parameter set included in the study, a new protocol for selecting and improving the cow herd for calf production in freshwater wetlands was proposed. The protocol involves following the detailed procedure developed in the materials and methods and sequentially presented in the results and discussion.
For ISA, the top 10 quantitative variables in the final ML model were numerically weighted by multiplying their ‘scaled importance’ (SI) by the proposed multiplication factor (FM). FM takes the value 1 (100 % of SI) when the observed value of the parameter lies within its reference range, which was taken from laboratory reference values for some variables and from the scientific literature for others.
Results
Identification of main variables related to ‘genotype-environment interaction’ through machine learning modelling
Table 1 presents all the variables included in the initial ML model, listed according to the percentage of variance explained in the prediction of CC scores, evaluated on a scale from 1 to 5.
Table 1. Scaled importance and percentage explained of all variables included in the initial machine learning model to estimate the body condition score of five bovine genotypes across four seasons in the Ñeembucú wetlands

CPK = Creatine phosphokinase; GOT = Glutamic oxaloacetic transaminase; GGT = Gamma-glutamyl transferase.
As shown, the combination of season and genotype factors reported the highest percentage of variance explained, accounting for approximately 33 % of the total, followed by creatinine and phosphatase levels, which had similar proportions.
Figure 1 displays the bivariate correlation coefficients between the variables selected in the final model and the CC score in cattle.

Figure 1. Correlation of the selected variables based on their relative importance for the final machine learning model. * P < 0.05, ** P < 0.01, *** P < 0.001 by t-test. CPK= Creatine phosphokinase.
Higher and significant (P < 0.01 and P < 0.001) levels of association between CC and creatinine, calcium and haemoglobin were noted. In contrast, weaker but still significant (P < 0.05 and P < 0.01) relationships were observed with hair length, cholesterol, haematocrit and CPK.
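The kind of bivariate test summarised in Figure 1 can be illustrated with a Pearson coefficient and its t statistic with n − 2 degrees of freedom. This is a generic sketch with hypothetical data; the text does not state which correlation coefficient the authors used, so Pearson's r is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired measurements, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
```

The t value is then compared with the critical value of Student's t distribution at the chosen significance level.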
In this study, multiple ML models were tested, with 10 showing the best fit for the studied case (Table 2), incorporating zootechnical variables associated with an important trait, the ‘CC’ of cows for reproduction, from different genotypes raised in the Ñeembucú wetlands, Paraguay. The ML model chosen to process the multiple zootechnical variables included in this approach was GBM_grid_1_AutoML_2_20240405_212421_model_469, due to its Root Mean Square Error (RMSE = 0.583) and Mean Absolute Error (MAE = 0.471) values.
Table 2. Performance indices of the 10 best-fitting models based on root mean square error (RMSE) and mean absolute error (MAE)

The five-fold cross-validation of the top GBM model (mean RMSE = 0.6 ± 0.09; mean MAE ≈ 0.48) mirrors the error obtained on the independent test set reported in Table 2 (RMSE = 0.583; MAE = 0.471). This close agreement indicates that the model’s predictive accuracy observed during resampling is reproduced when it is challenged with completely unseen data.
Figure 2 shows the relationship between the predicted and observed CC scores for the selected ML model indicated in Table 2, for both training and test data. It is verified that the same positive trend persists in the test data, although the slope is slightly shallower and the scatter wider, as expected when predictions face previously unseen variability; the relationship remains strong and aligned with the training data. This consistency between both panels demonstrates that the model generalises well to new observations and therefore provides visual evidence against over-fitting.

Figure 2. Correlation coefficient (r) between observed and estimated values by the selected model (a) considering training datasets and (b) testing datasets. **Significant at P < 0.01 by t-test.
In Figure 3a, residuals are plotted against the estimated CC scores, along with the 11 main explanatory variables listed in descending order of importance in Figure 3b. In this context, the season with genotype factor, phosphatase concentrations, cholesterol, phosphorus and hair length were the five most influential.

Figure 3. Residuals as a function of estimated body condition score values (a) and relative importance of the variables included in the final machine learning model (b). CPK = Creatine phosphokinase.
Figure 4 illustrates the individual contribution of each feature to the prediction of CC using the SHAP value-based approach.

Figure 4. SHAP contribution of the variables included in the final ML model for the increase or decrease in body condition score of five bovine genotypes analysed with data from four seasons of the year. CPK = Creatine phosphokinase.
Low concentrations of phosphatase and CPK, as well as reduced hair length, were found to positively impact CC scores. Body temperature showed a similarly dynamic relationship with the response variable. In turn, high levels of haematocrit and haemoglobin positively influenced CC.
Figure 5 shows the SHAP values related to phosphatase, cholesterol, blood phosphorus concentrations and hair length, all selected and included in the final model, to identify more specific characteristics about their dynamic behaviour in relation to CC scores, not identified in the SHAP summary plot (Figure 4).

Figure 5. SHAP values as a function of normalised values (0 to 1) for phosphatase (a) cholesterol (b) phosphorus (c) and hair length (d) in five bovine genotypes. BG = Brangus; BH = Brahman; CÑ = Criollo Ñeembucú; CP = Criollo Pilcomayo; NE = Nelore.
Consistent with Figure 4, a decrease in CC, scored on a scale from 1 to 5, was observed as phosphatase levels increased (Figure 5a). However, in some observations concerning the spring and summer seasons, a significant reduction in CC was noted, particularly from the normalised value of 0.4.
On the other hand, it is noteworthy that during the winter, a negative impact of cholesterol on CC was detected in most of the evaluated cows (Figure 5b). However, in general, the dynamic behaviour does not show a defined trend, with both positive and negative SHAP values being observed.
Following the same line of analysis, a decrease in CC scores was observed with higher blood phosphorus concentrations (Figure 5c). However, in some observational units, the impact was favourable. Moreover, an increase in hair length up to approximately 0.50 on the normalised scale (Figure 5d) did not affect CC scores. However, considering the seasonal factor, with higher hair length values, a decrease in CC was detected, particularly in autumn.
Regarding creatinine (Figure 6a), a negative impact on CC was detected with increased creatinine levels in certain cows, particularly those evaluated in summer and winter. In contrast, elevated haematocrit levels were associated with better body condition (Figure 6b). For CPK, an exponential decrease in CC scores was observed as blood CPK levels increased (Figure 6c). Conversely, haemoglobin levels were associated with an increase in the predicted variable, starting from a value of 0.5 on the normalised scale (Figure 6d).

Figure 6. SHAP values as a function of normalised values (0 to 1) for creatinine (a) haematocrit (b) CPK (c) and haemoglobin (d) in five bovine genotypes. BG = Brangus; BH = Brahman; CÑ = Criollo Ñeembucú; CP = Criollo Pilcomayo; NE = Nelore; CPK = Creatine phosphokinase.
As for body temperature, it is noteworthy that as it increases and varies around median values (0.5 in normalised value), CC scores stabilise in most animals (Figure 7a). However, the graph also reveals that at higher temperatures, the observed effect is negative in some cows.

Figure 7. SHAP values as a function of normalised values (0 to 1) for body temperature (a) calcium level (b) and season with genotype (c) in five bovine genotypes across four seasons of the year. BG = Brangus; BH = Brahman; CÑ = Criollo Ñeembucú; CP = Criollo Pilcomayo; NE = Nelore.
Additionally, the influence of the combined season-genotype factor is shown in Figure 7c. During the winter season, a strong negative influence on CC scores was revealed, particularly in the Nelore genotype. In contrast, in summer the results indicated a positive influence on CC in the different bovine groups, except for Nelore. A similar positive influence was seen in Criollo Ñeembucú cows, mainly during autumn. These variations in SHAP value dynamics according to genotype and season underline the importance of considering the combined factor in the model.
Proposed protocol for the animal selection index for adaptability
This article proposes an innovative animal selection protocol based on the ISA methodology. The protocol uses the filtered, final set of variables obtained through the ML strategy described in the materials and methods section and combines two elements for each variable chosen on the importance scale: its ‘scaled importance’ (SI) and its multiplication factor (FM). The details are presented in Table 3.
Table 3. Scaled importance of all variables included in the filtered and final machine learning model and the proposed multiplication factor

CPK = Creatine phosphokinase; ML = Machine learning; FM = Multiplication factor for the model, where if the observed value of the variable is within the reference range, FM takes the value of 1, multiplying by its associated SI, that is, by 100 % of the ‘SI’. Otherwise, FM takes a null value (zero).
1 Scaled: FM = 1 (100 % of SI) if 0.0 cm ≤ hair length ≤ 1.0 cm; FM = 0.75 (75 % of SI) if 1.1 cm ≤ hair length ≤ 2.0 cm; FM = 0.5 (50 % of SI) if 2.1 cm ≤ hair length ≤ 3.0 cm; FM = 0.25 (25 % of SI) if 3.1 cm ≤ hair length ≤ 4.0 cm; FM = 0 (0 % of SI) if hair length ≥ 4.1 cm.
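The two FM rules in the table notes can be sketched as simple functions. The function names are hypothetical; the phosphatase range of 0 to 488 IU/ml is the laboratory reference quoted in the text. The hair-length bands are implemented with upper bounds only, which assumes measurements recorded to one decimal place, so that a value such as 1.05 cm (not covered by the footnote) falls into the next band.

```python
def fm_in_range(value, low, high):
    """FM for reference-range variables: 1 inside the range, 0 outside."""
    return 1.0 if low <= value <= high else 0.0


def fm_hair_length(length_cm):
    """Graded FM for hair length, following the bands in the table footnote."""
    if length_cm <= 1.0:
        return 1.0
    if length_cm <= 2.0:
        return 0.75
    if length_cm <= 3.0:
        return 0.5
    if length_cm <= 4.0:
        return 0.25
    return 0.0


# Phosphatase reference range of 0-488 IU/ml, as quoted in the text.
fm_normal = fm_in_range(300.0, 0.0, 488.0)
fm_elevated = fm_in_range(500.0, 0.0, 488.0)
fm_hair = fm_hair_length(2.5)
```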
In the proposed new protocol, the example shows that the ‘FM’ column displays a value of 1 in almost all rows. When it shows ‘1’ it means that the value is within the normal range of the variable, i.e., 100 % of the ideal referenced value. Following the same example in Table 3, for the phosphatase variable, the laboratory reference range for normality is suggested between 0 and 488 IU/ml (Instituto de Investigaciones en Ciencias de la Salud-Universidad Nacional de Asunción, 2016). If the animal under evaluation has a laboratory result within that range, it receives a full ‘1’ (i.e., 100 % of the scaled importance SI), which will be the multiplier. If it is not within the range, it receives a ‘0’ weighting for that variable, and in the end, this will detract from that animal’s value when applying the proposed formula to obtain the ISA, thus decreasing its index (ISA) and, consequently, its ranking.
Subsequently, it is proposed to obtain the Animal Selection Index for Adaptability or the bovine’s adaptation capacity in the environment where it is intended to produce or reproduce (ISA), using the following mathematical formula:

where ISA= adaptability selection index; n1= animal 1 under evaluation; NVR= number of variables ranked according to their scale of importance in the final ML model, in this case, there are 10 variables in total; SI= scaled importance of the variables, in this case, there are 10 quantitative variables selected in the ML; FM = multiplication factor.
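Read together with these definitions, the index can plausibly be written as the sum of the importance-weighted factors. The expression below is a reconstruction under that assumption, and the published formula may include a different scaling:

```latex
\mathrm{ISA}_{n_1} = \sum_{i=1}^{\mathrm{NVR}} \mathrm{SI}_i \times \mathrm{FM}_i
```

Here $i$ indexes the NVR = 10 ranked variables, and $\mathrm{SI}_i$ and $\mathrm{FM}_i$ are the scaled importance and multiplication factor of variable $i$ for the animal under evaluation.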
Finally, to facilitate result comparison, the following formula is proposed to standardise the ISA on a scale of 0 to 100:

The maximum expected ISA value will be obtained when all FM values in Table 3 are equal to 1, indicating that all variables are within the reference range.
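A minimal sketch of the calculation follows. It assumes that the observed ISA is the sum of SI × FM over the ranked variables and that the standardised ISA is 100 × observed ISA / maximum ISA, where the maximum is reached when every FM equals 1. Both formulas are assumptions inferred from the surrounding definitions, and the SI and FM values are hypothetical rather than taken from Table 3.

```python
def isa(si, fm):
    """Observed ISA: sum of scaled importance times multiplication factor."""
    return sum(si[name] * fm[name] for name in si)


def standardised_isa(si, fm):
    """ISA rescaled to 0-100 relative to the maximum (all FM equal to 1)."""
    isa_max = sum(si.values())
    return 100.0 * isa(si, fm) / isa_max


# Hypothetical scaled importances and factors for three variables.
si = {"phosphatase": 1.0, "cholesterol": 0.5, "hair_length": 0.5}
fm = {"phosphatase": 1.0, "cholesterol": 0.0, "hair_length": 0.75}

observed_isa = isa(si, fm)           # 1.0 + 0.0 + 0.375
standard_isa = standardised_isa(si, fm)
```

In this example the animal loses the full cholesterol weight and a quarter of the hair-length weight, giving a standardised ISA of 68.75.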
Table 4 shows, as an example, the observed and standardised ISA values for 10 animals that were part of this study, together with the associated ranking for each animal.
Table 4. Observed animal selection index for adaptability (ISA) values and standardised ISA with associated ranking for 10 animals that were part of the study

RR = Reference range of variables; CPK= Creatine phosphokinase; FM = multiplication factor; SI = scaled importance; ID = animal identification Code; ISA = animal selection index for adaptability. Value corresponds to the observed value of each variable for each animal. The maximum expected value of the ISA is obtained when all FM for all variables is equal to 1, i.e., when all are within the reference range.
In Table 4, the animal with identification code (ID) 24 had the highest ISA and therefore ranks first. Animal ID 70 was placed second with an ISA value of 86, and animal ID 101 third with an ISA of 78. Animals ID 37 and 38 had the same ISA (72), resulting in a tie. When animals achieve the same ISA, they are assigned the same ranking position, as shown in Table 4, where only 10 animals were processed; in this case, animals ID 37 and 38 are both ranked fourth. Animal ID 103 follows in fifth position with an ISA of 70. All animals were ranked in the same way according to the ISA value obtained, with the highest ISA in first position.
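The tie-handling rule described here, where tied animals share a position and the next distinct ISA takes the next position, corresponds to what is usually called dense ranking. The sketch below applies it to the five ISA values quoted in the text. Animal ID 24 is left out because its ISA value is not stated, so the positions are one lower than in Table 4; the function name is hypothetical.

```python
def dense_rank(scores):
    """Rank IDs by descending score; ties share a position with no gaps after them."""
    distinct = sorted(set(scores.values()), reverse=True)
    position = {score: i + 1 for i, score in enumerate(distinct)}
    return {animal: position[score] for animal, score in scores.items()}


# ISA values quoted in the text (animal ID -> standardised ISA).
isa_values = {70: 86, 101: 78, 37: 72, 38: 72, 103: 70}
ranks = dense_rank(isa_values)
```

Animals 37 and 38 receive the same position, and animal 103 follows immediately after them.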
Figure 8 presents the schematic diagram outlining the implementation process of the selection and improvement protocol for cow herds destined for calf production in freshwater wetland environments.

Figure 8. Illustrative schematic of the implementation process of the innovative herd improvement programme for calf production based on adaptive capacity to freshwater wetlands. CPK = Creatine phosphokinase; SHAP = SHapley additive exPlanations.
The primary variables to be measured in cows are those derived from the ML model, given their calculated relative importance.
Discussion
Main findings derived from the machine learning methodology
The processing of multiple zootechnical variables to achieve bovine production efficiency in special and specific environments, such as wetlands and their basin, is crucial, and in this context the ML approach is a highly valuable tool (Pereira et al., 2025; Wang et al., 2025). This approach within artificial intelligence provides a diverse range of models based on different types of algorithmic calculations and processing speeds (Valletta et al., 2017; Dhaliwal and Williams, 2024). Thus, from the total number of variables considered, the first 11 were selected in the final ML model for the prediction of CC, all with a calculated quantitative preponderance exceeding the 2 % threshold reported by Pereira et al. (2025).
It is important to remember that for efficient cattle breeding, CC is a global and specific trait of great relevance in females due to its strong correlation with their prolific capacity. In this regard, Ghaffari et al. (2019) state that CC is among the most important factors influencing productivity, reproduction, health and longevity of cattle. It should be noted that the CC scoring scale corresponds to a methodology used in previous studies on cattle (Köck et al., 2018; Mota et al., 2021).
The consideration of season and genotype in the analysis is relevant, as they may influence CC. In this context, a previous study (Worku et al., 2021) highlights the negative influence of the dry season on animals, which can affect their CC. Similarly, Ribeiro et al. (2022) mention fluctuations in CC likely associated with seasonal changes, being more favourable for certain breeds compared to others.
In this study, weak to moderate associations between CC and the evaluated characteristics were evidenced (Rusakov, 2023). Given the analysis conducted and the low correlations detected, the consideration of more complex procedures, such as ML, is justified for estimating CC.
Model performance
According to Dhaliwal and Williams (2024), compared to typical regression models, statistical methods based on ML techniques represent a better alternative, considering that they allow for a more thorough understanding of the evaluated data set, resulting in a considerable improvement in predictions, in addition to being simpler in terms of interpretability compared to deep learning algorithms. However, the size, quality and distribution of parameters in the considered data set influence the effectiveness of ML models (Simwanda and Ikotun, 2024). Additionally, some authors (Dhaliwal and Williams, 2024) point out potential overfitting issues, especially when models overly focus on training data. In this context, ensemble methods, including boosting, are considered an interesting and useful approach to reduce overfitting and increase accuracy (Dhaliwal and Williams, 2024) and are widely employed methodologies (Chen and Guestrin, 2016).
The key indices used to select among ML models, RMSE and MAE, were not particularly low. Nevertheless, this model was selected for its ability to approximate the impact and relevance of each variable included in this approach and to establish statistically significant estimates and associations with the trait ‘Body Condition’ of cows for calf rearing in the mentioned environments in different seasons of the year. The GBM_grid_1_AutoML_2_20240405_212421_model_469 model therefore produced coherent and reasonable predictions, as reflected in the significant correlations between observed and estimated values.
The RMSE value obtained in this study was higher than those reported by Mota et al. (2021) for Fourier-transform infrared predictions of CC scores in specialised and dual-purpose dairy cattle breeds, considering different cross-validation scenarios using the Gradient Boosting Machine (GBM). They reported that predictive capacity improves as the training population size increases, and they also emphasised that including populations of different breeds in the training set enhances prediction accuracy, making it a useful tool for predicting phenotypes of biological and economic importance in dairy production.
It is worth noting that the model chosen in this study is a GBM, which is built iteratively: regression trees are constructed sequentially, each correcting the errors of the previous ones to improve model performance (Simwanda and Ikotun, 2024). In this context, (1) the number of trees in the sequence, (2) the learning rate, (3) the maximum tree depth and (4) the minimum samples considered in each leaf are very important factors that affect error minimisation in the validation set through GBM (Mota et al., 2021).
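The sequential error-correcting idea can be illustrated with a deliberately small sketch: boosting with squared-error loss on one predictor, using depth-1 trees (stumps). This is not H2O's GBM implementation; the tree depth is fixed at 1, no minimum leaf size is enforced, and the data are hypothetical. Only the number of trees and the learning rate are exposed.

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimises squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left leaf value, right leaf value)


def boost(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean, then add shrunken stumps fitted to current residuals."""
    predictions = [sum(y) / len(y)] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        predictions = [
            p + learning_rate * (left_value if xi <= threshold else right_value)
            for xi, p in zip(x, predictions)
        ]
    return predictions


def mse(y, predictions):
    """Mean squared error of the predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


# Hypothetical normalised predictor and CC scores, for illustration only.
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0, 4.0]
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
mse_few = mse(y, boost(x, y, n_trees=5))
mse_many = mse(y, boost(x, y, n_trees=100))
```

Training error falls as trees are added, which is why validation-based early stopping and the other hyper-parameters matter for avoiding over-fitting.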
Natekin and Knoll (Reference Natekin and Knoll2013) note that, although GBMs offer advantages, such as high customisability to capture complex non-linear relationships, they also entail disadvantages, including high memory consumption in model storage, depending on the number of iterations performed; this number can become very large, for example, in applications involving intrusion detection systems. In this regard, Natekin and Knoll (Reference Natekin and Knoll2013) recommend seeking a balance between model complexity and the number of evaluations. However, the eXtreme Gradient Boosting algorithm can be considered an alternative, noted for its computational efficiency and predictive capability (Chen and Guestrin, Reference Chen and Guestrin2016; Ali et al., Reference Ali, El-adaway, Ahmed, Eissa, Nabi, Elbashbishy and Khalef2024; Simwanda and Ikotun, Reference Simwanda and Ikotun2024).
Gradient-boosted decision trees generally excel on medium-sized, tabular regression problems because they (a) capture complex, non-linear relationships, (b) handle mixed feature types natively and (c) are relatively robust to skewed distributions. Comparative studies show GBM as the standard choice for tabular data when the goal is to maximise predictive performance with moderate tuning effort (Song et al., Reference Song, Liu, Liu and Wang2021). These empirical advantages, together with the leaderboard results, supported selecting GBM as the final predictive model, which was subsequently interpreted with SHAP values.
According to the correlation coefficients presented for the selected model, high predictive capability is suggested (Schober and Schwarte, Reference Schober and Schwarte2018).
The SHAP methodology used in this study is useful for interpreting predictive models derived from ML algorithms, because it allows the specific contribution of each feature to the final model to be visualised, distinguishing both its magnitude and its direction (Rodríguez-Pérez and Bajorath, Reference Rodríguez-Pérez and Bajorath2020). According to Simwanda and Ikotun (Reference Simwanda and Ikotun2024), this tool improves the interpretability of results derived from ML, enabling the quantification of the impact of variables in the final model (Yan et al., Reference Yan, Hu, Li and Lin2024). In this work, SHAP values made it possible to detect the dynamic behaviour of the different variables evaluated.
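The idea of a signed, per-feature contribution can be sketched with exact Shapley values for a small toy model. The sketch below is only illustrative: the model, its coefficients and the feature values are invented, and features are "switched on" by replacing a baseline value with the animal's value, which is a simplification of what SHAP implementations for tree ensembles do.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(instance)
    contrib = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)       # start from the reference animal
        previous = model(current)
        for name in order:
            current[name] = instance[name]   # reveal one feature at a time
            updated = model(current)
            contrib[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in contrib.items()}

def toy_cc_model(f):
    """Hypothetical CC predictor with one interaction term (coefficients invented)."""
    return (3.0
            + 0.05 * (f["haematocrit_pct"] - 30.0)
            - 0.30 * (f["hair_length_cm"] - 2.0)
            - 0.001 * f["cpk_u_l"] * f["hair_length_cm"])

herd_mean = {"haematocrit_pct": 30.0, "hair_length_cm": 2.0, "cpk_u_l": 100.0}
cow = {"haematocrit_pct": 36.0, "hair_length_cm": 3.5, "cpk_u_l": 150.0}

phi = shapley_values(toy_cc_model, cow, herd_mean)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the prediction for the animal and the prediction for the baseline, which is the property that lets each value be read as that feature's share of the prediction.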
Implications for animal husbandry in wetlands
Within this analytical framework, Pereira et al. (Reference Pereira, Centurión, Valdez and Martinez-López2025) reported results consistent with those presented here regarding the behavioural dynamics of hair length and alkaline phosphatase. It is worth highlighting that bovine coat characteristics serve as important indicators in the process of animal adaptation under specific and unique environmental conditions, such as those found in the Ñeembucú wetlands. Likewise, the consideration of alkaline phosphatase is relevant, as abnormal levels may indicate the presence of bone disease or hepatic damage (Urrego Gallego et al., Reference Urrego Gallego, Vargas Sanchez, Ayala Aguirre and Silva Ramírez2017).
Concerning creatinine, its increasing levels were negatively associated with CC. This pattern was particularly evident in animals evaluated during the summer and winter seasons. Lamp et al. (Reference Lamp, Derno, Otten, Mielenz, Nürnberg, Kuhla and Lamp2015) reported that cows subjected to heat stress, in addition to exhibiting reduced feed intake, showed elevated creatinine levels. They argued that environmental heat may influence the biological functions of dairy cows, triggering extensive tissue protein degradation, which could negatively affect productive performance. In a similar way, Martínez-López et al. (Reference Martínez-López, Centurión-Insaurralde, Núñez-Yegros and Sponenberg2022) stated that the mobilisation of body reserves and muscle catabolism may have influenced creatinine levels during the winter, particularly in the Nelore group, in line with the low CC observed; however, their findings also suggest adequate renal function without compromising tissue protein in the criollo groups raised under an extensive production system in wetland areas, considering the creatinine levels observed, particularly during the autumn season.
In relation to haematocrit, Lamp et al. (Reference Lamp, Derno, Otten, Mielenz, Nürnberg, Kuhla and Lamp2015) observed that environmental heat induces marked alterations in haematocrit levels, resulting in a reduction, particularly in cows exposed to heat stress. In the present study, a positive impact on CC was evidenced, in line with the findings of Pereira et al. (Reference Pereira, Centurión, Valdez and Martinez-López2025) regarding weight estimation in cows raised in wetlands, with haematocrit emerging as the second most influential variable in the final model.
Regarding CPK, a similar pattern was reported by Pereira et al. (Reference Pereira, Centurión, Valdez and Martinez-López2025), who observed that lower levels of this biomarker were associated with increased weight gain in cows raised in the Ñeembucú Wetland and its surrounding areas. It is worth emphasising that CPK is a widely recognised stress biomarker (Mpakama et al., Reference Mpakama, Chulayo and Muchenje2014), and its concentrations may rise in response to a range of psychological, nutritional, or physical stressors, such as muscle injury, exposure to high ambient temperatures, or pain (Loudon et al., Reference Loudon, Tarr, Pethick, Lean, Polkinghorne, Mason, Dunshea, Gardner and McGilchrist2019). Chen et al. (Reference Chen, Chen, Tu, Lee, Chen and Hsu2023) highlight the relevance of implementing strategies that enhance livestock productivity and reproductive performance during transitional periods, particularly in environments characterised by elevated heat and humidity, such as subtropical regions. In this regard, it is important to underscore that the study area (Ñeembucú) constitutes the most representative wetland ecosystem in Paraguay – an environment that has played a crucial role in the adaptive and selective processes of creole cattle breeds (Martínez-López et al., Reference Martínez-López, Centurión-Insaurralde, Núñez-Yegros and Sponenberg2022).
Haemoglobin levels were positively associated with an increase in CC, which is consistent with the findings of Campos-Gaona et al. (Reference Campos-Gaona, Correo-Orozco and Vélez-Terranova2024), who reported a positive association between haemoglobin levels, haematocrit and CC in cows during the transition period in the Colombian lowland tropics. They argue that haematocrit may represent a more suitable haematological parameter for assessing metabolism and water balance. Nonetheless, considering its role in homeostatic regulation and the ease of measurement, both indicators may be regarded as relevant biomarkers.
With respect to temperature, Lamp et al. (Reference Lamp, Derno, Otten, Mielenz, Nürnberg, Kuhla and Lamp2015) found that dairy cows under stress conditions exhibited an increase in rectal temperature. The results obtained in the present study differ from those reported by Pereira et al. (Reference Pereira, Centurión, Valdez and Martinez-López2025), who excluded this variable due to its low relevance in weight estimation; however, they concur on the inclusion of calcium as a variable in the final model, owing to its relative importance in the estimation of CC. For their part, Ghasemi et al. (Reference Ghasemi, Amanlou, Maheri-Sis, Salamatdoust-Nobar and Jozghasemi2024) found that calcium status does not influence variations in CC in primiparous and multiparous cows. Nevertheless, their results suggest that it may affect the incidence of metabolic disorders in different ways, highlighting the importance of homeostatic regulation of this parameter.
On the other hand, according to Ott et al. (Reference Ott, Manneck, Schrapers, Rosendahl and Aschenbach2023), breed is an influential factor in CC, which is consistent with the findings of the present study, where Nelore cattle exhibited lower CC compared to the other genetic groups, particularly during the winter season. These results are in agreement with the findings of Martínez-López et al. (Reference Martínez-López, Centurión-Insaurralde, Núñez-Yegros and Sponenberg2022), who reported significantly lower CC values for this breed, suggesting a greater functional commitment aimed at achieving more efficient performance in wetland environments.
It is important to highlight that wetlands cover large areas of the planet (RAMSAR, 2023), many of them with agricultural and livestock activities. In the new protocol proposed, which aims to rank each cow selected by the criterion of adaptability to freshwater wetlands while seeking efficiency in animal reproduction, the reference ranges used for blood variables were those provided by a nationally accredited reference laboratory (Instituto de Investigaciones en Ciencias de la Salud-Universidad Nacional de Asunción, 2016).
For the hair length variable, it was observed that the shorter the hair length, the higher the CC. Studies by Sarlo Davila et al. (Reference Sarlo Davila, Hamblen, Hansen, Dikmen, Oltenacu and Mateescu2019) on hair characteristics in Brahman-Angus multiracial cows found that hair length in the inner layer was 0.749 cm, while in the outer layer, it was 1.457 cm, based on samples taken from the shoulder. Similarly, Maia et al. (Reference Maia, Silva and Bertipaglia2003) reported average hair lengths in Holstein cows from a commercial herd in the state of São Paulo (Brazil) with black hair, a value of 1.205 cm, while in white-haired females, it was 1.426 cm, noting that samples were collected 20 cm below the column, in the centre of the trunk. Landaeta-Hernández et al. (Reference Landaeta-Hernández, Zambrano-Nava, Hernández-Fonseca, Godoy, Calles, Iragorri, Añez, Polanco, Montero-Urdaneta and Olson2011) indicated that in Criollo Limoneras females from the Zulia region (Venezuela), hair length in the cervical-lateral neck area in animals classified with smooth hair showed an average value of 0.49 ± 0.012 cm, while in cows categorised with normal hair, they found average lengths of 1.09 ± 0.02 cm.
Borges (Reference Borges2015), in turn, reported hair lengths ranging from 0.445 to 0.461 cm in Nelore cattle raised in Torixoréu and Uberlândia (Brazil), classifying them as short-haired animals and suggesting good adaptability to the rearing environment. Along similar lines, Botero (Reference Botero2008), in his study on hair length characterisation in bovine groups in Zulia (Venezuela), proposed a classification criterion based on hair length, as follows: short (≤0.30 cm), medium (0.31–0.50 cm), long (0.51–0.90 cm) and very long (>0.90 cm). Among the conclusions drawn, it was noted that bulls with a higher proportion of Zebu ancestry exhibited shorter hair on the back and neck.
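The classification criterion attributed to Botero (Reference Botero2008) can be written as a small lookup function. This is a sketch under one assumption not stated in the source: values falling between the published class limits (for example 0.305 cm) are assigned to the next class up. The function name is hypothetical.

```python
def classify_hair_length(length_cm):
    """Return the hair-length class for a measurement in centimetres."""
    if length_cm < 0:
        raise ValueError("hair length cannot be negative")
    if length_cm <= 0.30:
        return "short"
    if length_cm <= 0.50:   # published class: 0.31-0.50 cm
        return "medium"
    if length_cm <= 0.90:   # published class: 0.51-0.90 cm
        return "long"
    return "very long"      # published class: > 0.90 cm

for value in (0.30, 0.445, 0.75, 2.08):
    print(value, classify_hair_length(value))
```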
In a study on fertility and hair coat characteristics in Holstein cows raised in Descalvado (São Paulo, Brazil), Bertipaglia et al. (Reference Bertipaglia, Silva and Maia2005) considered three hair length categories (<1.0 cm; 1.0–1.5 cm; and >1.5 cm) to assess their effect on the number of inseminations per conception. No significant differences were detected among the categories. The average hair length was approximately 1.26 cm, which was classified as short hair, suggesting good suitability for tropical environments. Similarly, Bianchini et al. (Reference Bianchini, McManus, Lucci, Fernandes, Prescott, Mariante and Egito2006) reported that the Nelore, Curraleiro and Junqueira breeds exhibited characteristics more compatible with tropical climates, with respective hair lengths of 7.43, 7.73 and 6.12 cm, highlighting their capacity for heat tolerance and resilience under nutritional constraints.
According to Sarlo Davila et al. (Reference Sarlo Davila, Howell, Nunez, Orelien, Roe, Rodriguez, Dikmen and Mateescu2020), cattle with shorter hair coats exhibit highly relevant thermoregulatory adaptations that enhance the efficiency of heat dissipation, findings that align with the conclusions of do Nascimento Barreto et al. (Reference do Nascimento Barreto, Jacintho, Barioni Junior, Pereira, Nanni Costa, Zandonadi Brandão, Romanello, Novais Azevedo and Rossetto Garcia2024). Based on these findings, the importance of short hair in cattle raised in hot and humid environments is emphasised. Accordingly, the following FM scale for hair length is proposed, assigning greater weight to shorter hair coats. It is worth noting that the cows analysed in the present study exhibited an average hair length of 2.08 cm, with a minimum of 0.5 cm and a maximum of 4.75 cm (Pereira et al., Reference Pereira, Centurión, Valdez and Martinez-López2025).

For the body temperature variable, normal values were considered to be those within the range of 38 to 39.3°C (Constable et al., Reference Constable, Hinchcliff, Done and Grünberg2017; Divers and Peek, Reference Divers and Peek2018).
According to Sarlo Davila et al. (Reference Sarlo Davila, Hamblen, Hansen, Dikmen, Oltenacu and Mateescu2019), based on the estimated genetic parameters for hair coat characteristics and body temperature, an exploitable variance was reported for improving thermotolerance in cattle. Within this context, they recommend the incorporation of the Brahman genetic group into breeding herds. These findings support the inclusion of these two variables in the ISA, as they were selected based on their relative importance determined through the ML procedure (Pereira et al., Reference Pereira, Centurión, Valdez and Martinez-López2025).
Regarding the use of the quadratic term in the proposed ISA index formula, it is intended to assign greater impact to those variables that have more influence according to the model considered, as well as to more accurately describe the interrelationship between biological variables. Various studies concerning different contexts with biological parameters have indicated the use of quadratic terms (Trevisan et al., Reference Trevisan, Bullock and Martin2021; Younis et al., Reference Younis, Mohamed Ahmed, Ahmed, Yehia, Abdelkarim, El-Abedein and Alhamdan2022; Bernabeu et al., Reference Bernabeu, McCartney, Gadd, Hillary, Lu, Murphy, Wrobel, Campbell, Harris, Liewald, Hayward, Sudlow, Cox, Evans, Horvath, McIntosh, Robinson, Vallejos and Marioni2023; Júnior et al., Reference Júnior, Araújo, Silva, Araújo, Lôbo, Nakabashi, Castro, Menezes, Maciel e Silva, Silva, Silva, Barbosa, Marques and Lourenço Júnior2023 a; Besteiro et al., Reference Besteiro, Arango, Rodríguez and Fernández2024). It is worth noting that in the exhaustive study of biological systems, the use of mathematical models is very important, especially non-linear ones (Júnior et al., Reference Júnior, de Araújo, de Menezes, de Araújo, Pavan, Rocha-Silva, da Silva, Marques, Maciel e Silva, de Menezes Chalkidis and de Brito Lourenço2023 b). Similarly, ML processes provide a useful methodology for studying non-linear relationships among zootechnical variables (Ruchay et al., Reference Ruchay, Kober, Dorofeev, Kolpakov, Dzhulamanov, Kalschikov and Guo2022).
All the variables included in this study are potentially heritable traits. Combined with the ‘genotype-environment interaction’ criterion (or environment), they are used to select the best-ranked animals by first applying the ML model and then the ISA formula. This promotes an innovative programme for selecting individuals and groups based on adaptability to freshwater wetlands and their areas of influence, seeking to increase the prolific efficiency of the herd without forgetting traditional paradigms of conventional improvement, such as the positive association between CC and the animal’s ability to conceive, gestate, give birth and nurse healthy calves.
According to Talukder et al. (Reference Talukder, Hipel and vanLoon2017), a composite indicator is commonly expressed as the sum of the products between the normalised variable (xi) and its weight (wi). If the data share identical measurement units, xi does not require normalisation; otherwise, various normalisation techniques are available (Nardo et al., Reference Nardo, Saisana, Saltelli and Tarantola2005). In the context of the ISA formula, each variable’s weight is derived from its relative importance in the ML model used to estimate CC. Furthermore, rather than employing raw data values, the FM score, which ranges from 0 to 1, is used for all variables, ensuring they are bounded within a common interval.
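The generic composite indicator described by Talukder et al., the weighted sum of normalised variables, can be sketched as follows. The variable names, weights and raw values are hypothetical, min-max scaling is only one of the available normalisation techniques, and this linear form is not the ISA formula itself.

```python
def min_max(values):
    """Rescale raw values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # no spread: nothing to distinguish
    return [(v - lo) / (hi - lo) for v in values]

def composite_indicator(scores, weights):
    """Sum of w_i * x_i over the variables that carry a weight."""
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical weights (for example, scaled importances that sum to 1).
weights = {"haematocrit": 0.5, "haemoglobin": 0.3, "calcium": 0.2}

# Hypothetical raw haematocrit values for four cows, normalised to [0, 1].
haematocrit_scaled = min_max([28, 32, 36, 30])

# Normalised scores for one cow (the second cow in the list above).
cow_scores = {"haematocrit": haematocrit_scaled[1],
              "haemoglobin": 1.0,
              "calcium": 0.0}

print(composite_indicator(cow_scores, weights))
```

Because every score is bounded between 0 and 1 and the weights sum to 1, the indicator is itself bounded between 0 and 1, which makes animals directly comparable.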
Unlike traditional composite indicator structures, the ISA is not aggregated linearly but quadratically, thereby assigning greater emphasis to high-impact variables within their optimal range (FM = 1). Notably, Hazel (Reference Hazel1943) developed a linear, additive selection index, in which an animal’s merit is a direct function of phenotypic observations. While straightforward, this index is limited, as an index derived for one region may not be applicable elsewhere due to specific factors such as population size, environmental conditions, or management practices (Hazel, Reference Hazel1943). Wilton et al. (Reference Wilton, Evans and Van Vleck1968) employed a quadratic index in addition to the linear one. Similarly, Ruales and Manrique (Reference Ruales and Manrique2007) demonstrated the use of linear selection indices based on principal component analysis in a Creole breed. Wellmann (Reference Wellmann2023) also developed an optimal selection index, noting the importance of method choice, particularly where non-linearity exists, and considering directional features of conventional indices. In genomic selection processes, especially in cattle and poultry breeding, ML approaches have proven effective for modelling complex and dynamic biological relationships (Rao et al., Reference Rao, Zhang, Gao, Wang and Yang2025).
Drawing on these precedents, the innovative ISA methodology here proposed seeks to synergise multiple measurable traits in cows, applying objective weighting derived from the ML procedure, for the purpose of selecting animals adapted to special and specific environments such as the Ñeembucú wetlands, which cover approximately 100 000 ha (RAMSAR, 2023).
Once homogeneous groups of cows (in terms of age, body weight, health status and CC) have been selected to participate in the proposed selection programme, exclusively from farms located within the Ñeembucú wetlands and their associated basin, a structured sampling and data collection protocol is implemented across all four seasons of the year, ensuring accurate animal identification. The primary variables to be measured are those selected by the ML method and outlined in this study.
At the conclusion of the study year, with a complete database compiled, the ISA formula is applied to generate individual scores for each animal. This process results in the establishment of a ranking based on positive performance for calf production under the described hot and humid conditions. From this ranking, the highest-scoring individuals in terms of adaptability are selected. These cows will constitute an elite group of founder animals, whose female offspring may subsequently be reselected to continue the annual measurement process, thereby expanding herd size and enhancing genetic variability. Such expansion is expected to increase the precision of selection.
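The ranking and selection step can be sketched as a simple sort on the individual scores. The animal identifiers, the scores and the size of the elite group are hypothetical, and ties are broken alphabetically by identifier only to make the result reproducible.

```python
def rank_cows(isa_scores, elite_size):
    """Rank animals by descending score and return the ranking and the elite group."""
    ranking = sorted(isa_scores.items(), key=lambda item: (-item[1], item[0]))
    elite = [cow_id for cow_id, _ in ranking[:elite_size]]
    return ranking, elite

# Hypothetical end-of-year scores, one per identified animal.
isa_scores = {"cow_A": 0.62, "cow_B": 0.81, "cow_C": 0.47, "cow_D": 0.81}

ranking, elite = rank_cows(isa_scores, elite_size=2)
for position, (cow_id, score) in enumerate(ranking, start=1):
    print(position, cow_id, score)
print("Elite founders:", elite)
```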
Practical implications
It is essential to emphasise that the implementation of this programme must include institutional and administrative considerations due to the unavoidable need for the involvement of regional producer associations, public agencies, government bodies from the agricultural sector, industry stakeholders and local academic and research institutions. The programme is designed as a regional development strategy and, as such, necessitates this multi-stakeholder engagement. However, it is proposed as a superior alternative to existing methodologies currently applied in the region, many of which are promoted by corporations importing selection techniques from markedly different environments, climates and countries.
As a result, within a few years, it will be possible to obtain sizeable cohorts of cows capable of producing calves efficiently in the targeted region, while remaining in harmony with the environment and adhering to animal welfare principles. It should be reiterated that these wetlands are uniquely suited to calf and weaning production, owing to their specific edaphoclimatic conditions and agroecological systems.
It is emphasised that this work uniquely presents a mathematical model with biological interpretation suitable for bovine physiological variables to identify cows with better adaptive capacities to the freshwater wetlands of Ñeembucú, always associated with important zootechnical parameters such as CC. Thus, a new complete selection and breeding protocol is proposed, tailored and adjusted to the real conditions of the specified area, considering the good heritability values achieved by these traits, aiming to increase the level of precision in animal selection, reduce the bias observed with other programmes implemented in the area (imported from other environments) and generate a product in harmony with the referred environment.
Limitations and future perspectives
The number of animals on which this study is based could be seen as relatively small (a potential limitation), but it is extremely important to highlight that two of the five genetic groups belong to local herds in a critical state of extinction. In addition, every effort was made to homogenise all characteristics of the cows included, considering that they are part of real production systems, with only their breed composition, environment and breeding management allowed to vary. It is also noted that the selection protocol developed in this study and proposed to the technical, academic and scientific communities may serve as a robust platform for future refinement. Furthermore, this methodology is a replicable tool for other freshwater wetlands with cattle production and can be adapted to each specific case.
Conclusion
An innovative protocol for identifying and selecting breeding cows for wetlands is proposed, applying ML to variables associated with the ‘genotype-environment interaction’ criterion and using the ISA equation to rank animals by adaptability criteria.
Physiological and morphological variables of breeding cows, associated with their body condition, combined with specific ML strategies, sufficiently contributed to constructing importance levels and weighting animal performance in wetlands.
The most suitable ML model, GBM_grid_1_AutoML_2_20240405_212421_model_469, demonstrated adequate performance in cross-validation and residual analysis, supporting its predictive potential and reliability. SHAP analysis indicated that low alkaline phosphatase and CPK levels increase CC, while high haematocrit and haemoglobin concentrations are beneficial. Hair length and body temperature showed negative influences on CC when high values were recorded for both.
The ISA was created and developed based on analysed and weighted variables’ scaled importance and multiplication factor, allowing objective ranking of study animals according to their adaptive potential and prolificity. The proposed protocol strategically combines ML with animal selection and improvement processes, providing a powerful tool adaptable to other regions and environmental conditions.
Advanced ML techniques combined with new mathematical modelling and statistical inferences strengthen the precision and efficiency of the animal selection process, offering a more comprehensive approach suited to the complex ‘genotype-environment interaction’ criterion, promoting strategic advancement in animal husbandry.
Acknowledgements
The authors thank the National Council of Science and Technology of Paraguay (CONACYT); the University Research Scholarship Program ‘Andrés Borgognon Montero’ (PUBIABM); and the Centro Multidisciplinario de Investigaciones Tecnológicas, Universidad Nacional de Asunción (CEMIT/UNA).
Author contributions
Conceptualisation: RML; writing – review & editing: RML, LMC and WEP; writing – original draft: RML, LMC and WEP; visualisation: RML, LMC and WEP; supervision: RML; methodology: RML and WEP; investigation: RML, LMC and WEP; formal analysis: RML and WEP. All authors read and approved the final manuscript.
Funding statement
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interests
The authors declare there are no conflicts of interest.
Ethical standards
Not applicable.