Macrophytes serve as indicators of aquatic ecosystem health and are often employed in monitoring the condition of water bodies. Traditionally, such observations are conducted in situ, but remote sensing offers a cost-effective and scalable alternative. Here, we created an algorithm for macrophyte detection from satellite data: clustering results served as target labels for training a machine-learning model that identifies macrophytes from near-infrared reflectance during spring and summer. The derived algorithm, employing Sentinel-2 satellite reflectance data, enables the identification of open water, submerged and floating macrophytes, and emergent macrophytes. This approach enhances the efficiency and applicability of macrophyte assessment, bridging the gap between field observations and remote sensing for comprehensive aquatic ecosystem monitoring.
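The cluster-then-classify workflow described above can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic data; the feature layout (per-pixel NIR reflectance in spring and summer) and the model choice are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical per-pixel features: NIR reflectance in spring and in summer
X = rng.random((500, 2))

# Step 1: unsupervised clustering produces provisional class labels
# (e.g. open water, submerged/floating macrophytes, emergent macrophytes)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: the cluster labels become targets for a supervised model
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.score(X, labels))  # agreement with the clustering on training pixels
```

The supervised model can then be applied to new scenes, which the clustering alone could not do consistently.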
The decline in fed cattle cash sales and its impact on price discovery are concerning. This study extends the existing literature by using machine learning, particularly decision trees and random forests, to explore factors influencing fed cattle price ranges, complementing traditional regression analyses. These models uncover hidden patterns and provide additional insights into the cattle market. Key variables such as weight range, head count, and trade location are found to be associated with price ranges. Notably, the weight range emerges as the primary variable influencing the price range, with smaller weight ranges linked to lower price ranges.
This chapter puts forward new guidelines for designing and implementing distributed machine learning algorithms for big data. First, we present two different alternatives, which we call local and global approaches. To show how these two strategies work, we focus on the classical decision tree algorithm, revising its functioning and some details that need modification to deal with large datasets. We implement a local-based solution for decision trees, comparing its behavior and efficiency against a sequential model and the MLlib version. We also discuss the nitty-gritty of the implementation of decision trees in MLlib as a great example of a global solution. That allows us to formally define these two concepts, discussing the key (expected) advantages and disadvantages. The second part is all about measuring the scalability of a big data solution. We talk about three classical metrics, speed-up, size-up, and scale-up, to help understand if a distributed solution is scalable. Using these, we test our local-based approach and compare it against its global counterpart. This experiment allows us to give some tips for calculating these metrics correctly using a Spark cluster.
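The three scalability metrics named above can be computed directly from timing measurements. The sketch below uses made-up timings purely for demonstration; the definitions (speed-up: fixed data, growing cluster; size-up: fixed cluster, growing data; scale-up: both grown together) follow their standard formulations.

```python
# Illustrative timings in seconds; all values are invented for demonstration.
t_seq = 100.0                                    # sequential run, full dataset
t_par = {1: 100.0, 2: 55.0, 4: 30.0, 8: 18.0}    # m cores, full dataset

# Speed-up(m): fixed problem size, growing cluster; ideal value is m
speedup = {m: t_seq / t for m, t in t_par.items()}

# Size-up(k): fixed cluster (here 8 cores), k times the data; ideal value is k
t_data = {1: 18.0, 2: 34.0, 4: 70.0}
sizeup = {k: t / t_data[1] for k, t in t_data.items()}

# Scale-up(k): k times the cores AND k times the data; ideal value stays near 1
t_scaled = {1: 18.0, 2: 19.5, 4: 22.0}
scaleup = {k: t_scaled[1] / t for k, t in t_scaled.items()}

print(speedup[8], sizeup[4], scaleup[4])
```

On a real Spark cluster the timings would come from repeated job runs rather than constants, but the arithmetic is the same.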
In this chapter, we introduce some of the more popular ML algorithms. Our objective is to present the basic concepts and main ideas, show how to apply these algorithms in Matlab, and offer some examples. In particular, we discuss essential concepts in feature engineering and how to apply them in Matlab. Support vector machines (SVM), K-nearest neighbor (KNN), linear regression, the Naïve Bayes algorithm, and decision trees are introduced; the fundamental underlying mathematics is explained while Matlab’s corresponding Apps are used to implement each of these algorithms. A special section on reinforcement learning is included, detailing the key concepts and basic mechanism of this third ML category. In particular, we showcase how to implement reinforcement learning in Matlab, make use of some of the Python libraries available online, and show how to use reinforcement learning for controller design.
Real-time gait trajectory planning is challenging for legged robots walking on unknown terrain. In this paper, to realize more efficient and faster motion control of a quadrupedal robot, we propose an optimized gait planning generator (GPG) based on decision tree (DT) and random forest (RF) models of the robot leg workspace. First, the framework of this embedded GPG and some of the modules associated with it are illustrated. For the leg workspace model described by the DT and RF used in the GPG, this paper introduces in detail how to collect the original data needed for training the models and puts forward an Interpolation Labeling with Dilation and Erosion (ILDE) data processing algorithm. After the DT and RF models are trained, we preliminarily evaluate their performance. We then present how these models can be used to predict the location relation between a spatial point and the leg workspace based on its distributional features. The DT model takes only 0.00011 s to process a sample, while the RF model can give the prediction probability. As a complement, the PID inverse kinematic model used in the GPG is also described. Finally, the optimized GPG is tested in a real-time single-leg trajectory planning experiment and an unknown-terrain recognition simulation of a virtual quadrupedal robot. According to the test results, the GPG processes large-scale data remarkably quickly in gait trajectory planning tasks, demonstrating its practical value for quadruped robot control.
A decision tree is a tree-like model of decisions and their consequences, with the classification and regression tree (CART) being the most commonly used. Being simple models, decision trees are considered ‘weak learners’ relative to more complex and more accurate models. By using a large ensemble of weak learners, methods such as random forest can compete well against strong learners such as neural networks. An alternative to random forest is boosting. While random forest constructs all the trees independently, boosting constructs one tree at a time. At each step, boosting tries to build a weak learner that improves on the previous one.
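The contrast between the two ensemble strategies can be illustrated with scikit-learn. This is a minimal sketch on synthetic data, not a benchmark: the random forest grows its trees independently on bootstrap samples, while gradient boosting grows shallow trees sequentially, each fitted to the residual errors of the ensemble so far.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random forest: many trees grown independently, then averaged
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Boosting: shallow weak learners added one at a time, each one
# correcting the mistakes of the ensemble built so far
gb = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gb.score(X_te, y_te))
```

Note the boosted trees are deliberately kept shallow (`max_depth=2`): each one is a weak learner on its own, and the strength comes from the sequential ensemble.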
Abscisic acid (ABA) is a plant hormone well known to regulate abiotic stress responses. ABA is also recognised for its role in biotic defence, but there is currently a lack of consensus on whether it plays a positive or negative role. Here, we used supervised machine learning to analyse experimental observations on the defensive role of ABA to identify the most influential factors determining disease phenotypes. ABA concentration, plant age and pathogen lifestyle were identified as important modulators of defence behaviour in our computational predictions. We explored these predictions with new experiments in tomato, demonstrating that phenotypes after ABA treatment were indeed highly dependent on plant age and pathogen lifestyle. Integration of these new results into the statistical analysis refined the quantitative model of ABA influence, suggesting a framework for proposing and exploiting further research to make more progress on this complex question. Our approach provides a unifying road map to guide future studies involving the role of ABA in defence.
Religion is relevant to all of us, whether we are believers or not. This book concerns two interrelated topics. First, how probable is God's existence? Should we not conclude that all divinities are human inventions? Second, what are the mental and social functions of endorsing religious beliefs? The answers to these questions are interdependent. If a religious belief were true, the fact that humans hold it might be explained by describing how its truth was discovered. If all religious beliefs are false, a different explanation is required. In this provocative book Herman Philipse combines philosophical investigations concerning the truth of religious convictions with empirical research on the origins and functions of religious beliefs. Numerous topics are discussed, such as the historical genesis of monotheisms out of polytheisms, how to explain Saul's conversion to Jesus, and whether any apologetic strategy of Christian philosophers is convincing. Universal atheism is the final conclusion.
Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental disorder characterized by a persistent pattern of inattention and hyperactivity/impulsivity. There is considerable difficulty in diagnosing ADHD, mainly in discriminating between symptoms arising from ADHD and age-typical behaviors. The decision tree model is a statistical algorithm: a predictive model built from successive comparisons of variable values against threshold values for a given objective, arranging the variables of a database in hierarchical levels.
Objectives
This study aims to apply the decision tree model to direct the screening of ADHD complaints and to analyze which cognitive and behavioral parameters are most strongly associated with an accurate ADHD diagnosis.
Methods
We used a database from a research protocol with 202 children assessed for complaints of ADHD and a control group of 185 participants. The decision tree analyzed parameters selected from cognitive instruments, such as voluntary attention, Continuous Performance Test indexes, WCST indexes, and Wechsler Intelligence indexes, together with behavioral scales from the CBCL/6-18 and TRF/6-18.
Results
The results highlight WCST indexes such as “Perseverative answers”, “Perseverative errors” and “Learning to learn”, combined with “CPT omissions” and the behavioral scales “CBCL ADHD” and “CBCL Problems of Attention”, which together produce diagnostic discrimination accuracy ranging from 84.7% to 60% in the precision of the decision tree.
Conclusions
The decision tree and machine learning approaches can be effective in directing the screening of typical ADHD complaints.
This chapter examines the issue of how well Canadian jurors comprehend legal instructions and whether there are jury aids that can enhance their comprehension. It traces the design of the first study in Canada to assess the relative efficacy of two competing styles of instructions used in Canada and to compare them with the plain language criminal law instructions used in California, USA. The style that used a series of logical questions to be answered was associated with the highest levels of comprehension. The results also revealed that the type of aid provided had a modest impact on overall comprehension score and that those who received either both aids (written instructions and decision trees) or written instructions had scores that were significantly different from either those with no aid or the decision tree alone. This chapter also explores some of the challenges associated with cross-jurisdictional research. Additional studies are needed to ensure that jurors have an accurate understanding of the law and to make clear how styles of jury instruction and jury aids can impact comprehension levels.
The aim of this Research Communication was to apply data mining techniques to classify which environmental factors have the potential to motivate dairy cows to access natural shade. We defined two different areas in the silvopastoral system: shaded and sunny. Environmental factors and the frequency with which dairy cows used each area were measured over four days, for 8 h each day. The shaded areas were the most used by dairy cows and presented the lowest mean values of all environmental factors. Solar radiation was the environmental factor with the most potential to classify the dairy cows' decision to access shaded areas. Data mining is a machine learning technique with great potential to characterize the influence of the thermal environment on cows' decisions at pasture.
Ex situ conservation of species is risky and expensive, but it can prevent extinction when in situ conservation fails. We used the IUCN Guidelines on the Use of Ex Situ Management for Species Conservation to evaluate whether to begin ex situ conservation for the South-east Asian subspecies of Bengal florican Houbaropsis bengalensis blandini, which is predicted to be extinct in the wild within 5 years. To inform our decision, we developed a decision tree, and used a demographic model to evaluate the probability of establishing a captive population under a range of husbandry scenarios and egg harvest regimes, and compared this with the probability of the wild population persisting. The model showed that if ex situ conservation draws on international best practice in bustard husbandry there is a high probability of establishing a captive population, but the wild population is unlikely to persist. We identified and evaluated the practical risks associated with ex situ conservation, and documented our plans to mitigate them. Modelling shows that it is unlikely that birds could be released within 20–30 years, by which time genetic, morphological and behavioural changes in the captive population, combined with habitat loss and extinction of the wild population, make it unlikely that Bengal florican could be released into a situation approximating their current wild state. We considered the philosophical and practical implications through a decision tree so that our decision to begin ex situ management is not held back by our preconceived notions of what it means to be wild.
Subjects with bipolar disorder (BD) show heterogeneous cognitive profiles, and the disease does not necessarily lead to unfavorable clinical outcomes. We aimed to identify clinical markers of severity among cognitive clusters in individuals with BD through data-driven methods.
Methods
We recruited 167 outpatients with BD and 100 unaffected volunteers from Brazil and Spain who underwent a neuropsychological assessment. Cognitive functions assessed were inhibitory control, processing speed, cognitive flexibility, verbal fluency, working memory, and short- and long-term verbal memory. We performed hierarchical cluster analysis and discriminant function analysis to determine and confirm cognitive clusters, respectively. Then, we used the classification and regression tree (CART) algorithm to determine clinical and sociodemographic variables of the previously defined cognitive clusters.
Results
We identified three neuropsychological subgroups in individuals with BD: intact (35.3%), selectively impaired (34.7%), and severely impaired (29.9%). The most important predictors of cognitive subgroups were years of education, the number of hospitalizations, and age, respectively. The CART model showed a sensitivity of 45.8%, a specificity of 78.4%, and a balanced accuracy of 62.1%; the area under the ROC curve was 0.61. Of the 10 attributes included in the model, only three variables were able to separate cognitive clusters in BD individuals: years of education, number of hospitalizations, and age.
Conclusion
These results corroborate recent findings of neuropsychological heterogeneity in BD and suggest an overlap between premorbid and morbid aspects that influences the distinct cognitive courses of the disease.
As tugboats interact very closely with ships in restricted waters, the possibility of accidents increases in these operations. Despite the high accident possibility, there is a gap in studies on tugboat accidents. This study aims to analyse accidents involving tugboats using data mining. For this purpose, a tugboat accident dataset consisting of a total of 496 accident records for the period from 2008 to 2019 was collected. Logistic regression and decision tree algorithms were applied to the dataset. The results revealed that tugboat propulsion type is the most important and influential factor in the severity of tugboat accidents. The inferences drawn from these results could be beneficial for tugboat operators and port authorities in enhancing their awareness of the factors affecting tugboat accidents. In addition, the outputs of this study can be a reference for management units in developing strategies for preventing tugboat accidents and can also be used in effective planning for practicable prevention programmes and practices.
This chapter is not intended to be a complete discussion of machine learning. We concentrate on a small number of ideas, and emphasize how to deal with very large data sets. Especially important is how we exploit parallelism to build models of the data. We consider the classical “perceptron” approach to learning a data classifier, where a hyperplane that separates two classes is sought. Then, we look at more modern techniques involving support-vector machines. Similar to perceptrons, these methods look for hyperplanes that best divide the classes, so that few, if any, members of the training set lie close to the hyperplane. We next consider nearest-neighbor techniques, where data points are classified according to the class(es) of their nearest neighbors in some space. We end with a discussion of decision trees, which are branching programs for predicting the class of an example.
Data mining is an iterative process in which progress is defined by discovery through either automatic or manual methods. A data cleaning procedure is proposed to improve the quality of classification tasks in the knowledge discovery process by taking into account both redundant and conflicting data. The redundancy check is performed on the original dataset and the resultant dataset is preserved. This resultant dataset is then checked for conflicting data; any conflicts found are corrected and updated in the original aircraft dataset. The updated dataset is then classified using a variety of classifiers, such as Bayes, functions, lazy, misc, rules and decision trees. The performance of the updated dataset on these classifiers is examined, and the results show a significant improvement in classification accuracy after redundancy and conflicts are removed. This paper aims to address how data mining techniques can be used to understand complex system accidents in the aviation domain. Decision trees are considered to be one of the most powerful and popular approaches in knowledge discovery and data mining. The objective is to develop a classification model for aviation risk investigation and reduction using a decision tree induction method that enhances the construction of decision trees and thereby demonstrates their superior classification accuracy. Different feature selectors are used in this study to reduce the number of initial attributes.
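The two-step cleaning procedure described above (remove redundant records, then resolve conflicting ones) can be sketched with pandas. The toy columns and the majority-label correction rule below are assumptions for illustration; the paper's exact correction rule is not specified here.

```python
import pandas as pd

# Toy dataset standing in for the aircraft dataset (hypothetical columns/values)
df = pd.DataFrame({
    "phase":    ["landing", "landing", "landing", "cruise", "cruise"],
    "weather":  ["fog",     "fog",     "fog",     "clear",  "clear"],
    "severity": ["major",   "major",   "minor",   "minor",  "minor"],
})
features = ["phase", "weather"]

# Step 1: redundancy check -- drop exact duplicate records
cleaned = df.drop_duplicates()

# Step 2: conflict check -- identical features but different class labels;
# one possible correction rule (assumed here) is to keep the majority label
majority = cleaned.groupby(features)["severity"].transform(
    lambda s: s.mode().iloc[0])
cleaned = cleaned.assign(severity=majority).drop_duplicates()

print(cleaned)  # one consistent record per distinct feature combination
```

The resulting conflict-free table is what would then be passed to the classifiers for evaluation.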
The 50/92 and 0/92 reduced planting alternatives of the 1985 farm bill allow farm program participants more flexibility in making production decisions. Specifically, these provisions relax the incentive to produce inherent in earlier commodity programs that linked deficiency payments directly to harvested acreage. This study examined the value of this additional decision flexibility for crop producers in the Blacklands of Central Texas. The results suggest that the reduced planting alternatives would not be used by, and have no value for, risk-neutral producers, but have substantial value for risk-averse producers who would reduce planted acreage in years when yield expectations are low.
Objective – To determine the most cost-effective oral therapy for the treatment of Major Depressive Disorder (MDD) in Italy. Method – We conducted a pharmacoeconomic evaluation based on a decision analytic model that examined the treatment of major depressive disorder (MDD) in Italy. The analysis compared the serotonin norepinephrine reuptake inhibitor (SNRI), venlafaxine extended-release (venlafaxine XR), to selective serotonin reuptake inhibitors (SSRIs) and tricyclic antidepressants (TCAs). A meta-analysis was performed to determine the clinical rates of success. The meta-analytic rates were applied to the decision analytic model to calculate the expected cost and expected outcomes for each antidepressant comparator. Cost-effectiveness was determined using the expected values for both a successful outcome and a composite measure of outcome termed ‘symptom-free days’. A policy analysis was conducted to estimate the financial impact to the Servizio Sanitario Nazionale (SSN). Results – Treatment of MDD with venlafaxine XR yielded the highest overall efficacy rates for outpatients (73.7%) versus SSRIs (61.4%) and TCAs (59.3%), and inpatients (62.3%) versus SSRIs (58.6%) and TCAs (58.2%). Venlafaxine XR had the lowest dropout rates due to lack of efficacy (4.8%) versus SSRIs (8.4%) and TCAs (6.8%), and adverse drug reactions (10.9%) versus SSRIs (17.4%) and TCAs (23.1%). Initiating treatment of MDD with venlafaxine XR yielded the lowest expected cost for outpatients and for inpatients. The total resulting savings for the SSN at a 5% venlafaxine XR utilization was estimated between L 963 million and L 3,210 million. Conclusion – This study confirms that venlafaxine XR is generally a cost-effective treatment of MDD. Additionally, the results of this investigation suggest that increased utilization of venlafaxine XR will favorably impact the SSN.
The recent outbreak of H1N1 has provided the scientific community with a sad but timely opportunity to understand the influence of socioeconomic determinants on H1N1 pandemic mortality. To this end, we have used data collected from 341 US counties to model H1N1 deaths/1000 using 12 socioeconomic predictors to discover why certain counties reported fewer H1N1 deaths compared to other counties. These predictors were then used to build a decision tree. The decision tree developed was then used to predict H1N1 mortality for the whole of the USA. Our estimate of 7667 H1N1 deaths is in accord with the lower bound of the CDC estimate of 8870 deaths. In addition to the H1N1 death estimates, we have listed possible counties to be targeted for health-related interventions. The respective state/county authorities can use these results as the basis to target and optimize the distribution of public health resources.