Prediction and classification are two very active areas in modern data analysis. In this paper, prediction with nonlinear optimal scaling transformations of the variables is reviewed, and extended to the use of multiple additive components, much in the spirit of statistical learning techniques that are currently popular, among other areas, in data mining. Also, a classification/clustering method is described that is particularly suitable for analyzing attribute-value data from systems biology (genomics, proteomics, and metabolomics), and which is able to detect groups of objects that have similar values on small subsets of the attributes.
This chapter reviews the evidence behind the anti-misinformation interventions that have been designed and tested since misinformation research exploded in popularity around 2016. It focuses on four types of intervention: boosting skills or competences (media/digital literacy, critical thinking, and prebunking); nudging people by making changes to social media platforms’ choice architecture; debunking misinformation through fact-checking; and (automated) content labelling. These interventions have one of three goals: to improve relevant skills such as spotting manipulation techniques, source criticism, or lateral reading (in the case of boosting interventions and some content labels); to change people’s behavior, most commonly improving the quality of their sharing decisions (for nudges and most content labels); or to reduce misperceptions and misbeliefs (in the case of debunking). While many such interventions have been shown to work well in lab studies, there continues to be an evidence gap with respect to their effectiveness over time, and how well they work in real-life settings (such as on social media).
Thanks to its outstanding performance, boosting has rapidly gained wide acceptance among actuaries. Wüthrich and Buser (Data Analytics for Non-Life Insurance Pricing. Lecture notes available at SSRN. http://dx.doi.org/10.2139/ssrn.2870308, 2019) established that boosting can be conducted directly on the response under the Poisson deviance loss function and log-link, by adapting the weights at each step. This is particularly useful for analyzing low counts (typically, numbers of reported claims at policy level in personal lines). Huyghe et al. (Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Scandinavian Actuarial Journal. https://doi.org/10.1080/03461238.2023.2258135, 2022) adopted this approach to propose a new boosting machine with cost-complexity pruned trees. In this approach, the trees included in the score progressively reduce to the root node in an adaptive way. This paper reviews these results and presents the new BT package in R contributed by Willame (Boosting Trees Algorithm. https://cran.r-project.org/package=BT; https://github.com/GiregWillame/BT, 2022), which is designed to implement this approach for insurance studies. A numerical illustration demonstrates the relevance of the new tool for insurance pricing.
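To make the idea concrete, the following is a minimal, self-contained sketch of gradient boosting under Poisson deviance with a log-link, using one-split stumps as weak learners. It is an illustrative toy, not the weight-adaptation scheme of Wüthrich and Buser and not the BT package implementation; all names and the example data are invented for illustration.

```python
import math

def fit_stump(x, g):
    """Least-squares one-split stump fitted to working responses g."""
    best = None
    for t in sorted(set(x))[:-1]:          # candidate thresholds (keep both sides nonempty)
        left = [gi for xi, gi in zip(x, g) if xi <= t]
        right = [gi for xi, gi in zip(x, g) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((gi - lm) ** 2 for gi in left)
               + sum((gi - rm) ** 2 for gi in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def poisson_boost(x, y, n_rounds=100, lr=0.1):
    """Gradient boosting of a score F on the log scale under Poisson deviance."""
    f0 = math.log(sum(y) / len(y))         # intercept: log of the mean count
    scores = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        # negative gradient of the Poisson deviance w.r.t. the score: y - exp(F)
        g = [yi - math.exp(si) for yi, si in zip(y, scores)]
        s = fit_stump(x, g)
        stumps.append(s)
        scores = [si + lr * s(xi) for si, xi in zip(scores, x)]
    # predicted mean count: exp of the boosted score
    return lambda xi: math.exp(f0 + sum(lr * s(xi) for s in stumps))

# Toy claim counts: higher expected frequency for x > 0.5
x = [i / 10 for i in range(10)]
y = [0, 1, 0, 1, 0, 3, 4, 3, 5, 4]
mu = poisson_boost(x, y)
```

Note the log-link at work: the stumps are added on the score scale, and the fitted mean is recovered by exponentiation, so predicted counts stay positive by construction.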
The goal of this chapter is to present complete examples of the design and implementation of machine learning methods in large-scale data analytics. In particular, we choose three distinct topics: semi-supervised learning, ensemble learning, and how to deploy deep learning models at scale. For each topic, we motivate why parallelization is needed to deal with big data, identify the main bottlenecks, design and code Spark-based solutions, and discuss further work required to improve the code. In semi-supervised learning, we focus on the simplest self-labeling approach, called self-training, and a global solution for it. Likewise, in ensemble learning, we design a global approach for bagging and boosting. Lastly, we show an example with deep learning. Rather than parallelizing the training of a model, which is typically easier on GPUs, we deploy the inference step for a case study in semantic image segmentation.
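The self-training loop mentioned above can be sketched in a few lines. This is a single-machine toy, not the Spark-based solution the chapter develops: the base learner here is an invented one-dimensional nearest-centroid classifier, and confidence is taken as the margin between the two closest centroids.

```python
def nearest_centroid(labeled):
    """Fit a 1-D nearest-centroid classifier from (x, label) pairs.

    Returns a function mapping x to (predicted label, confidence margin).
    """
    groups = {}
    for x, c in labeled:
        groups.setdefault(c, []).append(x)
    centroids = {c: sum(v) / len(v) for c, v in groups.items()}

    def predict(x):
        dists = sorted((abs(x - m), c) for c, m in centroids.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
        return dists[0][1], margin

    return predict

def self_train(labeled, unlabeled, rounds=5, per_round=2):
    """Self-training: repeatedly pseudo-label the most confident points."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        predict = nearest_centroid(labeled)
        pool.sort(key=lambda x: -predict(x)[1])      # most confident first
        take, pool = pool[:per_round], pool[per_round:]
        labeled += [(x, predict(x)[0]) for x in take]
    return nearest_centroid(labeled)

clf = self_train(labeled=[(0.0, "a"), (1.0, "b")],
                 unlabeled=[0.1, 0.2, 0.3, 0.8, 0.9, 1.1])
```

The key design choice, common to self-training variants, is only promoting the most confident pseudo-labels each round, so early mistakes are less likely to be amplified by retraining.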
Libertarian paternalists argue that psychological research has shown that intuition is systematically flawed and that we are hardly educable because our cognitive biases resemble stable visual illusions. Thus, they maintain, authorities who know what is best for us need to step in and steer our behavior with the help of nudges. Nudges are nothing new; justifying them on the basis of a latent irrationality is. Technological paternalism is government by algorithms, with tech companies and state governments using digital technology to predict and control citizens’ behavior. This philosophy claims, first, that AI is, or soon will be, superior to human intuition in all respects and, second, that people should defer to algorithms’ recommendations. I contend that algorithms and big data can outperform humans in tasks that are well-defined and stable, e.g., playing chess and working on assembly lines, but not in ill-defined and unstable tasks, e.g., finding the best mate and predicting human behavior. Misleadingly, the “dataist” worldview promotes algorithms as if they were omniscient beings whom people should allow to decide for the good of each what job to accept, whom to marry, and whom to vote for.
We can easily find ourselves with lots of predictors. This situation has been common in ecology and environmental science but has spread to other biological disciplines as genomics, proteomics, metabolomics, etc., become widespread. Models can become very complex, and with many predictors, collinearity is more likely. Fitting the models is tricky, particularly if we’re looking for the “best” model, and the way we approach the task depends on how we’ll use the model results. This chapter describes different model selection approaches for multiple regression models and discusses ways of measuring the importance of specific predictors. It covers stepwise procedures, all subsets, information criteria, model averaging and validation, and introduces regression trees, including boosted trees.
Short-sighted decisions can have devastating consequences, and teaching people to make their decisions in a more far-sighted way is challenging. Previous research found that reflecting on one’s behavior can boost learning from success and failure. Here, we explore the potential benefits of guiding people to reflect on whether and how they thought about what to do (i.e., systematic metacognitive reflection). We devised a series of Socratic questions that prompt people to reflect on their decision-making and tested their effectiveness in a process-tracing experiment with a 5-step planning task ($N=265$). Each participant went through several cycles of making a series of decisions and then either reflecting on how they made those decisions, answering unrelated questions, or moving on to the next decision right away. We found that systematic metacognitive reflection helps people discover adaptive, far-sighted decision strategies faster. Our results suggest that systematic metacognitive reflection is a promising approach to boosting people’s decision-making competence.
A decision tree is a tree-like model of decisions and their consequences, with the classification and regression tree (CART) being the most commonly used. Being simple models, decision trees are considered ‘weak learners’ relative to more complex and more accurate models. By using a large ensemble of weak learners, methods such as random forest can compete well against strong learners such as neural networks. An alternative to random forest is boosting. While random forest constructs all the trees independently, boosting constructs one tree at a time. At each step, boosting tries to build a weak learner that improves on the previous one.
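The contrast between the two ensemble strategies can be sketched with one-split stumps as the weak learners. This is an illustrative toy under simplifying assumptions: real random forests grow deep trees and subsample features at each split, and real boosting machines use many refinements (shrinkage schedules, tree depth, regularization) omitted here.

```python
import random

def fit_stump(data):
    """Least-squares regression stump on (x, y) pairs; the 'weak learner'."""
    best = None
    xs = sorted({x for x, _ in data})
    for t in xs[:-1]:                      # thresholds keep both sides nonempty
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                       # degenerate sample: constant fit
        m = sum(y for _, y in data) / len(data)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def random_forest(data, n_trees=25, seed=0):
    """Bagging: every weak learner is fit independently on a bootstrap sample."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(t(x) for t in trees) / n_trees

def boost(data, n_rounds=25, lr=0.3):
    """Boosting: every weak learner fits the residuals of the ensemble so far."""
    resid = list(data)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(resid)
        stumps.append(s)
        resid = [(x, y - lr * s(x)) for x, y in resid]
    return lambda x: sum(lr * s(x) for s in stumps)

# Step-function data: y jumps from 1 to 3 at x = 0.5
data = [(0.1, 1), (0.2, 1), (0.3, 1), (0.4, 1),
        (0.6, 3), (0.7, 3), (0.8, 3), (0.9, 3)]
rf, bst = random_forest(data), boost(data)
```

The structural difference is visible in the two loops: the forest's trees never see each other's output and are simply averaged, while each boosting stump is fit to what the previous stumps got wrong.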
Nudging has become a well-known policy practice. Recently, ‘boosting’ has been suggested as an alternative to nudging. In contrast to nudges, boosts aim to empower individuals to exert their own agency to make decisions. This article is one of the first to compare a nudging and a boosting intervention, and it does so in a critical field setting: hand hygiene compliance of hospital nurses. During a 4-week quasi-experiment, we tested the effect of a reframing nudge and a risk literacy boost on hand hygiene compliance in three hospital wards. The results show that nudging and boosting were both effective interventions to improve hand hygiene compliance. A tentative finding is that, while the nudge had a stronger immediate effect, the boost effect remained stable for a week, even after the removal of the intervention. We conclude that, besides nudging, researchers and policymakers may consider boosting when they seek to implement or test behavioral interventions in domains such as healthcare.
Identifying predictors of patient outcomes evaluated over time may require modeling interactions among variables while addressing within-subject correlation. Generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs) address within-subject correlation, but identifying interactions can be difficult if not hypothesized a priori. We evaluate the performance of several variable selection approaches for clustered binary outcomes to provide guidance for choosing between the methods.
Methods:
We conducted simulations comparing stepwise selection, penalized GLMM, boosted GLMM, and boosted GEE for variable selection considering main effects and two-way interactions in data with repeatedly measured binary outcomes, and evaluated a two-stage approach to reduce bias and error in parameter estimates. We compared these approaches in real data applications: hypothermia during surgery and treatment response in lupus nephritis.
Results:
Penalized and boosted approaches recovered correct predictors and interactions more frequently than stepwise selection. Penalized GLMM recovered correct predictors more often than boosting, but included many spurious predictors. Boosted GLMM yielded parsimonious models and identified correct predictors well at large sample and effect sizes, but required excessive computation time. Boosted GEE was computationally efficient and selected relatively parsimonious models, offering a compromise between computation and parsimony. The two-stage approach reduced the bias and error in regression parameters in all approaches.
Conclusion:
Penalized and boosted approaches are effective for variable selection in data with clustered binary outcomes. The two-stage approach reduces bias and error and should be applied regardless of method. We provide guidance for choosing the most appropriate method in real applications.
We present an actuarial claims reserving technique that takes into account both claim counts and claim amounts. Separate (overdispersed) Poisson models for the claim counts and the claim amounts are combined by a joint embedding into a neural network architecture. As the starting point of the neural network calibration, we use exactly these two separate (overdispersed) Poisson models. Such a nested model can be interpreted as a boosting machine. It allows for joint modeling and mutual learning of claim counts and claim amounts beyond the two individual (overdispersed) Poisson models.
A new algorithm, called the boosted hybrid method, is proposed for the simulation of chemical reaction systems with scale-separation in time and disparity in species population. For such stiff systems, the algorithm can automatically identify scale-separation in time and slow down the fast reactions while maintaining a good approximation to the original effective dynamics. This technique is called boosting. As disparity in species population may still exist in the boosted system, we propose a hybrid strategy based on coarse-graining methods, such as the tau-leaping method, to accelerate the reactions among large-population species. The combination of the boosting strategy and the hybrid method allows for an efficient and adaptive simulation of complex chemical reactions. The new method does not need a priori knowledge of the system and can also be used for systems with hierarchical multiple time scales. Numerical experiments illustrate the versatility and efficiency of the method.