Prediction and classification are two very active areas in modern data analysis. In this paper, prediction with nonlinear optimal scaling transformations of the variables is reviewed, and extended to the use of multiple additive components, much in the spirit of statistical learning techniques that are currently popular, among other areas, in data mining. Also, a classification/clustering method is described that is particularly suitable for analyzing attribute-value data from systems biology (genomics, proteomics, and metabolomics), and which is able to detect groups of objects that have similar values on small subsets of the attributes.
This chapter reviews the evidence behind the anti-misinformation interventions that have been designed and tested since misinformation research exploded in popularity around 2016. It focuses on four types of intervention: boosting skills or competences (media/digital literacy, critical thinking, and prebunking); nudging people by making changes to social media platforms’ choice architecture; debunking misinformation through fact-checking; and (automated) content labelling. These interventions have one of three goals: to improve relevant skills such as spotting manipulation techniques, source criticism, or lateral reading (in the case of boosting interventions and some content labels); to change people’s behavior, most commonly improving the quality of their sharing decisions (for nudges and most content labels); or to reduce misperceptions and misbeliefs (in the case of debunking). While many such interventions have been shown to work well in lab studies, there continues to be an evidence gap with respect to their effectiveness over time, and how well they work in real-life settings (such as on social media).
Thanks to its outstanding performance, boosting has rapidly gained wide acceptance among actuaries. Wüthrich and Buser (Data Analytics for Non-Life Insurance Pricing. Lecture notes available at SSRN. http://dx.doi.org/10.2139/ssrn.2870308, 2019) established that boosting can be conducted directly on the response under the Poisson deviance loss function and log-link, by adapting the weights at each step. This is particularly useful for analyzing low counts (typically, numbers of reported claims at policy level in personal lines). Huyghe et al. (Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Scandinavian Actuarial Journal. https://doi.org/10.1080/03461238.2023.2258135, 2022) adopted this approach to propose a new boosting machine with cost-complexity pruned trees. In this approach, the trees included in the score progressively reduce to the root node in an adaptive way. This paper reviews these results and presents the new BT package in R contributed by Willame (Boosting Trees Algorithm. https://cran.r-project.org/package=BT; https://github.com/GiregWillame/BT, 2022), which is designed to implement this approach for insurance studies. A numerical illustration demonstrates the relevance of the new tool for insurance pricing.
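To make the idea concrete, the following is a minimal, self-contained sketch of gradient boosting under Poisson deviance with a log-link, using one-split stumps as weak learners. It is an illustrative toy, not the weight-adaptation scheme of Wüthrich and Buser and not the BT package implementation; all names and the example data are invented for illustration.

```python
import math

def fit_stump(x, g):
    """Least-squares one-split stump fitted to working responses g."""
    best = None
    for t in sorted(set(x))[:-1]:          # candidate thresholds (keep both sides nonempty)
        left = [gi for xi, gi in zip(x, g) if xi <= t]
        right = [gi for xi, gi in zip(x, g) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((gi - lm) ** 2 for gi in left)
               + sum((gi - rm) ** 2 for gi in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def poisson_boost(x, y, n_rounds=100, lr=0.1):
    """Gradient boosting of a score F on the log scale under Poisson deviance."""
    f0 = math.log(sum(y) / len(y))         # intercept: log of the mean count
    scores = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        # negative gradient of the Poisson deviance w.r.t. the score: y - exp(F)
        g = [yi - math.exp(si) for yi, si in zip(y, scores)]
        s = fit_stump(x, g)
        stumps.append(s)
        scores = [si + lr * s(xi) for si, xi in zip(scores, x)]
    # predicted mean count: exp of the boosted score
    return lambda xi: math.exp(f0 + sum(lr * s(xi) for s in stumps))

# Toy claim counts: higher expected frequency for x > 0.5
x = [i / 10 for i in range(10)]
y = [0, 1, 0, 1, 0, 3, 4, 3, 5, 4]
mu = poisson_boost(x, y)
```

Note the log-link at work: the stumps are added on the score scale, and the fitted mean is recovered by exponentiation, so predicted counts stay positive by construction.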
The goal of this chapter is to present complete examples of the design and implementation of machine learning methods in large-scale data analytics. In particular, we choose three distinct topics: semi-supervised learning, ensemble learning, and how to deploy deep learning models at scale. For each topic, we motivate why parallelization is needed to deal with big data, identify the main bottlenecks, design and code Spark-based solutions, and discuss further work required to improve the code. In semi-supervised learning, we focus on the simplest self-labeling approach, called self-training, and a global solution for it. Likewise, in ensemble learning, we design a global approach for bagging and boosting. Lastly, we show an example with deep learning. Rather than parallelizing the training of a model, which is typically easier on GPUs, we deploy the inference step for a case study in semantic image segmentation.
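The self-training loop mentioned above can be sketched in a few lines. This is a single-machine toy, not the Spark-based solution the chapter develops: the base learner here is an invented one-dimensional nearest-centroid classifier, and confidence is taken as the margin between the two closest centroids.

```python
def nearest_centroid(labeled):
    """Fit a 1-D nearest-centroid classifier from (x, label) pairs.

    Returns a function mapping x to (predicted label, confidence margin).
    """
    groups = {}
    for x, c in labeled:
        groups.setdefault(c, []).append(x)
    centroids = {c: sum(v) / len(v) for c, v in groups.items()}

    def predict(x):
        dists = sorted((abs(x - m), c) for c, m in centroids.items())
        margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
        return dists[0][1], margin

    return predict

def self_train(labeled, unlabeled, rounds=5, per_round=2):
    """Self-training: repeatedly pseudo-label the most confident points."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        predict = nearest_centroid(labeled)
        pool.sort(key=lambda x: -predict(x)[1])      # most confident first
        take, pool = pool[:per_round], pool[per_round:]
        labeled += [(x, predict(x)[0]) for x in take]
    return nearest_centroid(labeled)

clf = self_train(labeled=[(0.0, "a"), (1.0, "b")],
                 unlabeled=[0.1, 0.2, 0.3, 0.8, 0.9, 1.1])
```

The key design choice, common to self-training variants, is only promoting the most confident pseudo-labels each round, so early mistakes are less likely to be amplified by retraining.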
Libertarian paternalists argue that psychological research has shown that intuition is systematically flawed and that we are hardly educable because our cognitive biases resemble stable visual illusions. Thus, they maintain, authorities who know what is best for us need to step in and steer our behavior with the help of nudges. Nudges are nothing new; justifying them on the basis of a latent irrationality is. Technological paternalism is government by algorithms, with tech companies and state governments using digital technology to predict and control citizens’ behavior. This philosophy claims, first, that AI is, or soon will be, superior to human intuition in all respects and, second, that people should defer to algorithms’ recommendations. I contend that algorithms and big data can outperform humans in tasks that are well-defined and stable, e.g., playing chess and working on assembly lines, but not in ill-defined and unstable tasks, e.g., finding the best mate and predicting human behavior. Misleadingly, the “dataist” worldview promotes algorithms as if they were omniscient beings whom people should allow to decide for the good of each what job to accept, whom to marry, and whom to vote for.
We can easily find ourselves with lots of predictors. This situation has been common in ecology and environmental science but has spread to other biological disciplines as genomics, proteomics, metabolomics, etc., become widespread. Models can become very complex, and with many predictors, collinearity is more likely. Fitting the models is tricky, particularly if we’re looking for the “best” model, and the way we approach the task depends on how we’ll use the model results. This chapter describes different model selection approaches for multiple regression models and discusses ways of measuring the importance of specific predictors. It covers stepwise procedures, all subsets, information criteria, model averaging and validation, and introduces regression trees, including boosted trees.
Short-sighted decisions can have devastating consequences, and teaching people to make their decisions in a more far-sighted way is challenging. Previous research found that reflecting on one’s behavior can boost learning from success and failure. Here, we explore the potential benefits of guiding people to reflect on whether and how they thought about what to do (i.e., systematic metacognitive reflection). We devised a series of Socratic questions that prompt people to reflect on their decision-making and tested their effectiveness in a process-tracing experiment with a 5-step planning task ($N=265$). Each participant went through several cycles of making a series of decisions and then either reflecting on how they made those decisions, answering unrelated questions, or moving on to the next decision right away. We found that systematic metacognitive reflection helps people discover adaptive, far-sighted decision strategies faster. Our results suggest that systematic metacognitive reflection is a promising approach to boosting people’s decision-making competence.
A decision tree is a tree-like model of decisions and their consequences, with the classification and regression tree (CART) being the most commonly used. Being simple models, decision trees are considered ‘weak learners’ relative to more complex and more accurate models. By using a large ensemble of weak learners, methods such as random forest can compete well against strong learners such as neural networks. An alternative to random forest is boosting. While random forest constructs all the trees independently, boosting constructs one tree at a time. At each step, boosting tries to build a weak learner that improves on the previous one.
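The contrast between the two ensemble strategies can be sketched with one-split stumps as the weak learners. This is an illustrative toy under simplifying assumptions: real random forests grow deep trees and subsample features at each split, and real boosting machines use many refinements (shrinkage schedules, tree depth, regularization) omitted here.

```python
import random

def fit_stump(data):
    """Least-squares regression stump on (x, y) pairs; the 'weak learner'."""
    best = None
    xs = sorted({x for x, _ in data})
    for t in xs[:-1]:                      # thresholds keep both sides nonempty
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                       # degenerate sample: constant fit
        m = sum(y for _, y in data) / len(data)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def random_forest(data, n_trees=25, seed=0):
    """Bagging: every weak learner is fit independently on a bootstrap sample."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(t(x) for t in trees) / n_trees

def boost(data, n_rounds=25, lr=0.3):
    """Boosting: every weak learner fits the residuals of the ensemble so far."""
    resid = list(data)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(resid)
        stumps.append(s)
        resid = [(x, y - lr * s(x)) for x, y in resid]
    return lambda x: sum(lr * s(x) for s in stumps)

# Step-function data: y jumps from 1 to 3 at x = 0.5
data = [(0.1, 1), (0.2, 1), (0.3, 1), (0.4, 1),
        (0.6, 3), (0.7, 3), (0.8, 3), (0.9, 3)]
rf, bst = random_forest(data), boost(data)
```

The structural difference is visible in the two loops: the forest's trees never see each other's output and are simply averaged, while each boosting stump is fit to what the previous stumps got wrong.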
Nudging has become a well-known policy practice. Recently, ‘boosting’ has been suggested as an alternative to nudging. In contrast to nudges, boosts aim to empower individuals to exert their own agency to make decisions. This article is one of the first to compare a nudging and a boosting intervention, and it does so in a critical field setting: hand hygiene compliance of hospital nurses. During a 4-week quasi-experiment, we tested the effect of a reframing nudge and a risk literacy boost on hand hygiene compliance in three hospital wards. The results show that nudging and boosting were both effective interventions to improve hand hygiene compliance. A tentative finding is that, while the nudge had a stronger immediate effect, the boost effect remained stable for a week, even after the removal of the intervention. We conclude that, besides nudging, researchers and policymakers may consider boosting when they seek to implement or test behavioral interventions in domains such as healthcare.
Identifying predictors of patient outcomes evaluated over time may require modeling interactions among variables while addressing within-subject correlation. Generalized linear mixed models (GLMMs) and generalized estimating equations (GEEs) address within-subject correlation, but identifying interactions can be difficult if not hypothesized a priori. We evaluate the performance of several variable selection approaches for clustered binary outcomes to provide guidance for choosing between the methods.
Methods:
We conducted simulations comparing stepwise selection, penalized GLMM, boosted GLMM, and boosted GEE for variable selection considering main effects and two-way interactions in data with repeatedly measured binary outcomes, and evaluated a two-stage approach to reduce bias and error in parameter estimates. We compared these approaches in real data applications: hypothermia during surgery and treatment response in lupus nephritis.
Results:
Penalized and boosted approaches recovered correct predictors and interactions more frequently than stepwise selection. Penalized GLMM recovered correct predictors more often than boosting, but included many spurious predictors. Boosted GLMM yielded parsimonious models and identified correct predictors well at large sample and effect sizes, but required excessive computation time. Boosted GEE was computationally efficient and selected relatively parsimonious models, offering a compromise between computation and parsimony. The two-stage approach reduced the bias and error in regression parameters in all approaches.
Conclusion:
Penalized and boosted approaches are effective for variable selection in data with clustered binary outcomes. The two-stage approach reduces bias and error and should be applied regardless of method. We provide guidance for choosing the most appropriate method in real applications.
We present an actuarial claims reserving technique that takes into account both claim counts and claim amounts. Separate (overdispersed) Poisson models for the claim counts and the claim amounts are combined by a joint embedding into a neural network architecture. As the starting point of the neural network calibration, we use exactly these two separate (overdispersed) Poisson models. Such a nested model can be interpreted as a boosting machine. It allows for joint modeling and mutual learning of claim counts and claim amounts beyond the two individual (overdispersed) Poisson models.
A new algorithm, called the boosted hybrid method, is proposed for the simulation of chemical reaction systems with scale-separation in time and disparity in species population. For such stiff systems, the algorithm can automatically identify scale-separation in time and slow down the fast reactions while maintaining a good approximation to the original effective dynamics. This technique is called boosting. As disparity in species population may still exist in the boosted system, we propose a hybrid strategy based on coarse-graining methods, such as the tau-leaping method, to accelerate the reactions among large-population species. The combination of the boosting strategy and the hybrid method allows for an efficient and adaptive simulation of complex chemical reactions. The new method does not need a priori knowledge of the system and can also be used for systems with hierarchical multiple time scales. Numerical experiments illustrate the versatility and efficiency of the method.