Data management concerns collecting, processing, analyzing, organizing, storing, and maintaining the data you collect for a research design. The focus in this chapter is on learning how to use Stata and apply data-management techniques to a provided dataset. No previous knowledge is required for the applications. The chapter goes through the basic operations for data management, including missing-value analysis and outlier analysis. It then covers descriptive statistics (univariate analysis) and bivariate analysis. Finally, it discusses how to merge and append datasets. This chapter is essential preparation for the applications, lab work, and mini case studies in the following chapters, since it familiarizes the reader with the software. Stata code is provided in the main text. For those who are interested in using Python or R instead, the corresponding code is provided on the online resources page (www.cambridge.org/mavruk).
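A minimal Python sketch of the operations the chapter covers (missing-value analysis, outlier screening, descriptive and bivariate statistics, merging, and appending), assuming hypothetical file and column names (returns.csv, firms.csv, returns_extra.csv, ret, volume, firm_id); the book itself works in Stata, with Python and R equivalents on the companion site.

import pandas as pd

# Hypothetical datasets; the chapter's own examples use a provided Stata dataset.
returns = pd.read_csv("returns.csv")
firms = pd.read_csv("firms.csv")

# Missing-value analysis: count missing observations per variable.
print(returns.isna().sum())

# Univariate descriptive statistics (mean, std, min, quartiles, max).
print(returns["ret"].describe())

# Simple outlier screen: flag observations more than three SDs from the mean.
z = (returns["ret"] - returns["ret"].mean()) / returns["ret"].std()
print(returns[z.abs() > 3])

# Bivariate analysis: correlation between two numeric variables.
print(returns[["ret", "volume"]].corr())

# Merge on a common identifier, then append another file with the same columns.
merged = returns.merge(firms, on="firm_id", how="left")
appended = pd.concat([merged, pd.read_csv("returns_extra.csv")], ignore_index=True)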
In the evaluation of experiments, the problem often arises of how to compare the predictive success of competing probabilistic theories. The quadratic scoring rule can be used for this purpose. Originally, this rule was proposed as an incentive-compatible elicitation method for probabilistic expert judgments. It is shown that, up to a positive linear transformation, the quadratic scoring rule is characterized by four desirable properties.
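For reference (and as an assumption about the rule's standard form, since the abstract does not spell it out), the quadratic scoring rule for a reported probability vector p = (p_1, ..., p_n) when outcome i obtains is commonly written as

\[
S(\mathbf{p}, i) \;=\; 2p_i \;-\; \sum_{j=1}^{n} p_j^{2},
\]

which is, up to a positive linear transformation, the negative Brier score \(-\sum_{j}(e_j - p_j)^2\), where e is the indicator vector of the realized outcome.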
This chapter introduces a novel dataset encompassing 731,810 witnesses across 74,077 House, Senate, and Joint standing committee hearings held between 1961 and 2018. The dataset includes comprehensive details such as witness names, organizational affiliations, hearing summaries, committee titles, dates, and bill numbers discussed. The chapter describes the meticulous construction process, emphasizing the extraction of key variables: witness affiliation, affiliation type, and gender. With eighteen categorized affiliation types and nine broader parent categories, this classification captures the diverse spectrum of external groups represented in congressional hearings. The chapter also provides rich descriptive statistics on hearings and witnesses over time and across committees.
Chapter 1 explores the link between the research process and theory and the role of statistics in scientific discovery. Discrete and continuous variables, the building blocks of methodology, take center stage, with clear and elaborate examples of their applicability to scales of measurement and measures of central tendency. Understanding statistics allows us to become better consumers of science and make better judgments and decisions about claims and facts allegedly supported by statistical results.
In Chapter 7, the author introduces both content analysis and basic statistical analysis to help evaluate the effectiveness of assessments. The focus of the chapter is on guidelines for creating and evaluating reading and listening inputs and selected response item types, particularly multiple-choice items that accompany these inputs. The author guides readers through detailed evaluations of reading passages and accompanying multiple-choice items that need major revisions. The author discusses generative artificial intelligence as an aid for drafting inputs and creating items and includes an appendix which guides readers through the use of ChatGPT for this purpose. The author also introduces test-level statistics, including minimum, maximum, range, mean, variance, standard deviation, skewness, and kurtosis. The author shows how to calculate these statistics for an actual grammar tense test and includes an appendix with detailed guidelines for conducting these analyses using Excel software.
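A brief Python sketch of the test-level statistics listed above, computed on hypothetical item scores; the chapter itself demonstrates these calculations for an actual grammar tense test using Excel.

import pandas as pd

# Hypothetical test scores for illustration only.
scores = pd.Series([12, 15, 18, 20, 20, 22, 25, 27, 30, 34])

stats = {
    "minimum": scores.min(),
    "maximum": scores.max(),
    "range": scores.max() - scores.min(),
    "mean": scores.mean(),
    "variance": scores.var(),   # sample variance (n - 1 denominator)
    "std dev": scores.std(),    # sample standard deviation
    "skewness": scores.skew(),  # asymmetry of the score distribution
    "kurtosis": scores.kurt(),  # excess kurtosis relative to the normal curve
}
for name, value in stats.items():
    print(f"{name}: {value:.3f}")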
This is the main methodology and first-results chapter. It opens with an introduction to the lexeme-based approach used for the investigation, contrasting it with previous, variationist approaches. The chapter then explains the data retrieval and screening processes and presents an overview of the data: the nearly 65,000 intensifier tokens found in the corpus, across the three main categories (maximizers, boosters, downtoners), and the descriptive results over time for the most frequent items. The word counts of the different sociopragmatic groups of speakers (divided by speakers’ role in the courtroom, gender, and social class) are introduced, as well as the diachronic distribution of intensifiers across genders and social classes. Results are presented within a descriptive-statistics framework, but the chapter also briefly introduces the regression model, the inferential, multivariate statistical method used in Chapters 8–11 to disentangle the complex interplay of speakers’ sociopragmatic variables in the use of intensifiers.
This chapter covers the two topics of descriptive statistics and the normal distribution. We first discuss the role of descriptive statistics and the measures of central tendency, variance, and standard deviation. We also provide examples of the kinds of graphs often used in descriptive statistics. We next discuss the normal distribution, its properties and its role in descriptive and inferential statistical analysis.
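For reference, the normal distribution discussed here has density

\[
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\]

with mean \( \mu \) and standard deviation \( \sigma \); roughly 68%, 95%, and 99.7% of observations fall within one, two, and three standard deviations of the mean, respectively.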
The chapter ‘#StatsWithCats’ presents statistical methods for interpreting and visualising cat-related online data. The selected sociolinguistic variables are the social media platform and the cat account type. The chapter uses frequencies and crosstabs to describe linguistic variation across four social media platforms and four cat account types. The selected linguistic variables refer to the choice between non-meowlogisms and meowlogisms on Facebook, Instagram, Twitter, and YouTube, as well as in collective, for-profit celebrity, working-for-cause, and individual cat accounts. Additionally, the chapter uses social network analysis to illustrate the networks in cat-related digital spaces.
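A minimal pandas sketch of the frequency-and-crosstab approach described here, using a handful of hypothetical observations coded for platform, account type, and lexical choice (meowlogism vs. non-meowlogism).

import pandas as pd

# Hypothetical coded observations for illustration only.
data = pd.DataFrame({
    "platform": ["Instagram", "Twitter", "Facebook", "YouTube", "Instagram", "Twitter"],
    "account_type": ["individual", "celebrity", "collective", "working-for-cause",
                     "individual", "collective"],
    "lexical_choice": ["meowlogism", "non-meowlogism", "meowlogism",
                       "meowlogism", "non-meowlogism", "meowlogism"],
})

# Simple frequencies of the linguistic variable.
print(data["lexical_choice"].value_counts())

# Crosstab of lexical choice by platform, shown as row percentages.
print(pd.crosstab(data["platform"], data["lexical_choice"], normalize="index"))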
We offer methods to analyze the “differentially private” Facebook URLs Dataset which, at over 40 trillion cell values, is one of the largest social science research datasets ever constructed. The version of differential privacy used in the URLs dataset has specially calibrated random noise added, which provides mathematical guarantees for the privacy of individual research subjects while still making it possible to learn about aggregate patterns of interest to social scientists. Unfortunately, random noise creates measurement error which induces statistical bias—including attenuation, exaggeration, switched signs, or incorrect uncertainty estimates. We adapt methods developed to correct for naturally occurring measurement error, with special attention to computational efficiency for large datasets. The result is statistically valid linear regression estimates and descriptive statistics that can be interpreted as ordinary analyses of nonconfidential data but with appropriately larger standard errors.
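As an illustration of the general idea (not the paper's actual estimator, which is more general and engineered for datasets of this scale), the classical correction for attenuation bias applies when a regressor carries independent additive noise of known variance, as with calibrated differentially private noise:

import numpy as np

def corrected_slope(x_noisy, y, noise_var):
    """Bias-corrected OLS slope when x is observed with known additive noise."""
    naive_slope = np.cov(x_noisy, y)[0, 1] / np.var(x_noisy, ddof=1)
    # Reliability ratio: share of the observed variance that is true signal.
    reliability = (np.var(x_noisy, ddof=1) - noise_var) / np.var(x_noisy, ddof=1)
    return naive_slope / reliability

# Simulated example with a known noise variance.
rng = np.random.default_rng(0)
x_true = rng.normal(size=10_000)
y = 2.0 * x_true + rng.normal(size=10_000)
noise_var = 0.5
x_noisy = x_true + rng.normal(scale=np.sqrt(noise_var), size=10_000)
print(corrected_slope(x_noisy, y, noise_var))  # close to 2.0; the naive slope is attenuated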
As data analytic methods in the managerial sciences become more sophisticated, the gap between the descriptive data typically presented in Table 1 and the analyses used to test the principal hypotheses advanced has become increasingly large. This contributes to several problems, including: (1) the increasing likelihood that analyses presented in published research will be performed and/or interpreted incorrectly, (2) an increasing reliance on statistical significance as the principal criterion for evaluating results, and (3) the increasing difficulty of describing our research and explaining our findings to non-specialists. A set of simple methods, based on the most elementary examination of descriptive statistics, is proposed for assessing whether hypotheses about interventions, moderator relationships, and mediation are plausible.
Stress is the most important proximal precipitant of depression, yet most large genome-wide association studies (GWAS) do not include stress as a variable. Here, we review how gene × environment (G × E) interaction might impede the discovery of genetic factors, discuss two examples of G × E interaction in depression and addiction, consider studies incorporating high-stress environments, and look ahead to upcoming waves of genome-wide environment interaction studies (GWEIS). We discuss recent studies showing that genetic distributions can be affected by social factors such as migrations and socioeconomic background. These distinctions are not just academic but have practical consequences. Owing to interaction with the environment, genetic predispositions to depression should not be viewed as unmodifiable destiny. Patients may differ genetically not just in their response to drugs, as in the now well-recognised field of pharmacogenetics, but also in how they react to stressful environments and how they are affected by behavioural therapies.
This chapter discusses two types of descriptive statistics: models of central tendency and models of variability. Models of central tendency describe the location of the middle of the distribution, and models of variability describe the degree that scores are spread out from one another. There are four models of central tendency in this chapter. Listed in ascending order of the complexity of their calculations, these are the mode, median, mean, and trimmed mean. There are also four principal models of variability discussed in this chapter: the range, interquartile range, standard deviation, and variance. For the latter two statistics, students are shown three possible formulas (sample standard deviation and variance, population standard deviation and variance, and population standard deviation and variance estimated from sample data), along with an explanation of when it is appropriate to use each formula. No statistical model of central tendency or variability tells you everything you may need to know about your data. Only by using multiple models in conjunction with each other can you have a thorough understanding of your data.
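A short Python sketch of the eight models named above, computed on hypothetical scores; the variance and standard deviation shown use the sample (n − 1) formulas, one of the three variants the chapter distinguishes.

import numpy as np
from scipy import stats

# Hypothetical scores; the final value (40) is a high outlier.
scores = np.array([2, 4, 4, 5, 7, 9, 12, 15, 18, 40])

# Models of central tendency.
mode = stats.mode(scores).mode
median = np.median(scores)
mean = scores.mean()
trimmed_mean = stats.trim_mean(scores, proportiontocut=0.1)  # drops the lowest and highest score

# Models of variability.
value_range = scores.max() - scores.min()
iqr = stats.iqr(scores)
sample_sd = scores.std(ddof=1)        # n - 1 denominator (sample formula)
sample_variance = scores.var(ddof=1)  # population formulas would use ddof=0

print(mode, median, mean, trimmed_mean, value_range, iqr, sample_sd, sample_variance)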
Chapter 3 describes in detail the data sources and research designs used throughout the book, including observational data sources, experiments on national samples of American citizens, and panel surveys tracking the same people over time. It also summarizes aggregate public opinion on key variables through time, including approval, confidence, trust, procedural perceptions, and broadly targeted and narrowly targeted Court-curbing. The chapter concludes that the Court’s “reservoir of goodwill” within the American public is not as deep or wide as many scholars suggest.
This chapter discusses Feature Engineering techniques that look holistically at the feature set, replacing or enhancing features based on their relation to the whole set of instances and features. Techniques such as normalization, scaling, dealing with outliers, and generating descriptive features are covered. Scaling and normalization are the most common: they involve finding the maximum and minimum of a feature and transforming its values so they lie in a given interval (e.g., [0, 1] or [−1, 1]). Discretization and binning involve, for example, analyzing an integer feature (any number from −1 trillion to +1 trillion), realizing that it only takes the values 0, 1, and 10, and simplifying it into a symbolic feature with three values (value0, value1, and value10). Generating descriptive features means gathering information about the shape of the data; the discussion centres on tables of counts (histograms) and general descriptive features such as maximum, minimum, and averages. Outlier detection and treatment refers to examining feature values across many instances and recognizing that some values lie very far from the rest.
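A minimal sketch of two of these techniques, min–max scaling into [0, 1] and recoding a sparse integer feature as a symbolic one, using hypothetical feature values:

import numpy as np

# Hypothetical integer feature that only takes the values 0, 1 and 10.
feature = np.array([0, 1, 10, 0, 10, 1, 0])

# Scaling/normalization: map values into [0, 1] using the observed min and max.
scaled = (feature - feature.min()) / (feature.max() - feature.min())

# Discretization/binning: recode the sparse integer feature as a three-valued
# symbolic feature (value0, value1, value10).
symbolic = np.array([f"value{v}" for v in feature])

# Descriptive features: simple summaries of the shape of the data.
print(scaled, symbolic, feature.min(), feature.max(), feature.mean())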
This reflection presents a discussion of some common measures of variability and how they are appropriately used in descriptive and inferential statistical analyses. We argue that confidence intervals (CIs), which incorporate these measures, serve as tools to assess both clinical and statistical significance.
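As one concrete instance (a standard textbook form, not drawn from the reflection itself), the t-based confidence interval for a mean combines a descriptive measure of variability, the sample standard deviation s, with the sample size n:

\[
\bar{x} \;\pm\; t_{\alpha/2,\, n-1}\,\frac{s}{\sqrt{n}} .
\]

An interval that excludes the null value indicates statistical significance, while its location and width can be judged against clinically meaningful effect sizes.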
This chapter presents a detailed example that applies the compensation analytics concepts developed in Chapter 6. The reader is assumed to be a compensation consultant charged with evaluating whether gender-based discrimination in pay is present in a public university system in the sciences. Section 7.1 walks through the analysis step-by-step, from formulating the business question, to acquiring and cleaning data, to analyzing the data and interpreting the results from voluminous statistical output in light of the business question. Section 7.2 covers exploratory data mining, causality, and experiments. Exploratory data mining covers situations in which the manager does not know in advance which relationships in the data will be of interest, in contrast to the example in section 7.1 in which a statistical model and specific measures could be constructed that were directly tailored to address the business question at hand. Section 7.2 covers the challenges associated with establishing causality in compensation research and how experiments can sometimes be designed to address those challenges. Randomization and some pitfalls associated with compensation experiments are also covered.
This chapter responds to the growing importance of business analytics on "big data" in managerial decision-making, by providing a comprehensive primer on analyzing compensation data. All aspects of compensation analytics are covered, starting with data acquisition, types of data, and formulation of a business question that can be informed by data analysis. A detailed, hands-on treatment of data cleaning is provided, equipping readers to prepare data for analysis by detecting and fixing data problems. Descriptive statistics are reviewed, and their utility in data cleaning explicated. Graphical methods are used in examples to detect and trim outliers. The basics of linear regression analysis are covered, with an emphasis on application and interpreting results in the context of the business question(s) posed. One section covers the question of whether or not the pay measure (as a dependent variable) should be transformed via a logarithm, and the implications of that choice for interpreting the results are explained. Precision of regression estimates is covered via an intuitive, non-technical treatment of standard errors. An appendix covers nonlinear relationships among variables.
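A brief sketch of the log-transformation choice discussed here, using statsmodels on simulated, hypothetical pay data; with log pay as the dependent variable, a coefficient b on a regressor is read (approximately, for small b) as a 100·b percent difference in pay rather than a dollar difference.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated, hypothetical pay data for illustration only.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "experience": rng.uniform(0, 30, n),
    "female": rng.integers(0, 2, n),
})
df["pay"] = 40_000 * np.exp(0.03 * df["experience"] - 0.05 * df["female"]
                            + rng.normal(scale=0.2, size=n))

# Levels: coefficients are dollar differences in pay.
levels = smf.ols("pay ~ experience + female", data=df).fit()

# Logs: coefficients are approximate proportional (percent) differences in pay.
logs = smf.ols("np.log(pay) ~ experience + female", data=df).fit()

print(levels.params, logs.params, logs.bse, sep="\n")  # .bse reports standard errors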