In this paper, we propose a class of locally dependent latent trait models for responses to psychological and educational tests. Item response models typically treat an individual's multiple responses to stimuli as conditionally independent given the individual's latent trait. Here, instead, the focus is on models based on a family of conditional distributions, or kernel, that describes the joint multiple item responses as a function of the student's latent trait without assuming conditional independence. Specifically, we examine a hybrid kernel that comprises a component for one-way item response functions and a component for conditional associations between items given latent traits. This class of models allows item response theory to be extended to new and innovative applications in psychological and educational research. An EM algorithm for marginal maximum likelihood estimation of the hybrid kernel model is proposed. Furthermore, we delineate the relationship between the class of locally dependent models and the log-linear model by revisiting the Dutch identity (Holland, 1990).
When analyzing data, researchers make choices that are arbitrary, based on subjective beliefs about the data-generating process, or for which equally justifiable alternatives exist. This wide range of data-analytic choices can be abused and has been one of the underlying causes of the replication crisis in several fields. The recent introduction of multiverse analysis gives researchers a method to evaluate the stability of results across the reasonable choices that could be made when analyzing data. However, multiverse analysis is confined to a descriptive role and lacks a proper, comprehensive inferential procedure. Specification curve analysis adds an inferential procedure to multiverse analysis, but this approach is limited to simple cases related to the linear model, and it only allows researchers to infer whether at least one specification rejects the null hypothesis, not which specifications should be selected. In this paper, we present a Post-selection Inference approach to Multiverse Analysis (PIMA), a flexible and general inferential approach that considers all possible models, i.e., the multiverse of reasonable analyses. The approach accommodates a wide range of data specifications (i.e., preprocessing choices) and any generalized linear model; it tests the null hypothesis that a given predictor is not associated with the outcome by combining information from all reasonable models of the multiverse analysis, and it provides strong control of the family-wise error rate, allowing researchers to claim that the null hypothesis can be rejected for any specification that shows a significant effect. The inferential proposal is based on a conditional resampling procedure. We formally prove that the Type I error rate is controlled, and we assess the statistical power of the test through a simulation study.
Finally, we apply the PIMA procedure to the analysis of a real dataset on the self-reported hesitancy for the COronaVIrus Disease 2019 (COVID-19) vaccine before and after the 2020 lockdown in Italy. We conclude with practical recommendations to be considered when implementing the proposed procedure.
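The max-statistic logic behind this kind of multiverse inference can be sketched in a few lines. The toy below (pure NumPy; the `multiverse_max_t` helper and the permutation scheme are illustrative assumptions, not the authors' sign-flip score test) computes a t statistic for the predictor of interest in each specification and calibrates the maximum across specifications by resampling:

```python
import numpy as np

def multiverse_max_t(y, specs, n_perm=199, seed=0):
    """Toy max-T multiverse test. `specs` is a list of design matrices,
    each of whose LAST column is the predictor of interest. The null
    distribution of the maximum |t| across specifications is built by
    permuting y, which is valid only under exchangeability and the
    global null; the published PIMA procedure instead uses a sign-flip
    score test that extends to generalized linear models."""
    rng = np.random.default_rng(seed)

    def t_stats(y_):
        out = []
        for X in specs:
            beta = np.linalg.lstsq(X, y_, rcond=None)[0]
            e = y_ - X @ beta
            sigma2 = e @ e / (len(y_) - X.shape[1])
            se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
            out.append(abs(beta[-1]) / se)
        return np.array(out)

    t_obs = t_stats(y)
    null_max = np.array([t_stats(rng.permutation(y)).max()
                         for _ in range(n_perm)])
    # Max-T adjusted p-values: refer each observed |t| to the
    # resampling distribution of the maximum statistic.
    p_adj = (1 + (null_max[:, None] >= t_obs[None, :]).sum(0)) / (n_perm + 1)
    return t_obs, p_adj

# Two specifications of the same question: with and without a covariate.
rng = np.random.default_rng(1)
n = 120
x, z = rng.normal(size=n), rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
ones = np.ones(n)
specs = [np.column_stack([ones, x]), np.column_stack([ones, z, x])]
t_obs, p_adj = multiverse_max_t(y, specs)
```

Flagging every specification whose adjusted p-value falls below the chosen level mirrors the abstract's claim: individual significant specifications can be identified while the family-wise error rate stays controlled.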
The use of programming languages in archaeological research has witnessed a notable surge in the last decade, particularly with R, a versatile statistical computing language that fosters the development of specialized packages. This article introduces the tesselle project (https://www.tesselle.org/), a comprehensive collection of R packages tailored for archaeological research and education. The tesselle packages are centered on quantitative analysis methods specifically crafted for archaeology. They are designed to complement both general-purpose and other specialized statistical packages. These packages serve as a versatile toolbox, facilitating the exploration and analysis of common data types in archaeology—such as count data, compositional data, or chronological data—and enabling the construction of reproducible workflows. Complementary packages for visualization, data preparation, and educational resources augment the tesselle ecosystem. This article outlines the project's inception, its objectives, design principles, and key components, along with reflections on future directions.
This chapter discusses record keeping, like maintaining a lab notebook. Historically, lab notebooks were analog, pen-and-paper affairs. With so much work being performed on the computer and with most scientific instruments creating digital data directly, most record-keeping efforts are digital. Therefore, we focus on strategies for establishing and maintaining records of computer-based work. Keeping good records of your work is essential. These records inform your future thoughts as you reflect on the work you have already done, acting as reminders and inspiration. They also provide important details for collaborators, and scientists working in large groups often have predefined standards for group members to use when keeping lab notebooks and the like. Computational work differs from traditional bench science, and this chapter describes practices for good record-keeping habits in the more slippery world of computer work.
The Mediterranean Region registers an increasing prevalence of obesity. The region lacks a diet screener to assess obesogenic nutrients. This study aimed to evaluate the reproducibility and validity of the Modified Mediterranean Prime Screen (MMPS) in estimating obesogenic nutrients’ intake among women of reproductive age, as compared with a culturally validated Food Frequency Questionnaire (FFQ), in Lebanon. We developed the MMPS consisting of thirty-two food/beverage items specific to the Lebanese Mediterranean culture. The MMPS and FFQ were administered in two visits (2 weeks–6 months apart), face to face and via telephone during the coronavirus disease 2019 pandemic. The reproducibility and validity of the MMPS were assessed using intraclass correlation coefficients (ICC) and Pearson’s correlations, respectively. The study included 143 women, aged 31·5 (sd 4·6) years, BMI 24·2 (sd 4·0) kg/m2, 87 % with university education and 91 % food secure. The reproducibility of the MMPS was moderate for energy and all assessed nutrients except for SFA (ICC = 0·428). The agreement of the MMPS with the reference FFQ was adequate for energy and obesogenic nutrients. Yet, the Pearson correlations for energy-adjusted nutrient intake were low for trans-fatty acids (0·294) and PUFA (0·377). The MMPS can be a time-efficient tool for dietary assessment of energy and many obesogenic nutrients. Future studies should validate the MMPS across the lifespan and re-evaluate it after updating the fatty acid profiles in the culturally specific food composition tables.
With its promise of nondestructive processing, rapid low-cost sampling, and portability to any field site or museum in the world, portable X-ray fluorescence (pXRF) spectrometry is rapidly becoming a standard piece of equipment for archaeologists. Even though the use of pXRF is becoming standard, the publication of pXRF analytical methods and the resulting data remains widely variable. Despite validation studies that demonstrate the importance of sample preparation, data collection settings, and data processing, there remains no standard for how to report pXRF results. In this article, we address the need for best practices in publishing pXRF analyses. We outline information that should be published alongside interpretive results in any archaeological application of pXRF. By publishing this basic information, archaeologists will increase the transparency and replicability of their analyses on an inter-analyst/inter-analyzer basis and provide clarity for journal editors and peer reviewers on publications and grant proposals for studies that use pXRF. The use of these best practices will result in better science in the burgeoning use of pXRF in archaeology.
We assessed the rigor and reproducibility (R&R) activities of institutions funded by the National Center for Advancing Translational Sciences (NCATS) through a survey and website search (N = 61). Of 50 institutional responses, 84% reported incorporating some form of R&R training, 68% reported dedicated R&R training, 30% monitored R&R practices, and 10% incentivized them. Website searches revealed 9 (15%) freely available training curricula and 7 (11%) institutional programs specifically created to enhance R&R. NCATS should formally integrate R&R principles into its translational science models and institutional requirements.
Real-world data, such as administrative claims and electronic health records, are increasingly used for safety monitoring and to help guide regulatory decision-making. In these settings, it is important to document analytic decisions transparently and objectively to assess and ensure that analyses meet their intended goals.
Methods:
The Causal Roadmap is an established framework that can guide and document analytic decisions through each step of the analytic pipeline, which will help investigators generate high-quality real-world evidence.
Results:
In this paper, we illustrate the utility of the Causal Roadmap using two case studies previously led by workgroups sponsored by the Sentinel Initiative – a program for actively monitoring the safety of regulated medical products. Each case example focuses on different aspects of the analytic pipeline for drug safety monitoring. The first case study shows how the Causal Roadmap encourages transparency, reproducibility, and objective decision-making for causal analyses. The second case study highlights how this framework can guide analytic decisions beyond inference on causal parameters, improving outcome ascertainment in clinical phenotyping.
Conclusion:
These examples provide a structured framework for implementing the Causal Roadmap in safety surveillance and guide transparent, reproducible, and objective analysis.
This chapter describes how relationship scientists conduct research to answer questions about relationships. It explains all aspects of the research process, including how hypotheses are derived from theory, which study designs (e.g., experiments, cross-sectional studies, experience sampling) best suit specific research questions, how scientists can manipulate variables or measure variables with self-report, implicit, observational, or physiological measures, and what scientists consider when recruiting a sample to participate in their studies. This chapter also discusses how researchers approach questions about boundary conditions (when general trends do not apply) and mechanisms (the processes underlying their findings) and describes best practices for conducting ethical and reproducible research. Finally, this chapter includes a guide for how to read and evaluate empirical research articles.
Over the past years, computational methods based on deep learning—that is, machine learning with multilayered neural networks—have become state-of-the-art in major research areas of computer-aided architectural design (CAAD). To understand current trends in CAAD with deep learning, to situate them in a broader historical context, and to identify future research challenges, this article presents a systematic review of publications that apply neural networks to CAAD problems. Research papers employing neural networks were collected, in particular, from CumInCAD, a major open-access repository of the CAAD community, and categorized into different types of research problems. Upon analyzing the distribution of the papers across these categories—namely, the composition of research subjects, data types, and neural network models—this article suggests and discusses several historical and technical trends. Moreover, it finds that the publications analyzed typically provide limited access to important research components used as part of their deep learning methods. The article points out the importance of sharing training experiments and data, and of describing the dataset, dataset parameters, dataset samples, model, learning parameters, and learning results to support reproducibility. It proposes a guideline that aims to increase the quality and availability of CAAD research with machine learning.
Motor unit number index of the upper trapezius (MUNIX-Trapezius) is a candidate biomarker for bulbar lower motor neuron function; however, reliability data are incomplete. To assess MUNIX-Trapezius reliability in controls, we conducted a systematic review, a cross-sectional study (n = 20), and a meta-analysis. We demonstrated high inter- and intra-rater intraclass correlations (0.86 and 0.94, respectively), indicating that MUNIX-Trapezius is reliable, with between-study variability moderated by age and MUNIX technique. With further validation, this measure can serve as a disease-monitoring and response biomarker of bulbar function in therapeutic development for amyotrophic lateral sclerosis.
Replication is an important tool used to test and develop scientific theories. Areas of biomedical and psychological research have experienced a replication crisis, in which many published findings failed to replicate. Following this, many other scientific disciplines have been interested in the robustness of their own findings. This chapter examines replication in primate cognitive studies. First, it discusses the frequency and success of replication studies in primate cognition and explores the challenges researchers face when designing and interpreting replication studies across the wide range of research designs used across the field. Next, it discusses the type of research that can probe the robustness of published findings, especially when replication studies are difficult to perform. The chapter concludes with a discussion of different roles that replication can have in primate cognition research.
Part of what distinguishes science from other ways of knowing is that scientists show their work. Yet when probed, it turns out that much of the process of research is hidden away: in personal files, in undocumented conversations, in point-and-click menus, and so on. In recent years, a movement toward more open science has arisen in psychology. Open science practices capture a broad swath of activities designed to take parts of the research process that were previously known only to a research team and make them more broadly accessible (e.g., open data, open analysis code, pre-registration, open research materials). Such practices increase the value of research by increasing transparency, which may in turn facilitate higher research quality. Plus, open science practices are now required at many journals. This chapter will introduce open science practices and provide plentiful resources for researchers seeking to integrate these practices into their workflow.
This study aimed to evaluate the reproducibility and validity of an FFQ for residents of northeast China. A total of 131 participants completed two FFQs (FFQ1 and FFQ2) within a 3-month period, 125 participants completed 8-d weighed diet records (WDR) and 112 participants completed blood biomarker testing. Reproducibility was measured by comparing nutrient and food intake between FFQ1 and FFQ2. The validity of the FFQ was assessed against the WDR and by the triad method. The Spearman correlation coefficients (SCC) and intraclass correlation coefficients (ICC) for reproducibility ranged from 0·41 to 0·69 (median = 0·53) and from 0·18 to 0·68 (median = 0·53) for energy and nutrients and from 0·37 to 0·73 (median = 0·59) and from 0·33 to 0·86 (median = 0·60) for food groups, respectively. Classification into the same or adjacent quartiles ranged from 73·64 to 93·80 % for the two FFQs. The crude SCC between the FFQ and WDR ranged from 0·27 to 0·55 (median = 0·46) for energy and nutrients and from 0·26 to 0·70 (median = 0·52) for food groups, and classification into the same or adjacent quartiles ranged from 65·32 to 86·29 %. The triad method indicated that validation coefficients for the FFQ were above 0·3 for most nutrients, indicating a moderate to high level of validity. The FFQ that was developed for residents of northeast China for the Northeast Cohort Study of China is reliable and valid for assessing the intake of most foods and nutrients.
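The two agreement measures used in such test-retest studies can be computed directly. Below is a minimal NumPy sketch; the `icc_a1` and `spearman` helper names are illustrative, and the ICC shown is the two-way, absolute-agreement, single-measures form, one common choice for FFQ reproducibility (the abstract does not state which variant was used):

```python
import numpy as np

def icc_a1(x):
    """Two-way ICC for absolute agreement, single measures (ICC(A,1)),
    for an n-subjects x k-administrations matrix."""
    n, k = x.shape
    grand = x.mean()
    row_means, col_means = x.mean(axis=1), x.mean(axis=0)
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between occasions
    sse = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def spearman(a, b):
    """Spearman rank correlation as Pearson correlation of ranks
    (no tie handling in this sketch)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return np.corrcoef(rank(a), rank(b))[0, 1]

# Synthetic test-retest energy intakes (kcal/d) for 80 participants.
rng = np.random.default_rng(0)
ffq1 = rng.normal(2000, 500, size=80)
ffq2 = ffq1 + rng.normal(0, 100, size=80)   # retest with measurement noise
icc = icc_a1(np.column_stack([ffq1, ffq2]))
rho = spearman(ffq1, ffq2)
```

Because the Spearman coefficient depends only on ranks, it is invariant to any monotone transformation of intakes, which is why it is preferred over Pearson's r when nutrient distributions are skewed.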
We reflect on the relative ‘success’ versus ‘failure’ of psychology as a research field, and we challenge the widely held notion that we are in a reproducibility (or replication) crisis. At the centre of our discussion is the question: does psychology have a future, qua science, if the phenomena it studies are changing all the time and contingent on fleeting contexts or historical conditions? This chapter argues that there is only a reproducibility crisis if we adopt assumptions and expectations that enact a substance ontology. In contrast, we describe how variability is to be expected if we adopt a process ontology. We argue that the way out of the current ‘crisis’ is therefore not necessarily more methodological and experimental rigour, but a fundamental shift in what we should expect from psychological phenomena. We call for a prioritization of understanding the ways in which phenomena are socially situated and context-contingent, rather than an unrealistic demand for replication.
To determine the relative validity and reproducibility of the Eetscore FFQ, a short screener for assessing diet quality, in patients with (severe) obesity before and after bariatric surgery (BS).
Design:
The Eetscore FFQ was evaluated against 3-d food records (3d-FR) before (T0) and 6 months after BS (T6) by comparing index scores of the Dutch Healthy Diet index 2015 (DHD2015-index). Relative validity was assessed using paired t tests, Kendall’s tau-b correlation coefficients (τb), cross-classification by tertiles, weighted kappa values (kw) and Bland–Altman plots. Reproducibility of the Eetscore FFQ was assessed using intraclass correlation coefficients (ICC).
Setting:
Regional hospital, the Netherlands.
Participants:
One hundred and forty participants with obesity who were scheduled for BS.
Results:
At T0, mean total DHD2015-index score derived from the Eetscore FFQ was 10·2 points higher than the food record-derived score (P < 0·001) and showed an acceptable correlation (τb = 0·42, 95 % CI: 0·27, 0·55). There was a fair agreement with a correct classification of 50 % (kw = 0·37, 95 % CI: 0·25, 0·49). Correlation coefficients of the individual DHD components varied from 0·01–0·54. Similar results were observed at T6 (τb = 0·31, 95 % CI: 0·12, 0·48, correct classification of 43·7 %; kw = 0·25, 95 % CI: 0·11, 0·40). Reproducibility of the Eetscore FFQ was good (ICC = 0·78, 95 % CI: 0·69, 0·84).
Conclusion:
The Eetscore FFQ was acceptably correlated with the DHD2015-index derived from 3d-FR, but absolute agreement was poor. Considering the need for dietary assessment methods that reduce the burden on patients, practitioners and researchers, the Eetscore FFQ can be used for ranking according to diet quality and for monitoring changes over time.
Bilingualism is hard to define, measure, and study. Sparked by the “replication crisis” in the social sciences, a recent discussion on the advantages of open science is gaining momentum. Here, we join this debate to argue that bilingualism research would greatly benefit from embracing open science. We do so in a unique way, by presenting six fictional stories that illustrate how open science practices – sharing preprints, materials, code, and data; pre-registering studies; and joining large-scale collaborations – can strengthen bilingualism research and further improve its quality.
The reproducibility of a deep-learning fully convolutional neural network is evaluated by training the same network several times under identical conditions (database, hyperparameters, and hardware) with nondeterministic graphics-processing-unit operations. The network is trained to model three typical time–space-evolving physical systems in two dimensions: the heat, Burgers’, and wave equations. The behavior of the networks is evaluated on both recursive and nonrecursive tasks. Significant changes in the models’ properties (weights and feature fields) are observed. When tested on various benchmarks, these models systematically return estimations with a high level of deviation, especially for the recurrent analysis, which strongly amplifies the variability due to the nondeterminism. Training with double floating-point precision provides slightly better estimations and significantly reduces the variability of both the network parameters and the testing error range.
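The run-to-run spread described above can be reproduced at toy scale. In the sketch below (pure NumPy, invented for illustration; a tiny dense network rather than the paper's fully convolutional one), only the weight initialization changes between runs, standing in for nondeterministic GPU reductions, yet the trained networks end at measurably different test errors:

```python
import numpy as np

def train_once(seed, n_steps=2000, lr=0.1):
    """Fit a tiny 1-16-1 tanh network to sin(x) on [0, pi] by full-batch
    gradient descent; the seed is the only thing that varies per run."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, np.pi, 64)[:, None]
    y = np.sin(x)
    W1, b1 = rng.normal(0, 1.0, (1, 16)), np.zeros(16)
    W2, b2 = rng.normal(0, 0.1, (16, 1)), np.zeros(1)
    for _ in range(n_steps):
        h = np.tanh(x @ W1 + b1)
        err = h @ W2 + b2 - y                 # prediction residual
        dh = (err @ W2.T) * (1 - h ** 2)      # backprop through tanh
        W2 -= lr * h.T @ err / len(x); b2 -= lr * err.mean(0)
        W1 -= lr * x.T @ dh / len(x);  b1 -= lr * dh.mean(0)
    h = np.tanh(x @ W1 + b1)
    return float(((h @ W2 + b2 - y) ** 2).mean())

# Five "identical" trainings that differ only in initialization.
errors = [train_once(s) for s in range(5)]
spread = max(errors) - min(errors)            # run-to-run variability
```

Comparing `spread` to the mean error is a crude analogue of the deviation analysis in the paper; in the recursive setting the paper describes, each model's predictions are fed back as inputs, so these small differences compound.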
The purpose of the current study was to develop a validated FFQ to evaluate the intake of non-nutritive sweeteners (NNS) in child and adolescent Asian populations.
Design:
Intensive and overall market research was performed to create the applicable NNS-FFQ with thirteen food categories and 305 items. Six intense sweeteners, including acesulfame potassium, aspartame, sucralose, glycyrrhizin, steviol glycosides and sorbitol, were investigated. The validity and reproducibility of the NNS-FFQ were evaluated. The validity was further assessed by examining the consistency of reported NNS intake compared with urinary biomarkers using Cohen’s κ analysis.
Settings:
This work was considered to be relevant in Asian societies.
Participants:
One hundred and two children and adolescents recruited from several clinics were invited to participate in the current study.
Results:
High content validity indices and high content validity ratio levels were revealed for each sweetener and food category. Reproducibility among subjects was satisfactory. Significant moderate correlations between estimated steviol glycoside/sucralose consumption and sensitive urinary biomarker levels were demonstrated (κ values were 0·59 and 0·45 for steviol glycosides and sucralose, respectively), indicating that the NNS-FFQ can be used to assess an individual’s NNS intake. The dietary intense sweetener consumption pattern evaluated in this measurement was similar to those observed in other Asian countries but differed from those observed in Western populations with respect to types and amounts of NNS.
Conclusions:
This validated NNS-FFQ can be an applicable and useful tool to evaluate NNS intake in future epidemiological and clinical studies.
Foodborne and waterborne gastrointestinal infections and their associated outbreaks are preventable, yet they still cause significant morbidity, mortality and revenue loss. Many enteric infections demonstrate seasonality, or annual systematic periodic fluctuations in incidence, associated with climatic and environmental factors. Public health professionals use statistical methods and time series models to describe, compare, explain and predict seasonal patterns. However, descriptions and estimates of seasonal features, such as peak timing, depend on how researchers define seasonality for research purposes and how they apply time series methods. In this review, we outline the advantages and limitations of common methods for estimating seasonal peak timing. We provide recommendations for improving the reporting requirements of disease surveillance systems. Greater attention to how seasonality is defined, modelled, interpreted and reported is necessary to promote reproducible research and to strengthen proactive and targeted public health policies, intervention strategies and preparedness plans that dampen the intensity and impacts of seasonal illnesses.
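One commonly used peak-timing estimator (not necessarily one the review evaluates) is single-harmonic, or cosinor, regression, in which the peak is read off the fitted phase. A minimal NumPy sketch, with the `peak_timing` helper name invented for illustration:

```python
import numpy as np

def peak_timing(counts, period=12):
    """Estimate seasonal peak timing from a regular count series by
    fitting y_t = b0 + b1*sin(2*pi*t/period) + b2*cos(2*pi*t/period)
    and converting the fitted phase atan2(b1, b2) back to time units."""
    t = np.arange(len(counts), dtype=float)
    X = np.column_stack([np.ones_like(t),
                         np.sin(2 * np.pi * t / period),
                         np.cos(2 * np.pi * t / period)])
    b0, b1, b2 = np.linalg.lstsq(X, np.asarray(counts, float), rcond=None)[0]
    phase = np.arctan2(b1, b2)                 # radians
    return (phase * period / (2 * np.pi)) % period

# Four years of monthly counts peaking at month index 7
# (August, if index 0 is January).
t = np.arange(48)
counts = 100 + 30 * np.cos(2 * np.pi * (t - 7) / 12)
peak = peak_timing(counts)
```

Because the estimate depends on how "season" is parameterized (here, a single 12-month harmonic), two teams applying different definitions to the same surveillance series can report different peak months, which is exactly the reporting ambiguity the review targets.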