When large achievement tests are conducted regularly, items need to be calibrated before being used as operational items in a test. Methods have been developed to optimally assign pretest items to examinees based on their abilities. Most of these methods, however, are intended for situations where examinees arrive sequentially to be assigned to calibration items. In several calibration tests, examinees take the test simultaneously or in parallel. In this article, we develop an optimal calibration design tailored for such parallel test setups. Our objective is both to investigate the efficiency gain of the method and to demonstrate that it can be implemented in real calibration scenarios. For the latter, we have employed this method to calibrate items for the Swedish national tests in Mathematics. In this case study, as in many real test situations, items are of mixed format and the optimal design method needs to handle that. The method we propose works for mixed-format tests and accounts for varying expected response times. Our investigations show that the proposed method considerably enhances calibration efficiency.
A distinction is proposed between measures and predictors of latent variables. The discussion addresses the consequences of the distinction for the true-score model, the linear factor model, Structural Equation Models, longitudinal and multilevel models, and item-response models. A distribution-free treatment of calibration and error-of-measurement is given, and the contrasting properties of measures and predictors are examined.
The conventional method of measuring ability, which is based on items with assumed true parameter values obtained from a pretest, is compared to a Bayesian method that deals with the uncertainties of such items. Computational expressions are presented for approximating the posterior mean and variance of ability under the three-parameter logistic (3PL) model. A 1987 American College Testing Program (ACT) math test is used to demonstrate that the standard practice of using maximum likelihood or empirical Bayes techniques may seriously underestimate the uncertainty in estimated ability when the pretest sample is only moderately large.
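The abstract contrasts two approaches. The conventional one treats pretest item parameters as exact. The Bayesian one accounts for their uncertainty. As a point of reference, the Python sketch below computes the conventional posterior (EAP) mean and variance of ability under the 3PL model by grid quadrature. It assumes a standard normal prior, the usual scaling constant D = 1.7 and fixed item parameters. This is the baseline the authors critique, not their method. It does not propagate item-parameter uncertainty, so its posterior variance is exactly the quantity the abstract warns may be too small. The names `p_3pl` and `eap_ability` are hypothetical.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 is an assumed scaling constant)."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def eap_ability(responses, a, b, c, n_points=81):
    """Posterior mean and variance of ability under a N(0, 1) prior.

    Item parameters (a, b, c) are treated as known constants, i.e. the
    conventional approach; their estimation uncertainty is ignored.
    """
    theta = np.linspace(-4.0, 4.0, n_points)      # quadrature grid
    prior = np.exp(-0.5 * theta**2)               # N(0, 1) up to a constant
    # P has shape (n_points, n_items): success probability per grid point and item
    P = p_3pl(theta[:, None], a[None, :], b[None, :], c[None, :])
    u = responses[None, :]
    likelihood = np.prod(P**u * (1.0 - P) ** (1 - u), axis=1)
    posterior = prior * likelihood
    posterior /= posterior.sum()                  # normalise on the grid
    mean = np.sum(theta * posterior)
    var = np.sum((theta - mean) ** 2 * posterior)
    return mean, var

# Hypothetical item parameters for a four-item test
a = np.array([1.0, 1.2, 0.8, 1.5])
b = np.array([-1.0, 0.0, 0.5, 1.0])
c = np.full(4, 0.2)
mean, var = eap_ability(np.array([1, 1, 0, 0]), a, b, c)
```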
Overconfidence plays a role in a large number of individual decision biases and has been considered a ‘meta-bias’ for this reason. However, since overconfidence is measured behaviorally with respect to particular tasks (in which performance varies across individuals), it is unclear whether people generally vary in terms of their general overconfidence. We investigated this issue using a novel measure: the Generalized Overconfidence Task (GOT). The GOT is a difficult perception test that asks participants to identify objects in fuzzy (‘adversarial’) images. Critically, participants’ estimated performance on the task is not related to their actual performance. Instead, we argue, variation in estimated performance arises from generalized overconfidence, that is, people claiming a cognitive skill for which they have no basis. In a series of studies (total N = 1,293), the GOT was more predictive of a broad range of behavioral outcomes than two other overestimation tasks (cognitive and numeracy) and did not display substantial overlap with conceptually related measures (Studies 1a and 1b). In Studies 2a and 2b, the GOT showed superior test–retest reliability compared to the other overconfidence measures (i.e., cognitive and numeracy measures), particularly when both confidence ratings after each image and an estimated performance score were collected. Finally, the GOT is a strong predictor of a host of behavioral outcomes, including conspiracy beliefs, bullshit receptivity, overclaiming, and the ability to discern news headlines.
Emotion recognition in conversation (ERC) faces two major challenges: biased predictions and poor calibration. Classifiers often disproportionately favor certain emotion categories, such as neutral, due to the structural complexity of classifiers, the subjective nature of emotions, and imbalances in training datasets. This bias results in poorly calibrated predictions where the model’s predicted probabilities do not align with the true likelihood of outcomes. To tackle these problems, we introduce the application of conformal prediction (CP) into ERC tasks. CP is a distribution-free method that generates set-valued predictions to ensure marginal coverage in classification, thus improving the calibration of models. However, inherent biases in emotion recognition models prevent baseline CP from achieving a uniform conditional coverage across all classes. We propose a novel CP variant, class spectrum conformation, which significantly reduces coverage bias in CP methods. The methodologies introduced in this study enhance the reliability of prediction calibration and mitigate bias in complex natural language processing tasks.
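For readers unfamiliar with CP, the following Python sketch shows baseline split conformal prediction for a classifier, using the common score 1 − p(true class). This is the generic baseline the abstract refers to, not the proposed class spectrum conformation. The names `conformal_threshold` and `prediction_sets` are hypothetical. The sketch assumes held-out calibration data that are exchangeable with the test data. It also requires NumPy 1.22 or later for the `method` argument of `np.quantile`.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with the score s = 1 - p(true class).

    cal_probs: (n, K) predicted class probabilities on calibration data.
    cal_labels: (n,) integer true labels.
    Returns the finite-sample-corrected (1 - alpha) quantile of the scores.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_sets(test_probs, qhat):
    """Boolean mask of shape (m, K): class k is in the set if 1 - p_k <= qhat."""
    return (1.0 - test_probs) <= qhat
```

Because the threshold is a single quantile pooled over all classes, the guarantee is only marginal. A majority class such as neutral can be over-covered while rare emotions are under-covered. This is the coverage bias the class spectrum variant is designed to reduce.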
This study suggests that there may be considerable difficulties in providing accurate calendar age estimates in the Roman period in Europe, between ca. AD 60 and ca. AD 230, using the radiocarbon calibration datasets that are currently available. Incorporating the potential for systematic offsets between the measured data and the calibration curve, using the ΔR approach suggested by Hogg et al. (2019), only marginally mitigates the observed biases in calendar date estimates. At present, it clearly behoves researchers working on this period to heed “caveat emptor” and to validate the accuracy of their calibrated radiocarbon dates and chronological models against other sources of dating information.
This handbook provides a comprehensive, practical, and independent guide to all aspects of making weather observations. The second edition has been fully updated throughout with new material, new instruments and technologies, and the latest reference and research materials. Traditional and modern weather instruments are covered, including how best to choose and to site a weather station, how to get the best out of your equipment, how to store and analyse your records and how to share your observations. The book's emphasis is on modern electronic instruments and automatic weather stations. It provides advice on replacing 'traditional' mercury-based thermometers and barometers with modern digital sensors, following implementation of the UN Minamata Convention outlawing mercury in the environment. The Weather Observer's Handbook will again prove to be an invaluable resource for both amateur observers choosing their first weather instruments and professional observers looking for a comprehensive and up-to-date guide.
Instrument calibrations are both one of the most important, and yet sometimes one of the most neglected, areas of weather measurement. This chapter describes straightforward methods to check and adjust calibrations for the most common meteorological instruments – precipitation (rainfall), temperature, humidity and air pressure sensors. To reduce uncertainty in the measurements themselves, meteorological instruments need to be accurately calibrated, or at least regularly compared against instruments of known calibration to quantify and adjust for any differences, or error. Calibrations can and do drift over time, and therefore instrumental calibrations should be checked regularly, and adjusted if necessary.
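Comparing an instrument against one of known calibration can be reduced to a simple fit. The Python sketch below assumes the differences are well described by a linear gain and offset. It fits these by least squares from side-by-side readings and reports the residual RMS as a measure of the remaining error. The readings are hypothetical values for illustration, not data from the chapter, and a linear model may not suit every sensor. The function names are also hypothetical.

```python
import numpy as np

def fit_linear_calibration(sensor, reference):
    """Least-squares fit of reference ≈ gain * sensor + offset.

    Returns the gain, the offset and the RMS of the residuals, which
    quantifies how much error a straight-line correction leaves behind.
    """
    sensor = np.asarray(sensor, dtype=float)
    reference = np.asarray(reference, dtype=float)
    gain, offset = np.polyfit(sensor, reference, 1)   # slope, intercept
    residuals = reference - (gain * sensor + offset)
    rms = float(np.sqrt(np.mean(residuals**2)))
    return gain, offset, rms

def apply_calibration(readings, gain, offset):
    """Correct raw sensor readings with a previously fitted gain and offset."""
    return gain * np.asarray(readings, dtype=float) + offset

# Hypothetical side-by-side temperature readings (°C)
sensor = [5.2, 10.1, 15.3, 20.2, 25.4]
reference = [5.0, 10.0, 15.0, 20.0, 25.0]
gain, offset, rms = fit_linear_calibration(sensor, reference)
corrected = apply_calibration(sensor, gain, offset)
```

Because calibrations drift, the fit would be repeated at each periodic check. The new gain and offset can then be compared with the previous values.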
Variable-Value axiologies avoid Parfit’s Repugnant Conclusion while satisfying some weak instances of the Mere Addition principle. We apply calibration methods to two leading members of the family of Variable-Value views conditional upon: first, a very weak instance of Mere Addition and, second, some plausible empirical assumptions about the size and welfare of the intertemporal world population. We find that such facts calibrate these two Variable-Value views to be nearly totalist, and therefore imply conclusions that should seem repugnant to anyone who opposes Total Utilitarianism only due to the Repugnant Conclusion.
Solvency II requires that firms with Internal Models derive the Solvency Capital Requirement directly from the probability distribution forecast generated by the Internal Model. A number of UK insurance undertakings do this via an aggregation model consisting of proxy models and a copula. Since 2016 there have been a number of industry surveys on the application of these models, with the 2019 Prudential Regulation Authority (“PRA”) led industry-wide thematic review identifying a number of areas of enhancement. This concluded that there was currently no uniform best practice. While there have been many competing priorities for insurers since 2019, the Working Party expects that firms will have either already made changes to their proxy modelling approach in light of the PRA survey, or will have plans to do so in the coming years. This paper takes the PRA feedback into account and explores potential approaches to calibration and validation, taking into consideration the different heavy models used within the industry and the relative materiality of business lines.
A novel thinned antenna element distribution for cancelling grating lobes (GLs) and reducing the number of phase shifters (PSs) is presented for a two-dimensional phased-array automotive radar application. First, an efficient clustering technique for vertically adjacent elements is combined with array thinning to reduce the number of PSs by 66.7%. In the proposed distribution, several single-element radiators (non-clustered antenna elements) are placed in the vertical direction with specific spacing in a grid of 16 × 12 (192) elements with λ/2 pitch. This disrupts the periodicity of the phase centres after element clustering and acts as a steerable GL canceller capable of tracking and nullifying the GL at any scan angle. The proposed distribution enables beam steering up to ±60° in the azimuth plane and ±25° in the elevation plane with cancelled GL and sidelobes. Furthermore, the proposed distribution has been efficiently calibrated with all elements activated by introducing the code division multiple access technique. To the best of the authors’ knowledge, this work represents the first fully calibrated, state-of-the-art thinned phased array to include a novel steerable GL canceller that tracks and nullifies GLs.
This chapter elaborates on the calibration and validation procedures for the model. First, we describe our calibration strategy in which a customised optimisation algorithm makes use of a multi-objective function, preventing the loss of indicator-specific error information. Second, we externally validate our model by replicating two well-known statistical patterns: (1) the skewed distribution of budgetary changes and (2) the negative relationship between development and corruption. Third, we internally validate the model by showing that public servants who receive more positive spillovers tend to be less efficient. Fourth, we analyse the statistical behaviour of the model through different tests: validity of synthetic counterfactuals, parameter recovery, overfitting, and time equivalence. Finally, we make a brief reference to the literature on estimating SDG networks.
Wiggle-match dating of tree-ring sequences is particularly promising for achieving high-resolution dating across periods with reversals and plateaus in the calibration curve, such as the entire post-Columbian period of North American history. Here we describe a modified procedure for wiggle-match dating that facilitates precise dating of wooden museum objects while minimizing damage due to destructive sampling. We present two case studies, a dugout canoe and wooden trough, both expected to date to the 18th–19th century. (1) Tree rings were counted and sampled for dating from exposed, rough cross-sections in the wood, with no or minimal surface preparation, to preserve these fragile objects; (2) dating focused on the innermost and outermost portions of the sequences; and (3) due to the crude counting and sampling procedures, the wiggle-match was approximated using a simple ordered Sequence, with gaps defined as Intervals. In both cases, the outermost rings were dated with precision of 30 years or better, demonstrating the potential of wiggle-match dating for post-European Contact canoes and other similar objects.
This study aimed to investigate the influence of calibration field size on the gamma passing rate (GPR) in patient-specific quality assurance (PSQA).
Methods:
Two independent detectors, PTW OCTAVIUS 4D (4DOCT) and Arc Check, were used for quality assurance of volumetric modulated arc therapy plans for 26 patients (14 with Arc Check and 12 with 4DOCT). Plans were delivered on a Varian Unique machine (with 4DOCT) and a Varian TrueBeam (with Arc Check), each employing different calibration factors (CFs) obtained at 4 × 4, 6 × 6, 8 × 8, 10 × 10, 12 × 12 and 15 × 15 cm² field sizes. Gamma analysis was conducted with 2%/2 mm, 2%/3 mm and 3%/3 mm gamma criteria.
Results:
GPR varied across CFs, showing an increasing trend for CFs below 10 × 10 cm² and a decreasing trend above 10 × 10 cm². Both detectors exhibited similar GPR patterns. The correlation between 4DOCT and Arc Check was strong for the tighter criterion (2%/2 mm, R² = 0·9957) and the moderate criterion (2%/3 mm, R² = 0·9868) but weaker for the liberal criterion (3%/3 mm, R² = 0·4226).
Conclusion:
This study demonstrates that calibration field size significantly influences GPR in PSQA. It recommends that a plan-specific calibration field be used to calibrate QA devices for modulated plans.
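Gamma analysis combines a dose-difference criterion and a distance-to-agreement (DTA) criterion into a single index. A point passes when γ ≤ 1. The Python sketch below computes a simplified global 1D gamma index and the resulting passing rate for criteria such as 3%/3 mm. It is an illustration only. Clinical QA software evaluates 2D or 3D dose distributions, usually with interpolation. The 10% low-dose threshold used here is an assumed default rather than a value from the study, and `gamma_1d` is a hypothetical name.

```python
import numpy as np

def gamma_1d(x_ref, d_ref, x_eval, d_eval, dose_pct=3.0, dta_mm=3.0, threshold_pct=10.0):
    """Simplified global 1D gamma index and passing rate.

    The dose-difference criterion is a percentage of the maximum reference dose
    (global normalisation). Reference points below threshold_pct of that maximum
    are excluded. Returns (gamma values, passing rate in %).
    """
    d_max = d_ref.max()
    dd = dose_pct / 100.0 * d_max                 # dose-difference criterion
    mask = d_ref >= threshold_pct / 100.0 * d_max
    gammas = []
    for xr, dr in zip(x_ref[mask], d_ref[mask]):
        # squared generalised distance to every evaluated point; gamma is the minimum
        g2 = ((x_eval - xr) / dta_mm) ** 2 + ((d_eval - dr) / dd) ** 2
        gammas.append(np.sqrt(g2.min()))
    gammas = np.array(gammas)
    return gammas, 100.0 * np.mean(gammas <= 1.0)

# Hypothetical Gaussian dose profile sampled every 0.1 mm
x = np.linspace(-50.0, 50.0, 1001)
ref = np.exp(-x**2 / (2 * 15.0**2))
_, gpr = gamma_1d(x, ref, x, 1.1 * ref, dose_pct=2.0, dta_mm=2.0)
```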
A laser stripe sensor can be calibrated in two ways. One, called the one-step calibration method, is based on the homography model between the laser stripe plane and the image plane. The other, called the two-step calibration method, is based on the simple triangulation method. However, the geometrical meaning of each element in the one-step calibration method is not as clear as in the two-step calibration method. A novel mathematical derivation is presented to reveal the geometrical meaning of each parameter in the one-step calibration method. A comparative study of the one-step and two-step calibration methods is then carried out and their intrinsic relationship is derived. Furthermore, a one-step calibration method with 7 rather than 11 independent parameters is proposed. Experiments are conducted to verify the accuracy and robustness of the proposed calibration method.
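In the one-step approach, the laser stripe plane and the image plane are related by a planar homography. The Python sketch below estimates a generic 3 × 3 homography (8 degrees of freedom) from point correspondences using the standard direct linear transform (DLT). It then uses the homography to map a pixel onto laser-plane coordinates. It does not reproduce the paper's 11- or 7-parameter formulations, and the function names are hypothetical. In practice, coordinates are usually normalised before the DLT to improve numerical conditioning; that step is omitted here for brevity.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of H with dst ~ H @ src (homogeneous), from >= 4 point pairs."""
    rows = []
    for (x, y), (X, Y) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, x * X, y * X, X])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, x * Y, y * Y, Y])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)      # null-space vector = solution up to scale
    return H / H[2, 2]            # fix the scale so that H[2, 2] = 1

def image_to_plane(H, u, v):
    """Map pixel (u, v) to 2D coordinates on the laser plane."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

# Hypothetical ground-truth homography and calibration points
H_true = np.array([[1.2, 0.1, 5.0], [0.05, 0.9, -3.0], [0.001, 0.002, 1.0]])
src = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3), (2, 8)]
dst = [image_to_plane(H_true, u, v) for u, v in src]
H_est = estimate_homography(src, dst)
```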
Combining experts’ subjective probability estimates is a fundamental task with broad applicability in domains ranging from finance to public health. However, it is still an open question how to combine such estimates optimally. Since the beta distribution is a common choice for modeling uncertainty about probabilities, here we propose a family of normative Bayesian models for aggregating probability estimates based on beta distributions. We systematically derive and compare different variants, including hierarchical and non-hierarchical as well as asymmetric and symmetric beta fusion models. Using these models, we show how the beta calibration function naturally arises in this normative framework and how it is related to the widely used Linear-in-Log-Odds calibration function. For evaluation, we provide the new Knowledge Test Confidence data set consisting of subjective probability estimates of 85 forecasters on 180 queries. On this and another data set, we show that the hierarchical symmetric beta fusion model performs best of all beta fusion models and outperforms related Bayesian fusion models in terms of mean absolute error.
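The Linear-in-Log-Odds (LLO) function maps a probability p to δp^γ / (δp^γ + (1 − p)^γ). In other words, it scales the log-odds by γ and shifts them by log δ. A three-parameter beta calibration map is commonly written as 1 / (1 + 1 / (e^c p^a / (1 − p)^b)). The short Python check below illustrates one simple link between the two: LLO is the special case a = b = γ, c = log δ. This is not the derivation from the beta fusion framework in the paper, and the function names are hypothetical.

```python
import numpy as np

def llo(p, gamma, delta):
    """Linear-in-Log-Odds: logit(output) = gamma * logit(p) + log(delta)."""
    return delta * p**gamma / (delta * p**gamma + (1 - p) ** gamma)

def beta_calibration(p, a, b, c):
    """Three-parameter beta calibration map 1 / (1 + 1 / (exp(c) * p**a / (1 - p)**b))."""
    return 1.0 / (1.0 + 1.0 / (np.exp(c) * p**a / (1 - p) ** b))

p = np.linspace(0.01, 0.99, 99)
# With a = b = gamma and c = log(delta), the beta map reduces to LLO
same = np.allclose(llo(p, 0.7, 1.3), beta_calibration(p, 0.7, 0.7, np.log(1.3)))
```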
The IntCal family of radiocarbon (14C) calibration curves is based on research spanning more than three decades. The IntCal group have collated the 14C and calendar age data (mostly derived from primary publications with other types of data and meta-data) and, since 2010, made them available for other sorts of analysis through an open-access database. This has ensured transparency in terms of the data used in the construction of the ratified calibration curves. As the IntCal database expands, work is underway to facilitate best practice for new data submissions, make more of the associated metadata available in a structured form, and help those wishing to process the data with programming languages such as R, Python, and MATLAB. The data and metadata are complex because of the range of different types of archives. A restructured interface, based on the “IntChron” open-access data model, includes tools which allow the data to be plotted and compared without the need for export. The intention is to include complementary information which can be used alongside the main 14C series to provide new insights into the global carbon cycle, as well as facilitating access to the data for other research applications. Overall, this work aims to streamline the generation of new calibration curves.
Stereo vision allows machines to perceive their surroundings, with plane identification serving as a crucial aspect of perception. The accuracy of identification constrains the applicability of stereo systems. Some stereo vision cameras are cost-effective, compact, and user-friendly, resulting in widespread use in engineering applications. However, identification errors limit their effectiveness in quantitative scenarios. While certain calibration methods enhance identification accuracy using camera distortion models, they rely on specific models tailored to a camera’s unique structure. This article presents a calibration method that does not depend on any particular distortion model. It can correct plane position and orientation identified by any algorithm, provided that the identification error is biased. A high-precision mechanical calibration platform is designed to acquire accurate calibration data using the same detected material as in real measurement scenarios. Experimental comparisons confirm the efficacy of plane pose correction on PCL-RANSAC: the average relative error of distance is reduced by a factor of 5.4, and the average absolute error of angle decreases by 41.2%.
The Minoan eruption of Santorini, Greece, is an important and often-debated chronological marker in contexts of the Eastern Mediterranean region. Among various age estimates of this event, one based on wiggle-matching of radiocarbon (14C) dates from an olive branch found in Santorini by Friedrich et al. (2006) has been widely discussed. Calibrated age estimates based on wiggle-matching of these 14C ages have changed with improvements in the 14C calibration curve. As also shown earlier, the average 14C age of multiple tree rings dated together should not be calibrated using a single-year calibration curve. Since recent calibration curves include many single-year 14C datasets, a different approach should be considered to calibrate the average 14C age of a block of multiple tree rings. Here we demonstrate the use of multiple moving average (MA) calibration curves for calibrating the sequence of four 14C ages reported for the Santorini olive branch. The resulting calibrated ages for the Minoan eruption are younger than previous estimates and range from the late 17th century BCE to the mid-16th century BCE.
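The idea behind an n-year moving-average curve can be sketched in three steps. First, convert the annual curve's 14C ages to F14C. Second, average over every window of n consecutive years. Third, convert back to 14C ages. Averaging is done in F14C rather than in 14C age because a block of rings is a physical mixture of carbon, and 14C age is a non-linear function of F14C. The Python function below makes several simplifying assumptions:

- each ring contributes equally to the block;
- curve uncertainties are ignored;
- ages use the Libby mean life of 8033 years.

It illustrates the concept and is not the authors' procedure. `moving_average_curve` is a hypothetical name.

```python
import numpy as np

LIBBY_MEAN_LIFE = 8033.0  # years; conventional 14C age = -8033 * ln(F14C)

def moving_average_curve(cal_years, c14_ages, n_rings):
    """n-year moving-average curve from an annual calibration curve.

    Averages in F14C space with equal weight per year (an assumption), then
    converts back to 14C age. Each averaged value is labelled by the mean
    calendar year of its window. Uncertainties are not propagated.
    """
    f14c = np.exp(-np.asarray(c14_ages, dtype=float) / LIBBY_MEAN_LIFE)
    kernel = np.ones(n_rings) / n_rings
    f_avg = np.convolve(f14c, kernel, mode="valid")          # length N - n + 1
    ages_avg = -LIBBY_MEAN_LIFE * np.log(f_avg)
    years_avg = np.convolve(np.asarray(cal_years, dtype=float), kernel, mode="valid")
    return years_avg, ages_avg
```

A multi-ring sample's measured age would then be calibrated against the curve built with n equal to the number of rings in the sample, instead of against the single-year curve.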