Published online by Cambridge University Press: 01 January 2026
While published linguistic judgments sometimes differ from the judgments found in large-scale formal experiments with naive participants, there is no consensus on how often such errors occur or on how often formal experiments should be used in syntax and semantics research. In this article, we first present the results of a large-scale replication of the Sprouse et al. 2013 study on 100 English contrasts randomly sampled from Linguistic Inquiry 2001–2010 and tested in both a forced-choice experiment and an acceptability rating experiment. Like Sprouse, Schütze, and Almeida, we find that the effect sizes of published linguistic acceptability judgments are not uniformly large or consistent but rather form a continuum from very large effects to small or nonexistent effects. We then use these data as a prior in a Bayesian framework to propose a small n acceptability paradigm for linguistic acceptability judgments (SNAP Judgments). This proposal makes it easier and cheaper to obtain meaningful quantitative data in syntax and semantics research. Specifically, for a contrast of linguistic interest for which a researcher is confident that sentence A is better than sentence B, we recommend that the researcher obtain judgments from at least five unique participants, using at least five unique sentences of each type. If all participants in the sample agree that sentence A is better than sentence B, then the researcher can be confident that the result of a full forced-choice experiment would likely be 75% or more agreement in favor of sentence A (with a mean of 93%). We test this proposal by sampling from the existing data and find that it performs reliably.
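The logic of updating an empirical prior with a small unanimous sample can be illustrated with a minimal sketch. It is not the authors' analysis: the prior values below are hypothetical placeholders rather than the measured effect sizes, the function names are invented for illustration, and each judgment is treated as an independent binomial draw, which ignores participant and item variability.

```python
from math import comb


def snap_posterior(prior_props, n_agree, n_total):
    """Return posterior weights over candidate agreement rates.

    prior_props: candidate population agreement rates (an equally weighted
                 discrete prior, e.g. rates seen in earlier experiments).
    n_agree:     number of new judgments favoring sentence A.
    n_total:     total number of new judgments collected.
    """
    # Binomial likelihood of the new judgments under each candidate rate.
    likelihoods = [
        comb(n_total, n_agree) * p**n_agree * (1 - p) ** (n_total - n_agree)
        for p in prior_props
    ]
    total = sum(likelihoods)
    if total == 0:
        raise ValueError("observed data are impossible under every prior value")
    # Normalize so the weights sum to one.
    return [lik / total for lik in likelihoods]


def summarize(prior_props, weights, threshold=0.75):
    """Posterior mean agreement rate and posterior mass at or above threshold."""
    mean = sum(p * w for p, w in zip(prior_props, weights))
    tail = sum(w for p, w in zip(prior_props, weights) if p >= threshold)
    return mean, tail


# HYPOTHETICAL prior agreement rates, not the data reported in the article.
prior = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

# Five judgments, all favoring sentence A.
weights = snap_posterior(prior, n_agree=5, n_total=5)
mean, tail = summarize(prior, weights)
print(f"posterior mean agreement: {mean:.3f}")
print(f"posterior mass at or above 0.75: {tail:.3f}")
```

Because the prior here is invented, the printed values will not reproduce the 75% and 93% figures; substituting the observed forced-choice agreement rates for `prior` would bring the sketch closer to the procedure described.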
We would like to thank Evelina Fedorenko, Roger Levy, Steve Piantadosi, Colin Phillips, and the audience at AMLaP 2014 for comments and discussion. We would further like to thank Ruth Abrams, Zuzanna Balewski, Michael Behr, Lauren Burke, Camille DeJarnett, Jessica Hamrick, Anthony Lu, Emily Lydic, Molly McShane, Philip Smith, Jacob Steinhardt, Stephanie Tong, Chelsea Voss, Tyrel Waagen, Xiaoran Xu, and Fangheng Zhou for constructing experimental items as well as Zoe Snape for help with the implementation of the experiment. We thank Richard Futrell, Alexander Paunov, and Caitlyn Hoeflin for coding the obviousness of judgments. KM was supported by an NDSEG graduate fellowship.