
SNAP Judgments: A Small n Acceptability Paradigm (SNAP) for Linguistic Acceptability Judgments

Published online by Cambridge University Press: 01 January 2026

Kyle Mahowald, Massachusetts Institute of Technology
Peter Graff, Intel Corporation
Jeremy Hartman, University of Massachusetts Amherst
Edward Gibson, Massachusetts Institute of Technology

Author addresses:
Mahowald, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 46-3037, Cambridge, MA 02139 [kylemaho@mit.edu]
Graff, Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA 95054 [peter.graff@intel.com]
Hartman, Department of Linguistics, University of Massachusetts Amherst, 650 N. Pleasant St., Integrative Learning Center 428, Amherst, MA 01003 [hartman@linguist.umass.edu]
Gibson, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Avenue, 46-3037, Cambridge, MA 02139 [egibson@mit.edu]

Abstract

Published linguistic judgments sometimes differ from the judgments found in large-scale formal experiments with naive participants. However, there is no consensus about how often such errors occur or about how often formal experiments should be used in syntax and semantics research. In this article, we first present the results of a large-scale replication of Sprouse et al.'s (2013) study of 100 English contrasts randomly sampled from Linguistic Inquiry 2001–2010. We tested these contrasts in both a forced-choice experiment and an acceptability rating experiment. Like Sprouse, Schütze, and Almeida, we find that the effect sizes of published linguistic acceptability judgments are not uniformly large or consistent. Instead, they form a continuum from very large effects to small or nonexistent ones. We then use these data as a prior in a Bayesian framework to propose a small n acceptability paradigm for linguistic acceptability judgments (SNAP Judgments). This proposal makes it easier and cheaper to obtain meaningful quantitative data in syntax and semantics research. Specifically, suppose a researcher is confident that, for a contrast of linguistic interest, sentence A is better than sentence B. We recommend that the researcher obtain judgments from at least five unique participants, using at least five unique sentences of each type. If all participants in the sample agree that sentence A is better than sentence B, the researcher can be confident that a full forced-choice experiment would likely show 75% or more agreement in favor of sentence A (with a mean of 93%). We test this proposal by sampling from the existing data and find that it performs reliably.
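The decision rule in the abstract can be read as a simple Bayesian update on the true forced-choice agreement rate p for a contrast. The Python sketch below illustrates that logic under stated assumptions; it is not the authors' implementation. It makes three assumptions:

- The prior over p is discrete and equally weighted.
- Its values (`HYPOTHETICAL_PRIOR`) are invented for illustration and are not the article's experimental data.
- Each of the five participants independently prefers sentence A with probability p, so unanimity has likelihood p^5.

A small Monte Carlo routine then checks the same rule by resampling. It is in the spirit of, but not identical to, the article's own test. All function names are hypothetical.

```python
import random

# HYPOTHETICAL prior: forced-choice agreement rates (share of participants
# preferring sentence A) for ten invented contrasts. Illustrative only;
# these are NOT the experimental results reported in the article.
HYPOTHETICAL_PRIOR = [0.98, 0.95, 0.93, 0.90, 0.85, 0.80, 0.72, 0.65, 0.55, 0.50]


def posterior_after_unanimity(prior, n_participants=5):
    """Condition an equally weighted discrete prior over p on the event that
    all n participants prefer sentence A.

    Assumes independent Bernoulli(p) judgments, so the likelihood of
    unanimity is p ** n_participants. Returns (p, posterior weight) pairs.
    """
    likelihoods = [p ** n_participants for p in prior]
    total = sum(likelihoods)
    return [(p, lk / total) for p, lk in zip(prior, likelihoods)]


def summarize(posterior, threshold=0.75):
    """Return (posterior mean of p, posterior probability that p >= threshold)."""
    mean = sum(p * w for p, w in posterior)
    prob_above = sum(w for p, w in posterior if p >= threshold)
    return mean, prob_above


def simulate_rule(prior, n_participants=5, threshold=0.75, trials=100_000, seed=1):
    """Monte Carlo check of the decision rule.

    Each trial draws a contrast uniformly from the prior and samples
    n judgments. The trial is kept only if every judgment favors sentence A.
    Returns the share of kept trials whose true agreement rate is >= threshold.
    """
    rng = random.Random(seed)
    kept = above = 0
    for _ in range(trials):
        p = rng.choice(prior)
        if all(rng.random() < p for _ in range(n_participants)):
            kept += 1
            above += p >= threshold
    return above / kept


posterior = posterior_after_unanimity(HYPOTHETICAL_PRIOR)
mean, prob_above = summarize(posterior)
print(f"posterior mean agreement: {mean:.3f}")
print(f"P(agreement >= 0.75 | 5 of 5 agree): {prob_above:.3f}")
print(f"simulated share >= 0.75: {simulate_rule(HYPOTHETICAL_PRIOR):.3f}")
```

With these invented values, five unanimous judgments raise the expected agreement rate from the prior mean of 0.783 to roughly 0.9. The article's own figures (75% or more agreement, with a mean of 93%) come from its empirical prior, which this sketch does not reproduce.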

Type: Research Article

Copyright © 2016 Linguistic Society of America


Footnotes

*

We would like to thank Evelina Fedorenko, Roger Levy, Steve Piantadosi, Colin Phillips, and the audience at AMLaP 2014 for comments and discussion. We would further like to thank Ruth Abrams, Zuzanna Balewski, Michael Behr, Lauren Burke, Camille DeJarnett, Jessica Hamrick, Anthony Lu, Emily Lydic, Molly McShane, Philip Smith, Jacob Steinhardt, Stephanie Tong, Chelsea Voss, Tyrel Waagen, Xiaoran Xu, and Fangheng Zhou for constructing experimental items as well as Zoe Snape for help with the implementation of the experiment. We thank Richard Futrell, Alexander Paunov, and Caitlyn Hoeflin for coding the obviousness of judgments. KM was supported by an NDSEG graduate fellowship.

References

Agresti, Alan. 2002. Categorical data analysis. New York: John Wiley & Sons.
Arppe, Antti, and Juhani Järvikivi. 2007. Take empiricism seriously! In support of methodological diversity in linguistics. Corpus Linguistics and Linguistic Theory 3. 99–109. DOI: 10.1515/CLLT.2006.007.
Baayen, R. Harald; Doug J. Davidson; and Douglas M. Bates. 2008. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59. 390–412. DOI: 10.1016/j.jml.2007.12.005.
Barr, Dale J.; Roger Levy; Christoph Scheepers; and Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68. 255–78. DOI: 10.1016/j.jml.2012.11.001.
Bates, Douglas; Martin Maechler; Ben Bolker; and Steven Walker. 2014. lme4: Linear mixed-effects models using Eigen and S4. Online: http://CRAN.R-project.org/package=lme4.
Birdsong, David. 1989. Metalinguistic performance and interlinguistic competence. Dordrecht: Springer.
Chomsky, Noam. 1957. Syntactic structures. Berlin: Walter de Gruyter.
Chomsky, Noam. 1986. Barriers. Cambridge, MA: MIT Press.
Cohen, J. 1994. The earth is round (p < .05). American Psychologist 49. 997–1003. DOI: 10.1037/0003-066X.49.12.997.
Cowart, Wayne. 1997. Experimental syntax: Applying objective methods to sentence judgments. Thousand Oaks, CA: Sage.
Crump, Matthew J. C.; John V. McDonnell; and Todd M. Gureckis. 2013. Evaluating Amazon's Mechanical Turk as a tool for experimental behavioral research. PLoS ONE 8:e57410. DOI: 10.1371/journal.pone.0057410.
Culicover, Peter W., and Ray Jackendoff. 2010. Quantitative methods alone are not enough: Response to Gibson and Fedorenko. Trends in Cognitive Sciences 14. 234–35. DOI: 10.1016/j.tics.2010.03.012.
Featherston, Sam. 2005. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115. 1525–50. DOI: 10.1016/j.lingua.2004.07.003.
Gibson, Edward, and Evelina Fedorenko. 2010. Weak quantitative standards in linguistics research. Trends in Cognitive Sciences 14. 233–34. DOI: 10.1016/j.tics.2010.03.005.
Gibson, Edward, and Evelina Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28. 88–124. DOI: 10.1080/01690965.2010.515080.
Gibson, Edward; Steven T. Piantadosi; and Evelina Fedorenko. 2013. Quantitative methods in syntax/semantics research: A response to Sprouse and Almeida (2013). Language and Cognitive Processes 28. 229–40. DOI: 10.1080/01690965.2012.704385.
Gibson, Edward; Steve Piantadosi; and Kristina Fedorenko. 2011. Using Mechanical Turk to obtain and analyze English acceptability judgments. Language and Linguistics Compass 5. 509–24. DOI: 10.1111/j.1749-818X.2011.00295.x.
Gross, Steven, and Jennifer Culbertson. 2011. Revisited linguistic intuitions. The British Journal for the Philosophy of Science 62. 639–56. DOI: 10.1093/bjps/axr009.
Householder, Fred W., Jr. 1965. On some recent claims in phonological theory. Journal of Linguistics 1. 13–34. DOI: 10.1017/S0022226700000992.
Labov, William. 1978. Sociolinguistics. A survey of linguistic science, ed. by William Orr Dingwall, 339–72. Stamford, CT: Greylock.
Lau, Jey; Alexander Clark; and Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science, to appear.
Levelt, Willem J. M.; J. A. W. M. van Gent; A. F. J. Haans; and A. J. A. Meijers. 1977. Grammaticality, paraphrase and imagery. Acceptability in language, ed. by Sidney Greenbaum, 87–101. The Hague: Mouton.
Linzen, Tal, and Yohei Oseki. 2015. The reliability of acceptability judgments across languages. New York: New York University, ms.
Marantz, Alec. 2005. Generative linguistics within the cognitive neuroscience of language. The Linguistic Review 22. 429–45. DOI: 10.1515/tlir.2005.22.2-4.429.
Mason, Winter, and Siddharth Suri. 2012. Conducting behavioral research on Amazon's Mechanical Turk. Behavior Research Methods 44. 1–23. DOI: 10.3758/s13428-011-0124-6.
McCawley, James D. 1982. How far can you trust a linguist? Language, mind, and brain, ed. by Thomas W. Simon and Robert J. Scholes, 75–87. Hillsdale, NJ: Ablex.
Myers, James. 2009. The design and analysis of small-scale syntactic judgment experiments. Lingua 119. 425–44. DOI: 10.1016/j.lingua.2008.09.003.
R Core Team. 2012. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Online: http://www.R-project.org.
Schütze, Carson T. 1996. The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Chicago: University of Chicago Press.
Sorace, Antonella, and Frank Keller. 2005. Gradience in linguistic data. Lingua 115. 1497–1524. DOI: 10.1016/j.lingua.2004.07.002.
Spencer, N. J. 1973. Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability. Journal of Psycholinguistic Research 2. 83–98. DOI: 10.1007/BF01067203.
Sprouse, Jon. 2011. A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behavior Research Methods 43. 155–67. DOI: 10.3758/s13428-010-0039-7.
Sprouse, Jon, and Diogo Almeida. 2012. Power in acceptability judgment experiments and the reliability of data in syntax. Irvine: University of California, Irvine, and Ann Arbor: Michigan State University, ms. Online: http://ling.auf.net/lingbuzz/001520.
Sprouse, Jon; Carson T. Schütze; and Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134. 219–48. DOI: 10.1016/j.lingua.2013.07.002.
Wasow, Thomas, and Jennifer Arnold. 2005. Intuitions in linguistic argumentation. Lingua 115. 1481–96. DOI: 10.1016/j.lingua.2004.07.001.
Supplementary material: Mahowald et al. supplementary material (file, 325.3 KB)