Let's move forward: Image-computable models and a common model evaluation scheme are prerequisites for a scientific understanding of human vision

Published online by Cambridge University Press: 06 December 2023

James J. DiCarlo
Affiliation:
Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA dicarlo@mit.edu; https://dicarlolab.mit.edu
Daniel L. K. Yamins
Affiliation:
Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA yamins@stanford.edu; http://neuroailab.stanford.edu/research.html
Michael E. Ferguson
Affiliation:
Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA mferg@mit.edu
Evelina Fedorenko
Affiliation:
Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA evelina9@mit.edu; https://evlab.mit.edu/
Matthias Bethge
Affiliation:
Tübingen AI Center, University of Tübingen, Tübingen, Germany matthias.bethge@bethgelab.org; https://bethgelab.org/
Tyler Bonnen
Affiliation:
Wu Tsai Neurosciences Institute, Stanford University, Stanford, CA, USA bonnen@stanford.edu; http://neuroailab.stanford.edu/research.html
Martin Schrimpf
Affiliation:
Dept. of Brain and Cognitive Sciences, Quest for Intelligence, and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; École polytechnique fédérale de Lausanne, Lausanne, Switzerland msch@mit.edu; https://mschrimpf.com/

Abstract

In the target article, Bowers et al. dispute deep artificial neural network (ANN) models as the currently leading models of human vision without producing alternatives. They eschew the use of public benchmarking platforms to compare vision models with the brain and behavior, and they advocate for a fragmented, phenomenon-specific modeling approach. These positions are unconstructive to scientific progress. We outline how the Brain-Score community is moving forward to add new model-to-human comparisons to its community-transparent suite of benchmarks.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press
