
Deep problems with neural network models of human vision

Published online by Cambridge University Press: 01 December 2022

Jeffrey S. Bowers
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; j.bowers@bristol.ac.uk; https://jeffbowers.blogs.bristol.ac.uk/
Gaurav Malhotra
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; gaurav.malhotra@bristol.ac.uk
Marin Dujmović
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; marin.dujmovic@bristol.ac.uk
Milton Llera Montero
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; m.lleramontero@bristol.ac.uk
Christian Tsvetkov
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; christian.tsvetkov@bristol.ac.uk
Valerio Biscione
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; valerio.biscione@gmail.com
Guillermo Puebla
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; guillermo.puebla@bristol.ac.uk
Federico Adolfi
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK; Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany; fedeadolfi@gmail.com
John E. Hummel
Affiliation:
Department of Psychology, University of Illinois Urbana–Champaign, Champaign, IL, USA; jehummel@illinois.edu
Rachel F. Heaton
Affiliation:
Department of Psychology, University of Illinois Urbana–Champaign, Champaign, IL, USA; rmflood2@illinois.edu
Benjamin D. Evans
Affiliation:
Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK; b.d.evans@sussex.ac.uk
Jeffrey Mitchell
Affiliation:
Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK; j.mitchell@napier.ac.uk
Ryan Blything
Affiliation:
School of Psychology, Aston University, Birmingham, UK; r.blything@aston.ac.uk

Abstract

Deep neural networks (DNNs) have had extraordinary successes in classifying photographic images of objects and are often described as the best models of biological vision. This conclusion is largely based on three sets of findings: (1) DNNs are more accurate than any other model in classifying images taken from various datasets, (2) DNNs do the best job at predicting the pattern of human errors in classifying objects taken from various behavioral datasets, and (3) DNNs do the best job at predicting brain signals in response to images taken from various brain datasets (e.g., single-cell responses or fMRI data). However, these behavioral and brain datasets do not test hypotheses regarding what features contribute to good predictions, and we show that the predictions may be mediated by DNNs that share little overlap with biological vision. More problematically, we show that DNNs account for almost no results from psychological research. This contradicts the common claim that DNNs are good, let alone the best, models of human object recognition. We argue that theorists interested in developing biologically plausible models of human vision need to direct their attention to explaining psychological findings. More generally, theorists need to build models that explain the results of experiments that manipulate independent variables designed to test hypotheses, rather than compete on making the best predictions. We conclude by briefly summarizing various promising modeling approaches that focus on psychological data.
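Finding (2) in the abstract refers to metrics that score how closely a model's errors line up with human errors. The sketch below illustrates one such metric, assuming the error-consistency measure proposed by Geirhos and colleagues: Cohen's kappa computed over trial-by-trial correct/incorrect outcomes, kappa = (c_obs - c_exp) / (1 - c_exp), where c_exp = p_a * p_b + (1 - p_a) * (1 - p_b) and p_a, p_b are the two observers' accuracies. It is an illustration, not code from this article; the function name `error_consistency` and the `human`/`model` vectors are hypothetical toy data. It also shows the limitation the abstract points to: the score counts how often two observers err on the same trials beyond chance, but it says nothing about which image features produced those errors.

```python
from typing import Sequence


def error_consistency(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa over the trial-by-trial correctness of two observers.

    a[i] (b[i]) is True when observer A (B) classified trial i correctly.
    Sketch of the error-consistency idea; name and interface are hypothetical.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed consistency: share of trials where both are right or both are wrong.
    c_obs = sum(x == y for x, y in zip(a, b)) / n
    # Accuracy of each observer.
    p_a = sum(a) / n
    p_b = sum(b) / n
    # Consistency expected by chance for two independent observers with these accuracies.
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if c_exp == 1:
        # Both observers are always right (or both always wrong): kappa is undefined.
        raise ValueError("error consistency is undefined when expected consistency is 1")
    return (c_obs - c_exp) / (1 - c_exp)


# Hypothetical toy data: True = correct classification on that trial.
human = [True, True, False, False, True, False, True, True]
model = [True, True, False, True, True, False, False, True]
print(round(error_consistency(human, model), 3))  # prints 0.467
```

A kappa near 0 means the two observers share no more errors than independent observers with the same accuracies would; a kappa of 1 means they are right and wrong on exactly the same trials. A high value is compatible with very different underlying mechanisms, which is why such scores alone do not test hypotheses about the features a model relies on.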

Type
Target Article
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press


References

Adolfi, F., Bowers, J. S., & Poeppel, D. (2023). Successes and critical failures of neural networks in capturing human-like speech recognition. Neural Networks, 162, 199211.CrossRefGoogle ScholarPubMed
Alcorn, M. A., Li, Q., Gong, Z., Wang, C., Mai, L., Ku, W. S., & Nguyen, A. (2019, June). Strike (with) a pose: Neural networks are easily fooled by strange poses of familiar objects. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Long Beach, USA (pp. 4845–4854).CrossRefGoogle Scholar
Alexander, D. M., & Van Leeuwen, C. (2010). Mapping of contextual modulation in the population response of primary visual cortex. Cognitive Neurodynamics, 4(1), 124.CrossRefGoogle ScholarPubMed
Ba, J., Mnih, V., & Kavukcuoglu, K. (2014). Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 110.Google Scholar
Baker, N., Kellman, P. J., Erlikhman, G., & Lu, H. (2018a). Deep convolutional networks do not perceive illusory contours. In Proceedings of the 40th annual conference of the cognitive science society, Cognitive Science Society, Austin, TX (pp. 1310–1315).Google Scholar
Baker, N., Lu, H., Erlikhman, G., & Kellman, P. J. (2018b). Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology, 14(12), e1006613.CrossRefGoogle Scholar
Barrett, D., Hill, F., Santoro, A., Morcos, A., & Lillicrap, T. (2018, July). Measuring abstract reasoning in neural networks. In International conference on machine learning, Stockholm, Sweden (pp. 511–520).Google Scholar
Bhattasali, N. X., Tomov, M., & Gershman, S. (2021, June). CCNLab: A benchmarking framework for computational cognitive neuroscience. In Thirty-fifth conference on neural information processing systems datasets and benchmarks track (round 1). Virtual conference.Google Scholar
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115147.CrossRefGoogle ScholarPubMed
Biederman, I., & Ju, G. (1988). Surface versus edge-based determinants of visual recognition. Cognitive Psychology, 20(1), 3864.CrossRefGoogle ScholarPubMed
Biscione, V., & Bowers, J. S. (2021). Convolutional neural networks are not invariant to translation, but they can learn to be. Journal of Machine Learning Research, 22, 128.Google Scholar
Biscione, V., & Bowers, J. S. (2022). Learning online visual invariances for novel objects via supervised and self-supervised training. Neural Networks, 150, 222236. https://doi.org/10.1016/j.neunet.2022.02.017CrossRefGoogle ScholarPubMed
Biscione, V., & Bowers, J. S. (2023). Mixed evidence for gestalt grouping in deep neural networks. Computational Brain & Behavior. https://doi.org/10.1007/s42113-023-00169-2CrossRefGoogle Scholar
Blything, R., Biscione, V., & Bowers, J. (2020). A case for robust translation tolerance in humans and CNNs. A commentary on Han et al. arXiv preprint arXiv:2012.05950, 18.Google Scholar
Blything, R., Biscione, V., Vankov, I. I., Ludwig, C. J. H., & Bowers, J. S. (2021). The human visual system and CNNs can both support robust online translation tolerance following extreme displacements. Journal of Vision, 21(2), 9, 1–16. https://doi.org/10.1167/jov.21.2.9CrossRefGoogle ScholarPubMed
Bowers, J. S. (2017). Parallel distributed processing theory in the age of deep networks. Trends in Cognitive Science, 21, 950961.CrossRefGoogle ScholarPubMed
Bowers, J. S., & Davis, C. J. (2012a). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138, 389414. doi:10.1037/a0026450CrossRefGoogle ScholarPubMed
Bowers, J. S., & Davis, C. J. (2012b). Is that what Bayesians believe? Reply to Griffiths, Chater, Norris, and Pouget (2012). Psychological Bulletin, 138, 423426. doi:10.1037/a0027750CrossRefGoogle ScholarPubMed
Bowers, J. S., & Jones, K. W. (2007). Detecting objects is easier than categorizing them. Quarterly Journal of Experimental Psychology, 61, 552557.CrossRefGoogle Scholar
Bowers, J. S., Vankov, I. I., & Ludwig, C. J. (2016). The visual system supports online translation invariance for object identification. Psychonomic Bulletin & Review, 23, 432438.CrossRefGoogle ScholarPubMed
Burgess, C. P., Matthey, L., Watters, N., Kabra, R., Higgins, I., Botvinick, M., & Lerchner, A. (2019). Monet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390, 122.Google Scholar
Cao, Y., Grossberg, S., & Markowitz, J. (2011). How does the brain rapidly learn and reorganize view-invariant and position-invariant object representations in the inferotemporal cortex? Neural Networks, 24(10), 10501061.CrossRefGoogle ScholarPubMed
Carpenter, G. A., & Grossberg, S. (1981). Adaptation and transmitter gating in vertebrate photoreceptors. Journal of Theoretical Neurobiology, 1(1), 142.Google Scholar
Caucheteux, C., Gramfort, A., & King, J. R. (2022). Deep language algorithms predict semantic comprehension from brain activity. Scientific Reports, 12(1), 110.CrossRefGoogle ScholarPubMed
Cavanagh, P., Hénaff, M. A., Michel, F., Landis, T., Troscianko, T., & Intriligator, J. (1998). Complete sparing of high-contrast color input to motion perception in cortical color blindness. Nature Neuroscience, 1(3), 242247.CrossRefGoogle ScholarPubMed
Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Comparison of deep neural networks to spatiotemporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6, 113.CrossRefGoogle ScholarPubMed
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 3746.CrossRefGoogle Scholar
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87114.CrossRefGoogle ScholarPubMed
Crosby, M., Beyret, B., & Halina, M. (2019). The animal-AI Olympics. Nature Machine Intelligence, 1(5), 257257.CrossRefGoogle Scholar
Doumas, L. A. A., Puebla, G., Martin, A. E., & Hummel, J. E. (2022). A theory of relation learning and cross-domain generalization. Psychological Review, 129(5), 9991041. https://doi.org/10.1037/rev0000346CrossRefGoogle ScholarPubMed
Driver, J., & Baylis, G. C. (1996). Edge-assignment and figure-ground segmentation in short-term visual matching. Cognitive Psychology, 31(3), 248306.CrossRefGoogle ScholarPubMed
Duan, S., Matthey, L., Saraiva, A., Watters, N., Burgess, C. P., Lerchner, A., & Higgins, I. (2019). Unsupervised model selection for variational disentangled representation learning. arXiv preprint arXiv:1905.12614, 129.Google Scholar
Dujmović, M., Bowers, J. S., Adolfi, F., & Malhotra, G. (2022). Some pitfalls of measuring representational similarity using representational similarity analysis. arXiv preprint, 148. https://www.biorxiv.org/content/10.1101/2022.04.05.487135v1Google Scholar
Dujmović, M., Bowers, J. S., Adolfi, F., & Malhotra, G. (2023). Obstacles to inferring mechanistic similarity using Representational Similarity Analysis. bioRxiv. https://doi.org/10.1101/2022.04.05.487135Google Scholar
Dujmović, M., Malhotra, G., & Bowers, J. S. (2020). What do adversarial images tell us about human vision? eLife, 9, e55978.CrossRefGoogle ScholarPubMed
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433458.CrossRefGoogle ScholarPubMed
Elmoznino, E., & Bonner, M. F. (2022). High-performing neural network models of visual cortex benefit from high latent dimensionality. bioRxiv, 133. https://doi.org/10.1101/2022.07.13.499969Google Scholar
Erdogan, G., & Jacobs, R. A. (2017). Visual shape perception as Bayesian inference of 3D object-centered shape representations. Psychological Review, 124(6), 740761.CrossRefGoogle ScholarPubMed
Evans, B. D., Malhotra, G., & Bowers, J. S. (2022). Biological convolutions improve DNN robustness to noise and generalisation. Neural Networks, 148, 96110. https://doi.org/10.1016/j.neunet.2021.12.005CrossRefGoogle ScholarPubMed
Farah, M. J. (2004). Visual agnosia. MIT Press.CrossRefGoogle Scholar
Feather, J., Durango, A., Gonzalez, R., & McDermott, J. (2019). Metamers of neural networks reveal divergence from human perceptual systems. Advances in Neural Information Processing Systems, 32, 112.Google Scholar
Fleming, R. W., & Storrs, K. R. (2019). Learning to see stuff. Current Opinion in Behavioral Sciences, 30, 100108.CrossRefGoogle ScholarPubMed
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 371.CrossRefGoogle ScholarPubMed
Francis, G., Manassi, M., & Herzog, M. H. (2017). Neural dynamics of grouping and segmentation explain properties of visual crowding. Psychological Review, 124(4), 483504.CrossRefGoogle ScholarPubMed
Funke, C. M., Borowski, J., Stosio, K., Brendel, W., Wallis, T. S., & Bethge, M. (2021). Five points to check when comparing visual perception in humans and machines. Journal of Vision, 21(3), 16, 1–23.CrossRefGoogle ScholarPubMed
Garrigan, P., & Kellman, P. J. (2008). Perceptual learning depends on perceptual constancy. Proceedings of the National Academy of Sciences of the United States of America, 105(6), 22482253.CrossRefGoogle ScholarPubMed
Gauthier, I., & Tarr, M. J. (2016). Visual object recognition: Do we (finally) know more now than we did? Annual Review of Vision Science, 2, 377396.CrossRefGoogle ScholarPubMed
Geirhos, R., Jacobsen, J. H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020a). Shortcut learning in deep neural networks. Nature Machine Intelligence, 2(11), 665673.CrossRefGoogle Scholar
Geirhos, R., Meding, K., & Wichmann, F. A. (2020b). Beyond accuracy: Quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency. Advances in Neural Information Processing Systems, 33, 1389013902.Google Scholar
Geirhos, R., Narayanappa, K., Mitzkus, B., Thieringer, T., Bethge, M., Wichmann, F. A., & Brendel, W. (2021). Partial success in closing the gap between human and machine vision. Advances in Neural Information Processing Systems, 34, 2388523899.Google Scholar
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International conference on learning representations (ICLR), New Orleans. https://openreview.net/forum?id=Bygh9j09KXGoogle Scholar
Geirhos, R., Temme, C. R., Rauber, J., Schütt, H. H., Bethge, M., & Wichmann, F. A. (2018). Generalisation in humans and deep neural networks. Advances in Neural Information Processing Systems, 31, 75387550.Google Scholar
George, D., Lehrach, W., Kansky, K., Lázaro-Gredilla, M., Laan, C., Marthi, B., … Phoenix, D. S. (2017). A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science (New York, N.Y.), 358(6368), eaag2612.CrossRefGoogle ScholarPubMed
German, J. S., & Jacobs, R. A. (2020). Can machine learning account for human visual object shape similarity judgments. Vision Research, 167, 8799. https://doi.org/10.1016/j.visres.2019.12.001CrossRefGoogle ScholarPubMed
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15(1), 2025.CrossRefGoogle ScholarPubMed
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., … Hassabis, D. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471476.CrossRefGoogle ScholarPubMed
Greff, K., Kaufman, R. L., Kabra, R., Watters, N., Burgess, C., Zoran, D., … Lerchner, A. (2019, May). Multi-object representation learning with iterative variational inference. In International conference on machine learning, Long Beach, USA (pp. 2424–2433).Google Scholar
Greff, K., van Steenkiste, S., & Schmidhuber, J. (2020). On the binding problem in artificial neural networks. arXiv preprint arXiv:2012.05208, 175.Google Scholar
Griffiths, T. L., Chater, N., Norris, D., & Pouget, A. (2012). How the Bayesians got their beliefs (and what those beliefs actually are): Comment on Bowers and Davis (2012). Psychological Bulletin, 138(3), 415422. https://doi.org/10.1037/a0026884CrossRefGoogle ScholarPubMed
Grill-Spector, K., & Kanwisher, N. (2005). Visual recognition: As soon as you know it is there, you know what it is. Psychological Science, 16(2), 152160.CrossRefGoogle Scholar
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87, 151.CrossRefGoogle ScholarPubMed
Grossberg, S. (2000). The complementary brain: Unifying brain dynamics and modularity. Trends in Cognitive Sciences, 4, 233246.CrossRefGoogle ScholarPubMed
Grossberg, S. (2021). Conscious mind, resonant brain: How each brain makes a mind. Oxford University Press.CrossRefGoogle Scholar
Grossberg, S., & Mingolla, E. (1985). Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychological Review, 92(2), 173211.CrossRefGoogle ScholarPubMed
Grossberg, S., & Mingolla, E. (1987). Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. In The adaptive brain II (pp. 143210). Elsevier.CrossRefGoogle Scholar
Guest, O., & Martin, A. E. (2023). On logical inference over brains, behaviour, and artificial neural networks. Computational Brain & Behavior, 6, 213227.CrossRefGoogle Scholar
Gulordava, K., Bojanowski, P., Grave, E., Linzen, T., & Baroni, M. (2018, June). Colorless green recurrent networks dream hierarchically. In Proceedings of NAACL 2018 (pp. 1195–1205). New Orleans, Louisiana: ACL.CrossRefGoogle Scholar
Hacker, C., & Biederman, I. (2018). The invariance of recognition to the stretching of faces is not explained by familiarity or warping to an average face. arXiv preprint, 123. https://doi.org/10.31234/osf.io/e5hgxGoogle Scholar
Hannagan, T., Agrawal, A., Cohen, L., & Dehaene, S. (2021). Emergence of a compositional neural code for written words: Recycling of a convolutional neural network for reading. Proceedings of the National Academy of Sciences of the United States of America, 118(46), e210477911.Google ScholarPubMed
Heinke, D., Wachman, P., van Zoest, W., & Leek, E. C. (2021). A failure to learn object shape geometry: Implications for convolutional neural networks as plausible models of biological vision. Vision Research, 189, 8192.CrossRefGoogle ScholarPubMed
Hermann, K., Chen, T., & Kornblith, S. (2020). The origins and prevalence of texture bias in convolutional neural networks. Advances in Neural Information Processing Systems, 33, 1900019015.Google Scholar
Hochberg, J., & Brooks, V. (1962). Pictorial recognition as an unlearned ability: A study of one child's performance. The American Journal of Psychology, 75(4), 624628.CrossRefGoogle ScholarPubMed
Holyoak, K. J., & Hummel, J. E. (2000). The proper treatment of symbols in a connectionist architecture. In Dietrich, E. & Markman, A. (Eds.), Cognitive dynamics: Conceptual change in humans and machines (pp. 229264). MIT Press.Google Scholar
Huber, L. S., Geirhos, R., & Wichmann, F. A. (2022). The developmental trajectory of object recognition robustness: Children are like small adults but unlike big deep neural networks. arXiv preprint arXiv:2205.10144, 132.Google Scholar
Hummel, J. E. (2000). Where view-based theories break down: The role of structure in shape perception and object recognition. In Deitrich, E. & Markman, A. (Eds.), Cognitive dynamics: Conceptual change in humans and machines (pp. 157185). Erlbaum.Google Scholar
Hummel, J. E. (2013). Object recognition. In Reisburg, D. (Ed.), Oxford handbook of cognitive psychology (pp. 3246). Oxford University Press.Google Scholar
Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480517. https://doi.org/10.1037/0033-295X.99.3.480CrossRefGoogle Scholar
Hummel, J. E., & Stankiewicz, B. J. (1996). Categorical relations in shape perception. Spatial Vision, 10(3), 201236.Google ScholarPubMed
Izhikevich, E. M. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks 15(5), 10631070. https://doi.org/10.1109/TNN.2004.832719CrossRefGoogle ScholarPubMed
Jacob, G., Pramod, R. T., Katti, H., & Arun, S. P. (2021). Qualitative similarities and differences in visual object representations between brains and deep networks. Nature Communications, 12(1), 114.CrossRefGoogle ScholarPubMed
Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630644.e16.CrossRefGoogle ScholarPubMed
Khaligh-Razavi, S. M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.CrossRefGoogle Scholar
Kheradpisheh, S. R., Ghodrati, M., Ganjtabesh, M., & Masquelier, T. (2016). Deep networks can resemble human feed-forward vision in invariant object recognition. Scientific Reports, 6(1), 124.CrossRefGoogle ScholarPubMed
Kiani, R., Esteky, H., Mirpour, K., & Tanaka, K. (2007). Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. Journal of Neurophysiology, 97, 42964309. doi:10.1152/jn.00024.2007CrossRefGoogle ScholarPubMed
Kiat, J. E., Luck, S. J., Beckner, A. G., Hayes, T. R., Pomaranski, K. I., Henderson, J. M., & Oakes, L. M. (2022). Linking patterns of infant eye movements to a neural network model of the ventral stream using representational similarity analysis. Developmental Science, 25, e13155. https://doi.org/10.1111/desc.13155CrossRefGoogle ScholarPubMed
Kim, B., Reif, E., Wattenberg, M., Bengio, S., & Mozer, M. C. (2021). Neural networks trained on natural scenes exhibit Gestalt closure. Computational Brain & Behavior, 4, 251263.CrossRefGoogle Scholar
Krauskopf, J. (1963). Effect of retinal image stabilization of the appearance of heterochromatic targets. Journal of the Optical Society of America, 53, 741744.CrossRefGoogle ScholarPubMed
Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417446.CrossRefGoogle ScholarPubMed
Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008a). Representational similarity analysis – Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2(4), 128.Google ScholarPubMed
Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., … Bandettini, P. A. (2008b). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 11261141.CrossRefGoogle ScholarPubMed
Kriegeskorte, N., & Wei, X. X. (2021). Neural tuning and representational geometry. Nature Reviews Neuroscience, 22, 703718. https://doi.org/10.1038/s41583-021-00502-3CrossRefGoogle ScholarPubMed
Kubilius, J., Bracci, S., & Op de Beeck, H. P. (2016). Deep neural networks as a computational model for human shape sensitivity. PLoS Computational Biology, 12(4), e1004896.CrossRefGoogle ScholarPubMed
Kubilius, J., Schrimpf, M., Kar, K., Hong, H., Majaj, N. J., Rajalingham, R., … DiCarlo, J. J. (2019). Brain-like object recognition with high-performing shallow recurrent ANNs. Advances in Neural Information Processing Systems, 32, 112.Google Scholar
Lehky, S. R., & Sejnowski, T. J. (1988). Network model of shape-from-shading: Neural function arises from both receptive and projective fields. Nature, 333(6172), 452454.CrossRefGoogle ScholarPubMed
Lehman, J., & Stanley, K. O. (2011). Abandoning objectives: Evolution through the search for novelty alone. Evolutionary Computation, 19(2), 189223.CrossRefGoogle ScholarPubMed
Lissauer, S. H. (1890). Ein Fall von Seelenblindheit nebst einem Beitrage zur Theorie derselben. Archiv für Psychiatrie und Nervenkrankheiten, 21(2), 222270.CrossRefGoogle Scholar
Livingstone, M., & Hubel, D. (1988). Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240(4853), 740749.CrossRefGoogle ScholarPubMed
Lonnqvist, B., Bornet, A., Doerig, A., & Herzog, M. H. (2021). A comparative biology approach to DNN modeling of vision: A focus on differences, not similarities. Journal of Vision, 21(10), 1717. https://doi.org/10.1167/jov.21.10.17CrossRefGoogle Scholar
Lotter, W., Kreiman, G., & Cox, D. (2020). A neural network trained for prediction mimics diverse features of biological neurons and perception. Nature Machine Intelligence, 2(4), 210219.CrossRefGoogle ScholarPubMed
Mack, M. L., Gauthier, I., Sadr, J., & Palmeri, T. J. (2008). Object detection and basic-level categorization: Sometimes you know it is there before you know what it is. Psychonomic Bulletin & Review, 15(1), 2835.CrossRefGoogle ScholarPubMed
Macpherson, T., Churchland, A., Sejnowski, T., DiCarlo, J., Kamitani, Y., Takahashi, H., & Hikida, T. (2021). Natural and artificial intelligence: A brief introduction to the interplay between AI and neuroscience research. Neural Networks, 144, 603613.CrossRefGoogle Scholar
Majaj, N. J., Hong, H., Solomon, E. A., & DiCarlo, J. J. (2015). Simple learned weighted sums of inferior temporal neuronal firing rates accurately predict human core object recognition performance. Journal of Neuroscience, 35(39), 1340213418.CrossRefGoogle ScholarPubMed
Malhotra, G., Dujmovic, M., & Bowers, J. S. (2022). Feature blindness: A challenge for understanding and modelling visual object recognition. PLoS Computational Biology, 18, e1009572. https://doi.org/10.1101/2021.10.20.465074CrossRefGoogle ScholarPubMed
Malhotra, G., Dujmovic, M., Hummel, J., & Bowers, J. S. (2021). The contrasting shape representations that support object recognition in humans and CNNs. arXiv preprint, 151. https://doi.org/10.1101/2021.12.14.472546Google Scholar
Malhotra, G., Evans, B. D., & Bowers, J. S. (2020). Hiding a plane with a pixel: Examining shape-bias in CNNs and the benefit of building in biological constraints. Vision Research, 174, 5768.CrossRefGoogle ScholarPubMed
Marcus, G. (2009). Kluge: The haphazard evolution of the human mind. Houghton Mifflin Harcourt.Google Scholar
Marcus, G. F. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37(3), 243282.CrossRefGoogle ScholarPubMed
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. Henry Holt, 2(4.2).Google Scholar
Mayo, D. G. (2018). Statistical inference as severe testing. Cambridge University Press.CrossRefGoogle Scholar
McClelland, J. L., Rumelhart, D. E., & PDP Research Group. (1986). Parallel distributed processing (Vol. 2). MIT Press.Google Scholar
Mehrer, J., Spoerer, C. J., Jones, E. C., Kriegeskorte, N., & Kietzmann, T. C. (2021). An ecologically motivated image dataset for deep learning yields better models of human vision. Proceedings of the National Academy of Sciences of the United States of America, 118(8), e2011417118.CrossRefGoogle ScholarPubMed
Mehrer, J., Spoerer, C. J., Kriegeskorte, N., & Kietzmann, T. C. (2020). Individual differences among deep neural network models. Nature Communications, 11(1), 112.CrossRefGoogle ScholarPubMed
Messina, N., Amato, G., Carrara, F., Gennaro, C., & Falchi, F. (2021). Solving the same-different task with convolutional neural networks. Pattern Recognition Letters, 143, 7580.CrossRefGoogle Scholar
Millet, J., Caucheteux, C., Boubenec, Y., Gramfort, A., Dunbar, E., Pallier, C., & King, J. R. (2022). Toward a realistic model of speech processing in the brain with self-supervised learning. Advances in Neural Information Processing Systems, 35, 3342833443.Google Scholar
Miozzo, M., & Caramazza, A. (1998). Varieties of pure alexia: The case of failure to access graphemic representations. Cognitive Neuropsychology, 15(1–2), 203238.CrossRefGoogle ScholarPubMed
Mitchell, J., & Bowers, J. (2020, December). Priorless recurrent networks learn curiously. In Proceedings of the 28th international conference on computational linguistics (pp. 5147–5158).CrossRefGoogle Scholar
Mitchell, J., & Bowers, J. S. (2021). Generalisation in neural networks does not require feature overlap. arXiv preprint arXiv:2107.06872, 1–19.
Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. In Advances in neural information processing systems (pp. 2204–2212).
Montero, M. L., Bowers, J. S., Ludwig, C. J., Costa, R. P., & Malhotra, G. (2022). Lost in latent space: Disentangled models and the challenge of combinatorial generalisation. arXiv preprint, 1–27. http://arxiv.org/abs/2204.02283
Montero, M. L., Ludwig, C. J., Costa, R. P., Malhotra, G., & Bowers, J. (2021). The role of disentanglement in generalisation. In International conference on learning representations. https://openreview.net/forum?id=qbH974jKUVy
Nakayama, K., Shimojo, S., & Silverman, G. H. (1989). Stereoscopic depth: Its relation to image segmentation, grouping, and the recognition of occluded objects. Perception, 18, 55–68.
Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, USA (pp. 427–436).
Palmer, S. E. (1999). Color, consciousness, and the isomorphism constraint. Behavioral and Brain Sciences, 22(6), 923–943.
Palmer, S. E. (2003). Visual perception of objects. In Healy, A. F. & Proctor, R. W. (Eds.), Handbook of psychology: Experimental psychology (Vol. 4, pp. 177–211). John Wiley & Sons Inc.
Pang, Z., O'May, C. B., Choksi, B., & VanRullen, R. (2021). Predictive coding feedback results in perceived illusory contours in a recurrent neural network. Neural Networks, 144, 164–175.
Pater, J. (2019). Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language, 95(1), e41–e74.
Pepperberg, I. M., & Nakayama, K. (2016). Robust representation of shape in a grey parrot (Psittacus erithacus). Cognition, 153, 146–160.
Pessoa, L., Thompson, E., & Noë, A. (1998). Finding out about filling-in: A guide to perceptual completion for visual science and the philosophy of perception. Behavioral and Brain Sciences, 21(6), 723–748.
Peters, B., & Kriegeskorte, N. (2021). Capturing the objects of vision with neural networks. Nature Human Behaviour, 5, 1127–1144.
Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating (and improving) the correspondence between deep neural networks and human representations. Cognitive Science, 42(8), 2648–2669.
Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1–2), 73–193.
Pomerantz, J. R., & Portillo, M. C. (2011). Grouping and emergent features in vision: Toward a theory of basic Gestalts. Journal of Experimental Psychology: Human Perception and Performance, 37(5), 1331–1349. doi:10.1037/A0024330
Puebla, G., & Bowers, J. S. (2022). Can deep convolutional neural networks support relational reasoning in the same-different task? Journal of Vision, 22(10), 11.
Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: Evidence for a parallel tracking mechanism. Spatial Vision, 3, 179–197.
Raizada, R., & Grossberg, S. (2001). Context-sensitive bindings by the laminar circuits of V1 and V2: A unified model of perceptual grouping, attention, and orientation contrast. Visual Cognition, 8, 431–466.
Rajalingham, R., Issa, E. B., Bashivan, P., Kar, K., Schmidt, K., & DiCarlo, J. J. (2018). Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. Journal of Neuroscience, 38(33), 7255–7269.
Rajalingham, R., Schmidt, K., & DiCarlo, J. J. (2015). Comparison of object recognition behavior in human and monkey. Journal of Neuroscience, 35(35), 12127–12136.
Ramachandran, V. S. (1988). Perception of shape from shading. Nature, 331(6152), 163–166.
Ramachandran, V. S. (1992). Filling in gaps in perception: Part I. Current Directions in Psychological Science, 1(6), 199–205.
Ratan Murty, N. A., Bashivan, P., Abate, A., DiCarlo, J. J., & Kanwisher, N. (2021). Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature Communications, 12(1), 1–14.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, USA (pp. 779–788).
Richards, B. A., Lillicrap, T. P., Beaudoin, P., Bengio, Y., Bogacz, R., Christensen, A., … Kording, K. P. (2019). A deep learning framework for neuroscience. Nature Neuroscience, 22(11), 1761–1770.
Ritter, S., Barrett, D. G., Santoro, A., & Botvinick, M. M. (2017, July). Cognitive psychology for deep neural networks: A shape bias case study. In International conference on machine learning, Sydney, Australia (pp. 2940–2949).
Rosenfeld, A., Zemel, R., & Tsotsos, J. K. (2018). The elephant in the room. arXiv preprint arXiv:1808.03305, 1–12.
Saarela, T. P., Sayim, B., Westheimer, G., & Herzog, M. H. (2009). Global stimulus configuration modulates crowding. Journal of Vision, 9(2), 5, 1–11.
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. Advances in Neural Information Processing Systems, 30, 3856–3866.
Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). A simple neural network module for relational reasoning. Advances in Neural Information Processing Systems, 30, 4967–4976.
Schaeffer, R., Khona, M., & Fiete, I. (2022). No free lunch from deep learning in neuroscience: A case study through models of the entorhinal–hippocampal circuit. Advances in Neural Information Processing Systems, 35, 16052–16067.
Schott, L., von Kügelgen, J., Träuble, F., Gehler, P., Russell, C., Bethge, M., … Brendel, W. (2021). Visual representation learning does not generalize strongly within the same domain. arXiv preprint arXiv:2107.08221, 1–34.
Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., … Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America, 118(45), e2105646118.
Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., … DiCarlo, J. J. (2020a). Brain-Score: Which artificial neural network for object recognition is most brain-like? arXiv preprint, 1–9. https://doi.org/10.1101/407007
Schrimpf, M., Kubilius, J., Lee, M. J., Murty, N. A. R., Ajemian, R., & DiCarlo, J. J. (2020b). Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3), 413–423.
Shah, H., Tamuly, K., Raghunathan, A., Jain, P., & Netrapalli, P. (2020). The pitfalls of simplicity bias in neural networks. Advances in Neural Information Processing Systems, 33, 9573–9585.
Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1(7), 261–267.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1–29.
Stanley, K. O., Clune, J., Lehman, J., & Miikkulainen, R. (2019). Designing neural networks through neuroevolution. Nature Machine Intelligence, 1(1), 24–35.
Storrs, K. R., Kietzmann, T. C., Walther, A., Mehrer, J., & Kriegeskorte, N. (2021). Diverse deep neural networks all predict human inferior temporal cortex well, after training and fitting. Journal of Cognitive Neuroscience, 33(10), 2044–2064.
Treisman, A., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14(1), 107–141.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136.
Truzzi, A., & Cusack, R. (2020, April). Convolutional neural networks as a model of visual activity in the brain: Greater contribution of architecture than learned weights. Bridging AI and Cognitive Science. In International conference on learning representations. https://baicsworkshop.github.io/pdf/BAICS_13.pdf
Tsvetkov, C., Malhotra, G., Evans, B., & Bowers, J. (2020). Adding biological constraints to deep neural networks reduces their capacity to learn unstructured data. In Proceedings of the 42nd annual conference of the Cognitive Science Society 2020, Toronto, Canada.
Tsvetkov, C., Malhotra, G., Evans, B. D., & Bowers, J. S. (2023). The role of capacity constraints in convolutional neural networks for learning random versus natural data. Neural Networks, 161, 515–524. https://doi.org/10.1101/2022.03.31.486580
Tuli, S., Dasgupta, I., Grant, E., & Griffiths, T. L. (2021). Are convolutional neural networks or transformers more like human vision? arXiv preprint arXiv:2105.07197, 1–7.
Ullman, S. (1979). The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B. Biological Sciences, 203(1153), 405–426.
Ullman, S., & Basri, R. (1991). Recognition by linear combination of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10), 992–1006.
Vaina, L. M., Makris, N., Kennedy, D., & Cowey, A. (1988). The selective impairment of the perception of first-order motion by unilateral cortical brain damage. Visual Neuroscience, 15, 333–348.
Vankov, I. I., & Bowers, J. S. (2020). Training neural networks to encode symbols enables combinatorial generalization. Philosophical Transactions of the Royal Society B, 375(1791), 20190309.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R. (2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization. Psychological Bulletin, 138(6), 1172–1217.
Wang, J., Zhang, Z., Xie, C., Zhou, Y., Premachandran, V., Zhu, J., … Yuille, A. (2018). Visual concepts and compositional voting. Annals of Mathematical Sciences and Applications, 2(3), 4.
Wang, R., Lehman, J., Rawal, A., Zhi, J., Li, Y., Clune, J., & Stanley, K. (2020, November). Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. In International conference on machine learning (pp. 9940–9951).
Webb, T. W., Sinha, I., & Cohen, J. D. (2021). Emergent symbols through binding in external memory. arXiv, 1–28. https://doi.org/10.48550/arXiv.2012.14601
Weerts, L., Rosen, S., Clopath, C., & Goodman, D. F. (2021). The psychometrics of automatic speech recognition. bioRxiv, 2021-04.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie, 61, 161–265 (in German).
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1(2), 202–238.
Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided search: An alternative to the feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15(3), 419–433.
Wong, Y. K., Twedt, E., Sheinberg, D., & Gauthier, I. (2010). Does Thompson's Thatcher effect reflect a face-specific mechanism? Perception, 39(8), 1125–1141.
Woolley, B. G., & Stanley, K. O. (2011, July). On the deleterious effects of a priori objectives on evolution and representation. In Proceedings of the 13th annual conference on genetic and evolutionary computation, Dublin, Ireland (pp. 957–964).
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., … Bengio, Y. (2015, June). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning, Lille, France (pp. 2048–2057). PMLR.
Xu, Y., & Vaziri-Pashkam, M. (2021). Limits to visual representational correspondence between convolutional neural networks and the human brain. Nature Communications, 12(1), 1–16.
Yamins, D. L., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 111(23), 8619–8624.
Young, T. (1802). Bakerian lecture: On the theory of light and colours. Philosophical Transactions of the Royal Society London, 92, 12–48. doi:10.1098/rstl.1802.0004
Zador, A. M. (2019). A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10(1), 1–7.
Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2017). Understanding deep learning requires rethinking generalization. In 5th international conference on learning representations, Toulon, France, April 24–26.
Zhang, R. (2019, May). Making convolutional networks shift-invariant again. In International conference on machine learning, Long Beach, CA, USA (pp. 7324–7334). Proceedings of Machine Learning Research.
Zhao, Z. Q., Zheng, P., Xu, S. T., & Wu, X. (2019). Object detection with deep learning: A review. IEEE Transactions on Neural Networks and Learning Systems, 30, 3212–3232.
Zhou, Z., & Firestone, C. (2019). Humans can decipher adversarial images. Nature Communications, 10(1), 1–9.
Zhu, H., Tang, P., Park, J., Park, S., & Yuille, A. (2019). Robustness of object recognition under extreme occlusion in humans and computational models. arXiv preprint arXiv:1905.04598, 1–7.
Zhuang, C., Yan, S., Nayebi, A., Schrimpf, M., Frank, M. C., DiCarlo, J. J., & Yamins, D. L. (2021). Unsupervised neural network models of the ventral visual stream. Proceedings of the National Academy of Sciences of the United States of America, 118(3), e2014196118.