
Clarifying status of DNNs as models of human vision

Published online by Cambridge University Press: 06 December 2023

Jeffrey S. Bowers
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK j.bowers@bristol.ac.uk; https://jeffbowers.blogs.bristol.ac.uk/
Gaurav Malhotra
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK gaurav.malhotra@bristol.ac.uk
Marin Dujmović
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK marin.dujmovic@bristol.ac.uk
Milton L. Montero
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK m.lleramontero@bristol.ac.uk
Christian Tsvetkov
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK christian.tsvetkov@bristol.ac.uk
Valerio Biscione
Affiliation:
School of Psychological Science, University of Bristol, Bristol, UK valerio.biscione@gmail.com
Guillermo Puebla
Affiliation:
National Center for Artificial Intelligence, Macul, Chile guillermo.puebla@bristol.ac.uk
Federico Adolfi
Affiliation:
Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany fedeadolfi@gmail.com
John E. Hummel
Affiliation:
Psychology Department, University of Illinois Urbana–Champaign, Champaign, IL, USA jehummel@illinois.edu
Rachel F. Heaton
Affiliation:
Psychology Department, University of Illinois Urbana–Champaign, Champaign, IL, USA rmflood2@illinois.edu
Benjamin D. Evans
Affiliation:
Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK b.d.evans@sussex.ac.uk
Jeffrey Mitchell
Affiliation:
Department of Informatics, School of Engineering and Informatics, University of Sussex, Brighton, UK j.mitchell@napier.ac.uk
Ryan Blything
Affiliation:
School of Psychology, Aston University, Birmingham, UK r.blything@aston.ac.uk

Abstract

We agree with the commentators on several key issues. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and nearly everyone, including us, agrees that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN–human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN–human correspondences in vision and other domains of cognition. We explore all these issues in this response.

Type
Authors' Response
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

