In the previous chapter, we introduced the idea of directly comparing computational models with human behavior in visual tasks. For example, we can assess how a model classifies an image and how humans classify the same image. In some tasks, the types of errors made by computational models are similar to human mistakes. Here we will dig deeper into what current computer vision algorithms can and cannot do. We will highlight the enormous power of current computational models while also emphasizing some of their limitations and the exciting work ahead of us to build better models.
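To make this kind of comparison concrete, the following minimal Python sketch quantifies trial-by-trial agreement between a model and a human observer on the same set of images. The arrays are hypothetical stand-ins, not data from any experiment described here; the error-overlap measure is one simple way to ask whether the two observers tend to fail on the same images.

    import numpy as np

    # Hypothetical per-image records: the ground-truth label, the
    # model's predicted label, and a human observer's response.
    ground_truth = np.array([0, 1, 2, 2, 1, 0, 2, 1])
    model_pred   = np.array([0, 1, 2, 1, 1, 0, 0, 1])
    human_pred   = np.array([0, 1, 2, 1, 2, 0, 0, 1])

    model_err = model_pred != ground_truth   # images the model gets wrong
    human_err = human_pred != ground_truth   # images the human gets wrong

    # Overall accuracy for each observer.
    print("model accuracy:", 1 - model_err.mean())
    print("human accuracy:", 1 - human_err.mean())

    # Trial-by-trial error overlap: of the images where at least one
    # observer errs, on what fraction do both err? A high overlap
    # suggests the model and the human fail in similar ways.
    either = model_err | human_err
    both = model_err & human_err
    if either.any():
        print("error overlap:", both.sum() / either.sum())

A fuller analysis would also compare overlap against the level expected by chance given each observer's accuracy, but the idea of scoring model and human on the same images, trial by trial, is the essential step.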
We want to understand how neuronal circuits give rise to vision. We can use microelectrodes and the type of neurophysiological recordings introduced in Section 2.7. In the case of the retina, it is evident where to place the microelectrodes to examine function. However, there are about 10¹¹ neurons in the human brain, and we do not have any tools that enable us to record from all of them. How do we figure out what parts of the brain are relevant for vision so we can study them at the neurophysiological level?
And there was light. Vision starts when photons reflected from objects in the world impinge on the retina. Although this may seem rather obvious to us now, it took humanity several centuries, if not longer, to arrive at this conclusion. The compartmentalization of the study of optics as a branch of physics and of visual perception as a branch of neuroscience is a recent development. Ideas about the nature of perception were interwoven with ideas about optics throughout antiquity and the Middle Ages. Giants of the caliber of Plato (~428–~348 BC) and Euclid (~300 BC) supported a projection theory, according to which cones of light emanating from the eyes either reached the objects themselves or met halfway with other rays of light coming from the objects, giving rise to the sense of vision. The distinction between light and vision can be traced back to Aristotle (384–322 BC) but did not reach widespread acceptance until Johannes Kepler’s (1571–1630) investigations of the properties of the eye.
As discussed in the last two chapters, there has been significant progress in computer vision. Machines are becoming quite proficient at a wide variety of visual tasks. Teenagers are not surprised by a phone that can recognize their faces. Self-driving cars are a matter of daily real-world discussion. Having cameras in the house that can detect a person’s mood is probably not too far off. Now imagine a world where we have machines that can visually interpret the world the way we do. To be more precise, imagine a world where we have machines that can flexibly answer a seemingly infinite number of questions about a given image. Let us assume that we cannot distinguish the answers given by the machine from the answers that a human would give; that is, assume that machines can pass the Turing test for vision, as defined in Section 9.1. Would we claim that such a machine can see? Would such a machine have visual consciousness?
Understanding how the brain works constitutes the greatest scientific challenge of our times, arguably the greatest challenge of all time. We have sent spaceships to peek outside our solar system, and we study galaxies far away to build theories about the origin of the universe. We have built powerful accelerators to scrutinize the secrets of subatomic particles. We have uncovered the secrets of heredity hidden in the billions of base pairs in DNA. But we have yet to figure out how the three pounds of brain tissue inside our skulls works to enable us to do physics, biology, music, literature, and politics.
We have come a long way since our initial steps toward defining the basic properties of vision in Chapter 1. We started by characterizing the spatial and temporal statistics of natural images (Chapter 2). We summarized visual behavior – that is, how observers perceive the images around them (Chapter 3). Lesion studies helped define specific circuits in the cortex that are responsible for processing distinct types of visual information (Chapter 4). We explored how neurons in the retina, the thalamus, and the ventral visual cortex respond to a variety of stimulus conditions (Chapters 2, 5, and 6).