When we see objects in the world, what we actually “see” is much more than the retinal image. Our perception is three-dimensional. Moreover, it reflects constant properties of the objects and the environment, regardless of changes in the retinal image under varying viewing conditions. How does the visual system make this possible?
Two different approaches have been evident in the study of visual perception. One approach, most successful in recent times, is based on the idea that perception emerges automatically from some combination of neuronal receptive fields. In the study of depth perception, this general line of thinking has been supported by psychophysical and physiological evidence. The “purely cyclopean” perception in Julesz's random-dot stereogram (Julesz, 1960) shows that depth can emerge without the mediation of any higher-order form recognition. This suggested that relatively local disparity-specific processes could account for the perception of a floating figure in an otherwise camouflaged display. Corresponding electrophysiological experiments using single-cell recordings demonstrated that the depth of such stimuli could be coded by neurons in the visual cortex receiving input from the two eyes (Barlow et al., 1967; Poggio & Fischer, 1977). In contrast to this more modern approach, there exists an older tradition which asserts that perception is inferential, that it can cleverly determine the nature of the world from limited image data. Starting with Helmholtz's unconscious inference (Helmholtz, 1910) and continuing with more recent formulations such as Gregory's “perceptual hypotheses”, this approach stresses the importance of problem solving in the process of seeing (Hochberg, 1981; Gregory, 1970; Rock, 1983).
The world we live in is a highly structured place. Matter does not flit about in space and time in a completely unorganized fashion; rather, it is organized by the physical forces, biological processes, social interactions, and so on that operate in our world (McMahon, 1975; Thompson, 1952). It is this structure, or regularity, that makes it possible for us to make reliable inferences about our surroundings from the signals taken in by our various senses (Marr, 1982; Witkin and Tenenbaum, 1983). In other words, regularities in the world make sense data reliably informative about the world we move around in. But what is the nature of these regularities, and how can they be used for the purposes of perception?
In this chapter, we consider one class of environmental regularities, arising from what we call the modal structure of the world, which has the effect of making sensory information for certain types of perceptual judgement highly reliable (Bobick and Richards, 1986). Our definition of modal regularities is motivated by careful analyses of some simple examples of reliable perceptual inference. Given the resulting definition, we then briefly discuss some of the implications for the knowledge a perceiver requires in order to make reliable inferences in the presence of such modal structure.
Modal structure: An example
When can we infer that an object is stationary?
A common perceptual inference is judging whether an object is moving or at rest. How can we make this inference given only the two-dimensional projection of a three-dimensional object?
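One way to see why such an inference can be reliable is the modal-prior reading of the question: if “at rest” is a genuine mode of the world rather than a measure-zero accident, then near-zero image motion is strong evidence for it. The following sketch is purely illustrative, assuming a simple Bayesian formulation with made-up numbers; it is not taken from the chapter.

```python
# Hypothetical sketch (not the chapter's formalism): "stationary" is treated as a
# genuine mode of the world, i.e. a hypothesis with non-negligible prior mass.
# A near-zero image velocity then yields a high posterior probability that the
# object really is at rest.
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

prior_stationary = 0.3     # assumed prior mass of the "stationary" mode
sigma_noise = 0.1          # assumed measurement noise on image velocity
sigma_moving = 5.0         # assumed spread of velocities for moving objects

v_obs = 0.02               # measured image velocity, essentially zero

# Likelihood of the measurement under each hypothesis (measurement noise is
# negligible next to sigma_moving, so it is folded into the "moving" spread).
like_stationary = normal_pdf(v_obs, 0.0, sigma_noise)
like_moving = normal_pdf(v_obs, 0.0, sigma_moving)

posterior = (prior_stationary * like_stationary) / (
    prior_stationary * like_stationary + (1 - prior_stationary) * like_moving)
print(f"P(stationary | near-zero image motion) = {posterior:.3f}")   # ~0.95
```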
Ideas about perception have changed a great deal over the past half century. Before the discovery of maps of sensory surfaces in the cerebral cortex the consensus view was that perception involved “mind-stuff”, and because mind-stuff was not matter the greatest care was required in using crude, essentially materialistic, scientific methods and concepts to investigate and explain phenomena in which it had a hand. One approach was to reduce the role of this mind-stuff to a minimum by designing experiments so that the mind was used simply as a null-detector, analogous to a sensitive galvanometer in a Wheatstone bridge, which had to do no more than detect the identity or non-identity of two percepts. Those who followed this approach might be termed the hard-psychophysics school – exemplified by people such as Helmholtz, Stiles, Hecht and Rushton – and it had some brilliant successes in explaining the properties of sensation: for instance the trichromatic theory; the relation between the quality of the retinal image and visual acuity; and the relation between sensitivity and the absorption of quanta in photo-sensitive pigments. But it left the mystery of the mind-stuff untouched.
Others attempted to discover the properties of the mind-stuff by defining the physical stimuli required to elicit its verbally recognizable states – a soft-psychophysics approach that could be said to characterize the work of the Gestalt school and, for example, Hering, Gibson, and Hurvich and Jameson.
The luminance of a surface results from the combined effect of its reflectance (albedo) and its conditions of illumination. Luminance can be directly observed, but reflectance and illumination can only be derived by perceptual processes. Human observers are good at judging an object's reflectance in spite of large changes in illumination; this skill is known as “lightness constancy”.
Most research on lightness constancy has used stimuli consisting of grey patches on a single flat plane. The models are typically based on the assumption that slow variations in luminance are due to illumination gradients, while sharp changes in luminance are due to reflectance edges. The retinex models for use with “Mondrian” stimuli are good examples (Horn, 1974; Land & McCann, 1971). But in three-dimensional scenes, sharp luminance changes can arise either from reflectance or from illumination, as illustrated in Figure 11.1. The edge marked (1) is due to a reflectance change, such as might result from a different shade of paint. The edge marked (2) results from a change in surface normal, which leads to a change in the angle of incidence of the light – an effect that we may simply refer to as “shading.” As Gilchrist and his colleagues have emphasized (Gilchrist et al., 1983), three-dimensional scenes introduce large and important effects that are completely missed in the traditional approach to lightness perception.
Intrinsic image analysis
Using the terminology of Barrow & Tenenbaum (1978), we may cast the perceptual task as a problem of computing intrinsic images – images that represent the underlying physical properties of a scene.
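To make the Mondrian-style assumption above concrete, here is a minimal one-dimensional sketch. It assumes the standard multiplicative model in which luminance is the product of reflectance and illumination, so the two factors separate additively in the log domain; the thresholded-gradient step is the classic retinex idea, not a procedure taken from this chapter.

```python
# Hypothetical 1-D "Mondrian" sketch of the thresholded-gradient idea behind
# retinex-style lightness models: log-luminance gradients above a threshold are
# attributed to reflectance edges, the rest to slowly varying illumination,
# yielding rudimentary intrinsic images.
import numpy as np

# Synthetic scene: piecewise-constant reflectance under a smooth illumination ramp
x = np.linspace(0.0, 1.0, 200)
reflectance = np.where(x < 0.5, 0.2, 0.7)          # one reflectance edge
illumination = 1.0 + 0.8 * x                        # slow illumination gradient
luminance = reflectance * illumination              # what the image actually records

log_grad = np.diff(np.log(luminance))               # gradients of log luminance
threshold = 0.05                                    # assumed edge threshold
reflectance_grad = np.where(np.abs(log_grad) > threshold, log_grad, 0.0)

# Re-integrate the kept gradients to recover log reflectance (up to a constant);
# the remainder is the illumination intrinsic image.
log_R_est = np.concatenate([[0.0], np.cumsum(reflectance_grad)])
log_E_est = np.log(luminance) - log_R_est

# The estimated reflectance is constant except at the true edge
print(np.exp(log_R_est[[0, -1]]))   # ratio across the edge ~ 0.7 / 0.2
```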
Visual perception is the process of inferring world structure from image structure. If the world structure we recover from our images “makes sense” as a plausible world event, then we have a “percept” and can often offer a concise linguistic description of what we see. For example, in the upper panel of Figure 3.1, if asked, “What do you see?”, a typical response might be a pillbox with a handle either erect (left) or flat (right). This concise and confident response suggests that we have identified a model type that fits the image observation with no residual ambiguities at the level of the description. In contrast, when asked to describe the two lower drawings in Figure 3.1, there is some hesitancy and uncertainty. Is the handle erect or not? Does it have a skewed or rectangular shape? The depiction leaves us somewhat uncertain, as if several options were possible but none in which all aspects of the interpretation collectively support each other. What, then, leads us to the certainty in the upper pair and to the ambiguity in the lower pair?
To be a bit more precise about our goal, let us assume that some Waltz-like algorithm has already identified the base of the pillbox and the wire-frame handle as separate 3D parts. Even with this decomposition, there remains an infinity of possible interpretations for any of these drawings.
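The following toy sketch (hypothetical, and not an analysis of Figure 3.1 itself) illustrates that claim: under orthographic projection the depth of each vertex is simply discarded, so arbitrarily many 3-D configurations of a wire-frame part map onto one and the same drawing.

```python
# Hypothetical illustration that a line drawing underdetermines 3-D structure:
# under orthographic projection (drop the z coordinate), arbitrarily different
# depth assignments for the vertices yield the identical 2-D drawing.
import numpy as np

def orthographic(points_3d):
    """Project 3-D points onto the image plane by discarding depth."""
    return points_3d[:, :2]

image_vertices = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.6], [0.0, 0.6]])

rng = np.random.default_rng(0)
for _ in range(3):
    depths = rng.uniform(-2.0, 2.0, size=(len(image_vertices), 1))   # any depths at all
    interpretation = np.hstack([image_vertices, depths])              # one 3-D reading
    assert np.allclose(orthographic(interpretation), image_vertices)

print("Every depth assignment projects to the same drawing.")
```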
It has been known since at least the time of Leonardo da Vinci that encoded within a pair of stereo images is information detailing the scene geometry (Leonardo, 1989). The animal brain has known this for millions of years and has developed neuronal mechanisms, as yet barely understood, for decoding this information. Hold your hand inches in front of your face and, with both eyes focused, stare at your fingers – they appear vividly in three dimensions (3-D). In fact, everywhere you gaze, you are aware of the relative depths of the observed objects.
Stereo vision is not the only clue to depth; there is a whole host of monocular clues which humans bring to bear in determining depth, evidenced by the fact that if you close one eye, it is still relatively simple to determine 3-D spatial relations. Nevertheless, monocular clues are less exact and often ambiguous. Otherwise, why would the mammalian anatomy have bothered to narrow the visual field in order to reposition the eyes for stereo vision? As a simple demonstration of the precision of stereo vision, try to touch the tips of two pencils with your arms outstretched, one pencil in each hand. With one eye closed, the task is frustratingly difficult; with both eyes open, the relative depths of the tips of the pencils are clear, and the task becomes as simple as touching your nose.
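The precision demonstrated by the pencil task rests on simple triangulation. As a hedged aside (this is the standard textbook geometry, not this chapter's derivation), for a rectified camera pair with focal length f and baseline b, depth is Z = f·b/d, so the large disparities d of nearby points pin their depths down very accurately.

```python
# Standard pinhole-stereo triangulation (assumed illustration): depth from disparity.
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Z = f * b / d for a rectified, fronto-parallel stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Illustrative numbers (assumed): human-like baseline of 6.5 cm, f = 1000 px.
f, b = 1000.0, 0.065
for d in (100.0, 10.0, 1.0):          # disparity in pixels
    print(f"disparity {d:6.1f} px  ->  depth {depth_from_disparity(d, f, b):6.2f} m")
```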
Researchers and industry are actively developing Software Agents (SAs), autonomous software that will assist users in achieving various tasks, collaborate with them, or even act on their behalf. To explore new interaction modes for SAs, modes that need to be more sophisticated than simple exchanges of messages, we have analysed human conversations and elaborated an interaction approach for SAs based on a conversation model. Using this approach, we have developed a multi-agent system that simulates conversations involving SAs. We assume that SAs perform communicative acts to negotiate about mental states (such as beliefs and goals), about turn-taking, and about special conversational sequences. We also assume that SAs respect communication protocols when they negotiate. In this paper, we describe the conceptual structure of communicative acts, the knowledge structures used to model a conversation, and the communication protocols. We show how an inference engine using ‘conversation-managing rules’ can be integrated into a conversational agent responsible for interpreting communicative acts, and we discuss the different kinds of rules that we propose. The prototype PSICO was implemented to simulate such conversations on a computer platform.
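Purely as an illustration of the vocabulary used above, and not of PSICO's actual design, a communicative act can be pictured as a typed structure and a ‘conversation-managing rule’ as a check that the next act respects the negotiation protocol. All names below are hypothetical.

```python
# Purely hypothetical sketch of a communicative act and a protocol rule; the
# structures and names below are illustrative, not PSICO's actual design.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommunicativeAct:
    performative: str      # e.g. "propose", "accept", "reject"
    sender: str
    receiver: str
    content: dict          # negotiated mental state, e.g. a belief or goal

# A toy negotiation protocol: which performatives may follow which.
PROTOCOL = {
    None: {"propose"},
    "propose": {"accept", "reject", "propose"},   # counter-proposals allowed
    "accept": set(),
    "reject": {"propose"},
}

def protocol_allows(previous: Optional[str], act: CommunicativeAct) -> bool:
    """A 'conversation-managing rule': the next act must respect the protocol."""
    return act.performative in PROTOCOL[previous]

offer = CommunicativeAct("propose", "agent_a", "agent_b", {"goal": "schedule meeting"})
reply = CommunicativeAct("accept", "agent_b", "agent_a", {"goal": "schedule meeting"})
print(protocol_allows(None, offer), protocol_allows(offer.performative, reply))  # True True
```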
Language tools that help people with their writing are now commonly included in word processors. Although these tools provide increasing support to native speakers of a language, they are much less useful to non-native speakers writing in their second language (e.g. French speakers writing in English). Real errors may go undetected, and potential errors or non-errors that are flagged by the system may be taken to be genuine errors by the non-native speaker. In this paper, we present the prototype of an English writing tool aimed at helping speakers of French write in English. We first discuss the kinds of problems non-native speakers have when writing in a second language. We then explain how we collected a corpus of errors, which we used to build the typology of errors needed in the various stages of the project. This is followed by an overview of the prototype, which contains a number of writing aids (dictionaries, on-line grammar help, a verb conjugator, etc.) and two checking tools: a problem-word highlighter, which lists all the potentially difficult words that cannot be dealt with correctly by the system (false friends, confusions, etc.), and a grammar checker, which detects and corrects morphological and syntactic errors. We describe in detail the automata formalism we use to extract linguistic information, test syntactic environments, and detect and correct errors. Finally, we present a first evaluation of the correction capacity of our grammar checker as compared to that of commercially available systems.
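As a hypothetical illustration only (the paper's automata formalism is richer and is not reproduced here), a finite-state pattern of the kind such a checker might apply to a typical French-speaker error, such as “discuss about” (a calque of “discuter de”), could look like this.

```python
# Hypothetical finite-state sketch of the kind of pattern a grammar checker might
# use; it is not the paper's formalism, only an illustration of the idea.
# The automaton flags "discuss about X", a frequent calque of French "discuter de".

PATTERN = [
    lambda tok: tok.lower() in {"discuss", "discusses", "discussed", "discussing"},
    lambda tok: tok.lower() == "about",
]

def find_error(tokens):
    """Return the index where the pattern matches, or None."""
    for i in range(len(tokens) - len(PATTERN) + 1):
        if all(test(tokens[i + j]) for j, test in enumerate(PATTERN)):
            return i
    return None

sentence = "We discussed about the results yesterday".split()
hit = find_error(sentence)
if hit is not None:
    print(f"Possible error at token {hit}: '{' '.join(sentence[hit:hit + 2])}'"
          " -> suggest dropping 'about'")
```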
This paper discusses different issues in the construction and knowledge representation of an intelligent dictionary help system. The Intelligent Dictionary Help System (IDHS) is conceived as a monolingual (explanatory) dictionary system for human use (Artola and Evrard, 1992). The fact that it is intended for people rather than for automatic processing distinguishes it from other systems dealing with the acquisition of semantic knowledge from conventional dictionaries. The system provides various ways of accessing the data, making it possible to deduce implicit knowledge from the explicit dictionary information. IDHS incorporates reasoning mechanisms analogous to those used by humans when they consult a dictionary. The user-level functionality of the system has been specified and a prototype has been implemented (Agirre et al., 1994a). A methodology for the extraction of semantic knowledge from a conventional dictionary is described. The method followed in the construction of the phrasal pattern hierarchies required by the parser (Alshawi, 1989) is based on an empirical study carried out on the structure of definition sentences. The results of its application to a real dictionary have shown that the parsing method is particularly well suited to the analysis of short definition sentences, as was the case with the source dictionary. As a result of this process, the characterization of the different lexical-semantic relations between senses is established by means of semantic rules (attached to the patterns); these rules are used for the initial construction of the Dictionary Knowledge Base (DKB). The representation schema proposed for the DKB (Agirre et al., 1994b) is basically a semantic network of frames representing word senses. After construction of the initial DKB, several enrichment processes are performed on it to add new facts; these processes are based on the exploitation of the properties of lexical-semantic relations, and also on specially conceived deduction mechanisms. The results of the enrichment processes show the suitability of the chosen representation schema for deducing implicit knowledge. Erroneous deductions are mainly due to incorrect word sense disambiguation.
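Purely as a hypothetical illustration of a frame-based dictionary knowledge base and of deduction over lexical-semantic relations (not the actual IDHS schema), one might picture the DKB as follows.

```python
# Hypothetical sketch of a tiny frame-based semantic network for word senses;
# the names and relations are illustrative only, not the actual IDHS DKB schema.
dkb = {
    "dictionary_1": {
        "definition": "a book that lists the words of a language",
        "hypernym": ["book_1"],                 # taxonomic (is-a) relation
    },
    "book_1": {
        "definition": "a written work consisting of pages",
        "hyponym": ["dictionary_1"],
    },
}

def hypernym_chain(sense, kb):
    """Follow is-a links upward: a simple deduction over explicit relations."""
    chain = []
    while kb.get(sense, {}).get("hypernym"):
        sense = kb[sense]["hypernym"][0]
        chain.append(sense)
    return chain

print(hypernym_chain("dictionary_1", dkb))   # ['book_1']
```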
Operating-system command languages assist the user in executing commands for a significant number of common everyday tasks. Similarly, the introduction of textual command languages for robots has made it possible to perform some important functions that lead-through programming cannot readily accomplish. However, such command languages assume the user to be expert enough to carry out a specific task in these application domains. By contrast, a natural language interface to such command languages, apart from lending itself to integration into a future speech interface, can facilitate and broaden the use of these command languages for a larger audience. In this paper, advanced techniques are presented for an adaptive natural language interface that can (a) be portable to a large range of command languages, (b) handle even complex commands thanks to an embedded linguistic parser, and (c) be expandable and customizable, providing the casual user with the opportunity to specify some types of new words and the system developer with the ability to introduce new tasks in these application domains. Finally, to demonstrate the above techniques in practice, an example of their application to a Greek natural language interface to the MS-DOS operating system is given.
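As a deliberately simple, hypothetical sketch of the overall idea (not the paper's parser, grammar, or task definitions), mapping a natural-language request onto an MS-DOS command might look like this; the trivial keyword matching below merely stands in for the embedded linguistic parser.

```python
# Hypothetical sketch: mapping a parsed natural-language request onto an MS-DOS
# command. The crude keyword matching stands in for a real linguistic parser.
TASKS = {
    "delete": "DEL {object}",
    "copy": "COPY {object} {target}",
    "show": "TYPE {object}",
}

def interpret(utterance: str) -> str:
    tokens = utterance.lower().split()
    verb = next((t for t in tokens if t in TASKS), None)
    files = [t for t in tokens if "." in t]          # crude file-name detection
    if verb is None or not files:
        return "REM could not interpret the request"
    slots = {"object": files[0], "target": files[1] if len(files) > 1 else ""}
    return TASKS[verb].format(**slots).strip()

print(interpret("please delete the file report.txt"))    # -> DEL report.txt
print(interpret("copy notes.txt to backup.txt"))          # -> COPY notes.txt backup.txt
```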
Categorical structures suitable for describing partial maps, viz. domain structures, are introduced and their induced categories of partial maps are defined.
The representation of partial maps as total ones is addressed. In particular, the representability (in the categorical sense) and the classifiability (in the sense of topos theory) of partial maps are shown to be equivalent (Theorem 3.2.6).
Finally, two notions of approximation based on testing and observing partial maps, contextual approximation and specialisation, are considered and shown to coincide. It is observed that the approximation of partial maps is definable from testing for totality together with the approximation of total maps, providing evidence for taking the approximation of total maps as primitive.
Categories of Partial Maps
To motivate the definition of a partial map, observe that a partial function u : A ⇀ B is determined by its domain of definition dom(u) ⊆ A and the total function dom(u) → B induced by the mapping a ↦ u(a). Thus, every partial function A ⇀ B can be described by a pair consisting of an injection D ↣ A and a total function D → B with the same source.
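The same description can be made computational. The sketch below is an illustrative assumption, not part of the chapter's categorical development: a partial map is stored as a pair consisting of a domain-of-definition predicate and a total function on that domain, and composition restricts the domain exactly as the span description suggests.

```python
# Illustrative-only sketch of a partial map A -> B as a pair (domain predicate,
# total function defined on that domain), mirroring the span D >-> A, D -> B.
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

A = TypeVar("A"); B = TypeVar("B")

@dataclass
class PartialMap:
    defined: Callable[[A], bool]     # characterises the domain of definition D ⊆ A
    total: Callable[[A], B]          # the total function D -> B

    def apply(self, a):
        return self.total(a) if self.defined(a) else None

def compose(g: "PartialMap", f: "PartialMap") -> "PartialMap":
    """g after f is defined at a iff f is defined at a and g is defined at f(a)."""
    return PartialMap(
        defined=lambda a: f.defined(a) and g.defined(f.total(a)),
        total=lambda a: g.total(f.total(a)),
    )

reciprocal = PartialMap(defined=lambda x: x != 0, total=lambda x: 1.0 / x)
sqrt = PartialMap(defined=lambda x: x >= 0, total=lambda x: x ** 0.5)
sqrt_of_reciprocal = compose(sqrt, reciprocal)
print(sqrt_of_reciprocal.apply(4.0),    # 0.5
      sqrt_of_reciprocal.apply(-4.0),   # None: sqrt undefined at -0.25
      sqrt_of_reciprocal.apply(0.0))    # None: reciprocal undefined at 0
```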