Perception is our sensory experience of the world.
It is the set of processes that organize sensory experience into an understanding of our surrounding world.
Gives insight into how properties of the physical world are transformed into our mental world, and informs our understanding of behaviors like navigation and recognition.
Perception stands on the continuum between sensation (where physical energy is transformed into brain signals) and cognition (where mental representations of the world and our goals are used to reason and plan behavior).
How physical properties of the world are represented mentally.
Perceptual information is essential to inform us about our surroundings and guide our interactions with the physical and social world.
Visual illusions provide clear evidence that our perceptual systems do not always faithfully represent the physical world.
Somatic perception: perception of the body through touch and sensing the orientation of limbs in space.
From physical world to perceptual representation
The essential problem of perception is that the physical world is ‘out there’ and our mental world is inside our head.
This explains why, even with the best sensory organs, perception cannot typically guarantee a faithful representation of the physical world.
There are fundamental ways that information is lost in the sensory encoding of the physical world.
The fidelity of our mental representations of the physical world cannot depend wholly upon the incoming information. It must also depend upon the ability of perceptual processes to use assumptions about the structure of the world to analyze incoming sensory information, so that we can overcome the inverse problem and build plausible interpretations of what is out there.
Our perceptual systems have evolved effective principles to overcome theoretical limitations to the processing of perceptual information.
Principles and theories of perception
To tackle the inverse problem, we focus on how best to characterize the flow of information in the fully developed perceptual system and what principles might be at work to organize this information.
The flow of information: bottom-up and top-down processing
A fundamental distinction in perceptual processing is whether we achieve an understanding of the world through bottom-up or top-down mechanisms.
Bottom-up: the original sensory input is transformed in an uninterrupted cascade of transformations feeding forward the information, one transformation following the other until the final representation is obtained.
Also known as data-driven processing.
Characterized by perceptual mechanisms that can independently create increasingly complex representations.
Top-down: involves connections between the higher levels and the lower ones.
There are feedback connections that mediate the transformations with higher-level information.
It is critical that we start out with some expectation of what we are looking for, and this knowledge exerts influence on lower-level processes that will interact with the processing of colour, shape and texture.
At the extremes:
- Bottom-up holds that what we experience is an inevitable consequence of what sensation strikes our eyes, ears or skin.
- Top-down holds that this perception will be substantially changed by what we expect to experience.
Perceptual organization: likelihood principle
The direction of information flow is one aspect of information processing and another is how the incoming data is transformed.
The likelihood principle.
The preferred organization of a perceptual object or event will be the one which is most likely.
The likelihood that an object or event will occur is important for the perceptual processing of that object/event.
Something additional is necessary for us to infer the properties of the world.
The likelihood principle suggests a statistical view is appropriate for evaluating our perceptual input to determine what we are experiencing.
From a Bayesian point of view: perception is an inference problem: what is the most likely event responsible for my perception?
For vision this becomes: given the image on my retina, what is the most likely scene to have caused it?
Three components involved in answering this question:
- The likelihood that represents all the uncertainty in the image. The larger the number of scenes consistent with the image, the larger the uncertainty.
- The prior. Represents the knowledge one has about the scene before even looking at the image. The stronger the prior, the less one is subject to the uncertainty of the likelihood.
- The decision rule. Depends on the task and the objectives of the observer, one might be interested in finding the most likely interpretation given all the information available, or instead explore randomly one of the possible interpretations every time the same image is presented. The decision rule adds flexibility to the general framework to model behavior.
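The three components above can be put together in a minimal numerical sketch. This is a hypothetical toy example (the scene names and all probability values are invented for illustration): two candidate scenes are consistent with the same retinal image, and Bayes' rule combines likelihood and prior before a decision rule picks an interpretation.

```python
import numpy as np

# Two hypothetical scenes that could have produced the same retinal image.
scenes = ["convex bump", "concave dent"]

# Likelihood: how consistent each scene is with the observed image
# (illustrative values; in reality this comes from an image model).
likelihood = np.array([0.6, 0.5])

# Prior: knowledge held before looking, e.g. "light usually comes from above"
# makes a bump a priori more likely than a dent.
prior = np.array([0.7, 0.3])

# Bayes' rule: posterior is proportional to likelihood times prior.
posterior = likelihood * prior
posterior /= posterior.sum()

# Decision rule 1: maximum a posteriori (MAP) -- report the most likely scene.
map_choice = scenes[int(np.argmax(posterior))]

# Decision rule 2: posterior sampling -- explore one interpretation at random
# in proportion to its posterior probability each time the image is presented.
rng = np.random.default_rng(0)
sample_choice = rng.choice(scenes, p=posterior)

print(posterior, map_choice)  # MAP choice here is "convex bump"
```

The two decision rules illustrate the flexibility of the framework: a MAP observer always reports the same interpretation, while a sampling observer can flip between interpretations of the same image.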
Information processing approach
Ecological psychology holds that perception works in a largely bottom-up fashion by exploiting regularities in the visual world that are termed invariants.
Invariants in vision are properties of the three-dimensional object being viewed that can be derived from any two-dimensional image of the object.
Direct perception: (also termed event perception and ecological perception) is the bottom-up process by which objects and their function are recognized.
For any information-processing device to be completely understood it must be understood at three different levels.
The generality of this three-level approach was influential in opening boundaries between researchers working on computer vision, visual psychology and the physiology of vision.
- The computational theory
The computational theory seeks to understand the purpose of the computation and to demonstrate its appropriateness for the task at hand.
Question: what is the purpose of a computation and why does it do what it does?
In the broadest sense, the purpose of the perceptual processes is to keep us aware of our external world and support our adaptability to the changing world.
These processes exist to ensure our survival.
- Choice of representation
The choice of representation for the input and output, and the algorithm to achieve the transformation between input and output.
Our choice of representation will motivate the use of different algorithms; in the classic example, the choice of numeral system (Arabic versus Roman) motivates different procedures for performing addition.
This level is an essential aspect of cognitive science.
Although transformations are in one sense transformations of physical energy from one form to another, a fundamental view of cognitive science is to consider them as transformations from one information state to another. With this perspective we will model human behavior, and experience of the world, as the result of algorithms operating on representations of information.
- Achieving the computations
How to realize these computations (for example in a human or digital computer).
The actual way in which the computations are achieved.
Every organism or machine will have its own limitations imposed by the device performing the computations. These limitations introduce practical considerations at the second level about which representation and type of algorithm are optimal to use, but they do not impact the first-level computational theory of what the goal of the computation is.
Thus, keeping this choice of device as a separate consideration allows us to discuss perception in terms of transforming incoming stimulus energy into appropriate representation of information without worrying about specific implementation.
The body and perception
Embodied cognition: holds that cognition is about the experiences arising from a perceptual system tightly linked to an action system rather than the manipulations of abstract representations.
What one perceptually experiences of the world is related not only to the perceptual input, but also to one’s purpose, physiological state and emotions.
Perceiving spatial layout combines the geometry of the world with behavioral goals and the costs associated with achieving these goals. However, this claim is controversial.
Six claims that form a basis for embodied cognition:
- Cognition is situated
It takes place in the real world and inherently involves perception and action
- Cognition is time-pressured
We need to evaluate our situation in the environment as quickly as it changes
- We off-load cognitive work onto the environment
Like organizing a hand of cards, we actively change our environment to reduce cognitive workload
- The environment is a part of the cognitive system.
- Cognition is for action
- Off-line cognition is body based.
The mind is grounded in mechanisms involving perception and action
This provides a basis for considering the essential role of perception (and action) in cognition.
Perceptual systems produce features such as orientation, colour, motion, timbre, pitch and pressure. Features are important since the modeling of high-level perception and cognition is often focused on the information provided by particular features.
The human brain is just one particular implementation of an information-processing device.
The human brain exhibits a large degree of modularity in its arrangement of sensory processing areas for auditory, visual and somatosensory processing.
There are some instances where modularity appears violated, and one of these cases is synaesthesia.
Synaesthesia: an uncommon condition where stimulation of one perceptual modality results in experiencing a percept in a typically unrelated modality. For example, tasting a sound.
There is a degree of structural similarity across the visual, auditory and somatosensory systems.
The basic organization is a hierarchy from specialized receptors, through dedicated neural pathways to centers in the brain with specialized patterns of organization. →
These centers in the brain can either be found in the cortex for information requiring elaborate processing and conscious awareness, or in brain tissue at sub-cortical levels if the perceptual information is needed for immediate monitoring without conscious awareness.
Proprioception: the sense of how our limbs are positioned in space.
Vestibular sensation: the sense of balance and orientation in space.
Vision
- Location of receptors: cones and rods in the retina
- Pathway from receptor to cortex: optic nerve → thalamus → cortex
- Primary cortical receiving area / organization: visual cortex / retinotopic
- Perceptual qualities: color, form, motion, orientation, distance/depth

Audition
- Location of receptors: inner hair cells and outer hair cells in the organ of Corti on the basilar membrane
- Pathway from receptor to cortex: auditory nerve → thalamus → cortex
- Primary cortical receiving area / organization: auditory cortex / tonotopic
- Perceptual qualities: loudness, pitch, timbre, distance

Somatosensation
- Location of receptors: Meissner, Merkel, Ruffini and Pacinian receptors in the skin (touch); Golgi tendon organs and muscle spindles (proprioception); hair cells in the otolith organs and semicircular canals of the ears (vestibular sensation)
- Pathway from receptor to cortex: nerve fibers → spinal cord → thalamus → cortex (touch); nerve fibers → spinal cord → cerebellum → cortex (proprioception); nerve fibers → brainstem → nuclei (vestibular sensation)
- Primary cortical receiving area / organization: primary somatosensory cortex / somatotopic (Brodmann areas 1, 2, 3a and 3b) for touch; Brodmann areas 2 and 3a of somatosensory cortex for proprioception; no dedicated cortical area for vestibular sensation
- Perceptual qualities: force of muscles and joint angles (proprioception); body movement and body orientation (vestibular sensation)
The encoding of visual information begins in the retinas of the two eyes and is transmitted from there to the primary visual cortex.
This process follows the basic pattern of using specialized receptors to transform light energy to a neural signal that is sent to specific brain regions with a unique functional organization.
Towards the center of each retina is a region known as the fovea that contains an abundance of receptors known as cones that encode color and high-resolution spatial form information.
Surrounding the cones are receptors known as rods that encode motion and low-resolution form information.
Cones → coloured light and fine image detail
Rods → Effective in low levels of light and sense motion
The mapping of visual information from retina to cortex follows a systematic retinotopic organization that preserves spatial order (neighboring regions in the retina are represented in neighboring region in cortex).
The right visual field ends up in the left half of the brain’s primary visual cortex and vice versa.
The center of the visual field, the fovea, with its abundance of high-spatial-resolution cones, has a disproportionate amount of visual cortex dedicated to processing its incoming visual information.
From the primary visual cortex, there are two main pathways for visual processing that lead out of the occipital cortex and beyond.
- The ventral stream.
From visual cortex to the temporal lobe.
It is specialized for determining what objects are.
- The dorsal stream.
From visual cortex towards parietal cortex.
Specialized in determining where objects are.
A more complete understanding is available if we divide the dorsal, action stream, into two separate components of planning and control.
- Planning: the inferior parietal lobe
- Control: the superior parietal lobe
It is possible to localize brain areas within the ventral and dorsal streams that are responsible for representing particular visual features.
The encoding of auditory information begins within a special structure in the ear known as the cochlea and is transmitted from there to a part of the brain known as the primary auditory cortex.
The cochlea contains a band of nervous tissue known as the basilar membrane (a stiff structural element located in the inner ear, surrounded by specialized fluids), on which the hair cells are located. These hair cells move in response to sound pressure, transducing vibration into a nervous signal that is sent along the auditory nerve.
The perceived pitch of a sound depends on the frequency of the sound-pressure vibrations, and one way that pitch is encoded is that different sections of the basilar membrane are sensitive to different pitches of sound.
- The basilar membrane near the base of the cochlea encodes high-frequency sounds.
- Regions at the apex of the cochlea encode low-frequency sounds.
In the primary auditory cortex this segregation of pitches is preserved with pitches of similar frequencies neighboring each other. This is a tonotopic map.
An additional mechanism for pitch encoding exploits the fact that firing rates in the auditory nerve can vary:
- Higher-pitched sounds create higher-frequency firing rates.
Firing rates vary with perceived loudness.
The secondary auditory cortex, which includes the important speech perception region (Wernicke’s area), has been found to be sensitive to patterns of timing.
Most sounds we hear contain a complex mixture of sound amplitudes and frequencies and decoding this information requires precise timing.
The somatoperception system is a combination of several different subsystems, including proprioception, vestibular sensation and touch.
Proprioception and vestibular sensation give us a sense of the position of our limbs relative to our body and our body in space.
The processing of touch begins in specialized receptors in the skin, which project pathways of neurons to the brain.
These pathways terminate in a portion of the brain called the primary somatosensory cortex (SI), located next to the central sulcus (a major anatomical landmark that forms the boundary between parietal cortex and frontal cortex). The organization of this region is somatotopic, with local regions of cortex dedicated to specific body parts.
A further organizing principle of the somatosensory system is the subdivision into processing specializations that run in strips along the length of the primary somatosensory cortex.
These different strips can be identified by the anatomical convention of Brodmann areas. This division includes area 3a, which involves proprioception, and area 3b, which involves simple representations of touch.
Areas 1 and 2 show sensitivity to more complex features (like particular patterns of skin stimulation in area 1 and particular shapes in area 2). Brain regions adjacent to the primary somatosensory cortex, such as the secondary somatosensory area (SII) and the posterior parietal cortex, have been shown to be involved in further elaboration of somatosensory representations.
Each different source of information has its own particular strengths and weaknesses, and thus combining the information should provide benefit.
The modality appropriateness hypothesis: for each physical property of the environment there is a particular sensory modality that has a higher acuity for estimating this property than the other senses.
This modality will always dominate bimodal estimates of the property.
Visual capture: vision dominates other senses.
Maximum-likelihood estimation strategy: the more reliable perceptual information is weighted more heavily than the less reliable perceptual information. In this way the perceptual system actively monitors the reliability of the incoming information and attaches more significance to the reliable input.
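The maximum-likelihood strategy amounts to weighting each cue by its reliability, i.e. the inverse of its variance. Below is a minimal sketch with invented numbers: hypothetical visual and haptic estimates of an object's size, where vision is the more reliable cue.

```python
# Hypothetical cue estimates of an object's size (arbitrary units):
# vision is reliable (low variance), touch is noisier (high variance).
visual_size, visual_var = 5.0, 0.5
haptic_size, haptic_var = 6.0, 2.0

# Weight each cue by its reliability (inverse variance), normalized so the
# weights sum to one; the more reliable cue dominates the combined estimate.
w_visual = (1 / visual_var) / (1 / visual_var + 1 / haptic_var)
w_haptic = 1 - w_visual

combined = w_visual * visual_size + w_haptic * haptic_size

# A key prediction: the combined estimate is *more* reliable (lower
# variance) than either cue taken alone.
combined_var = 1 / (1 / visual_var + 1 / haptic_var)

print(combined, combined_var)  # 5.2 0.4 -- closer to the visual estimate
```

Note that the weights are not fixed per modality: if the visual input were degraded (its variance increased), the same formula would shift weight toward touch, which is how the perceptual system can "actively monitor" reliability.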
General perceptual processes produce an object representation that can be compared to a stored internal representation.
One property of effective recognition systems is that they are able to represent the information in a way that preserves the essence of the object under different transformations.
Feature analysis: involves deconstructing an object into a set of component features that can be compared to a library. Inside this library each object is described by a unique set of features.
The difficulty with such an approach is coming up with a unique feature list that could capture all the different versions of an object.
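A toy sketch of feature analysis, under loose assumptions: the feature names and the three-letter "library" below are invented for illustration, and matching is done by simple set overlap rather than any psychologically validated metric.

```python
# Hypothetical library: each letter is described by a set of component
# features (invented feature names, not a standard inventory).
library = {
    "A": {"oblique_left", "oblique_right", "horizontal_bar"},
    "H": {"vertical_left", "vertical_right", "horizontal_bar"},
    "T": {"vertical_center", "horizontal_top"},
}

def recognize(observed_features):
    # Compare the observed feature set against every library entry and
    # return the letter whose stored set overlaps most (Jaccard score).
    def score(letter):
        stored = library[letter]
        return len(stored & observed_features) / len(stored | observed_features)
    return max(library, key=score)

print(recognize({"vertical_left", "vertical_right", "horizontal_bar"}))  # H
```

The difficulty noted above shows up directly in this sketch: a tilted or stylized "H" may produce a different observed feature set, so a single fixed feature list per letter cannot capture all versions of the object.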
The Pandemonium model:
So-called demons are arranged in a hierarchy.
Lower demons: evaluate the utility of individual features.
Higher demons evaluate the success of these sub-demons.
The goal is to find what member of a category is the best example of that category.
Determining what members of a category are more central than others allows a more graded response to distinguish across the members of a category.
Basic level categorization. The response that is most likely to be produced when asked to categorize an object.
The boundaries between different basic level categories are not fixed since we are dynamically taking on new information that might cause us to rearrange our category boundaries.
Categorization works to come up with basic level categories that maximize the difference between other basic level categories and minimize the variability within elements of the same basic level category.
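This trade-off can be made concrete with a small numerical sketch. Everything here is hypothetical: the 2-D "feature vectors" and the scoring function are invented to illustrate the idea that a good basic-level grouping has large between-category differences relative to within-category variability.

```python
import numpy as np

# Hypothetical exemplars of two basic-level categories, each a 2-D
# feature vector (values invented for illustration).
categories = {
    "dog": np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]),
    "chair": np.array([[5.0, 5.2], [4.8, 5.1], [5.1, 4.9]]),
}

def separation_score(categories):
    means = {name: pts.mean(axis=0) for name, pts in categories.items()}
    # Within-category variability: mean distance of exemplars to their mean.
    within = np.mean([np.linalg.norm(pts - means[n], axis=1).mean()
                      for n, pts in categories.items()])
    # Between-category difference: mean distance between category means.
    names = list(categories)
    between = np.mean([np.linalg.norm(means[a] - means[b])
                       for i, a in enumerate(names) for b in names[i + 1:]])
    # Higher scores mean categories that are tight and well separated.
    return between / within

print(separation_score(categories))
```

On this criterion, regrouping the same exemplars into worse categories (e.g. mixing dogs and chairs) would lower the score, which is one way to express why basic-level boundaries can shift as new information arrives.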
Visual object recognition
Most objects in the natural world are three-dimensional. This is problematic for vision, since it faces the task of recognizing a three-dimensional object with only the two-dimensional information on the retina.
Viewpoint invariant relationship: any aspect of an object that is preserved no matter the direction from which we view the object.
If we can model objects as created by a set of volumetric primitives then we can recognize an object from arbitrary viewpoints since each part of the object is recognizable by its unique collection of viewpoint invariant properties.
These volumetric primitives are called geons.
The heart of recognition by components: objects can be thought of as composed of a collection of geons.
Since every geon in an object can be recovered by its unique collection of viewpoint invariant properties, this allows the entire object to be recognized.
Multiple views theory:
Recognition is fundamentally image-based.
Object recognition can be achieved by storing representations of a few select views of the object that have been learned.
From these select views, sophisticated mechanisms could fill in representations of the intermediate views.
Variants of these two approaches have been developed.
Somatoperceptive object recognition
Free exploration of an object will engage subsystems of the somatoperceptual system that involve estimating the weight and texture of an object as well as the position of the body parts touching the object.
Collectively these subsystems contribute to what is called haptic perception.
Haptic perception: the combination of abilities that allow us to represent the material characteristics of objects and surfaces for recognition.
Touch movements made by an active observer provide the phenomenal experience of touching an object, while the same physical contact made by placing an object against a passive observer provides the experience of being touched.
When the body is moving, sensory transmission of touch is diminished. Substantial differences between active and passive touch are not necessarily revealed.
Although some complex recognition tasks might require extensive exploration of the object, this is not always necessary.
Visual agnosia and prosopagnosia
Visual agnosia: patients are able to extract a reasonably intact perception of what they see but are unable to assign any meaning to this percept.
Different forms of visual agnosia can be extremely specific to the type of visual stimuli.
Prosopagnosia: the recognition of faces is severely impaired after brain damage.
Scene recognition involves perception of an environment and includes not only perception of individual objects but also the nature of all the objects taken together.
Scene recognition is important for understanding how recognition works in the typical cluttered scenes we view when outside of perception labs.
An abundance of cortical area is dedicated to processing the fovea (the center of the retina)
For this extra processing power to be effective, the eye must place the center of the retina at the point of interest and keep it fixed at this location. Eyes are poor at capturing information if it is not fixated.
The pattern of eye movement is complex and not every part of the scene will be fixated.
Two basic possibilities drive our eye movements:
- Bottom-up: novel image properties, such as brightness or color, make particular image locations salient, and this image salience captures our eye movements.
- Top-down: our goals and expectations are at work to direct the eye movements.
The amplitude of a sound wave is one obvious cue to distance, and the timbre of the incoming sound wave also contains distance information.
The atmosphere filters sound waves so that high frequencies are attenuated, and this change in the distribution of sound frequencies also signals distance.
What we hear is a combination of both the sound wave taking a direct path to our ear as well as the reflections (echoes) of that sound wave.
Conditions in which a sequence gives rise to the impression that one object has launched the other into motion:
- The timing of the motion change
- The relative velocities of the objects’ motions.
Event perception: changes in layout, changes in surface existence, or changes in color and texture.
An event is ‘a segment of time at a given location that is conceived by an observer to have a beginning and end’.
The perceptual cycle:
- Memory in the form of schema (a framework that represents a plan or a theory, supporting the organization of knowledge) drives exploration.
- Information pick-up of the kind described by ecological psychology
- Potential modification of schema and subsequent repetition of the steps in this cycle.
The important situation arises when the happenings of the world do not unfold to match expectations.
The times at which these prediction errors occur can be used to define the boundary between one event finishing and the next one beginning.
Understanding what perceptual information signals social meaning will inform our understanding of human-human interaction at a deeper level.
A precise understanding of how social signals are processed can inform human-computer and human-robot interfaces.
Human activity is constrained by our biology. This informs us about our basic cognitive capabilities.
There is the unique link between perceiving others and our social and emotional responses.
Capgras syndrome: the belief that familiar people have been replaced with duplicates; the normal emotional response to the person is gone.
Faces are important sources of social information that we use to recognize person properties.
Recognition of faces can be surprisingly accurate.
General properties of face recognition:
- Humans are exquisitely tuned to recognize familiar faces and can do so under many adverse conditions.
- Recognition of unfamiliar faces tells a different story and for unfamiliar faces recognition performance can be surprisingly poor.
- There are specialized brain areas and networks for facial recognition.
- The mechanisms of facial recognition are holistic, the particular way a configuration of facial features makes up a face is important in its own right, and we cannot deconstruct facial recognition into any simple collection of how individual facial features are recognized.
The Bruce and Young model of face recognition
The primary encoding of faces must feed into processes of recognition, identification, analyses of emotion through facial expression and the combination of additional information such as voice to augment facial processing.
Recognition of identity and expression are largely independent of one another, although the separation is not complete.
A neural model of face recognition
The model comprises multiple regions spread throughout the brain.
- The representation of invariant aspects of faces
Responsible for the recognition of individuals
- Changeable aspects of faces.
Facilitate social communication.
- Core system
Primary face processing occurs in the inferior occipital gyrus.
Representation of invariant aspects is mediated by face-responsive neurons in the fusiform gyrus.
Representation of changeable aspects is mediated by face-responsive neurons in the superior temporal sulcus.
- Extended system
Includes other brain areas that aid face processing with functions of attention, emotion and identification as well as providing supplementary information from speech processing.
Further processing in concert with other neural systems.
One way that voice carries information independent of linguistic content is found in the fact that the emotional content of an utterance can be carried in the prosody of the speech. (the rhythm, intonation and stress patterns in speech).
The sound quality of a voice is constrained by the combination of the folds of the larynx which provide a sound source, and the vocal tract including the tongue, nasal cavity and lips that filter the sound. The resulting sound of each individual’s voice is made unique by not only the size and shape of these physical structures but also the manner in which individuals form and articulate their vocal tract.
Phonagnosia: the loss of ability to recognize identity from voice. Individuals can understand the content of speech but are unable to identify the speaker.
Humans have distinctive regions outside the primary auditory cortex, in the upper bank of the superior temporal sulcus (STS), that appear to be sensitive to the human voice.
This temporal voice area has been found to more actively respond to human voice sounds than to a variety of other sounds including animal vocalizations and assorted non-vocal sounds.
A distributed system exists for independently representing acoustics and identity from voice.
Observing the actions of others can be socially informative.
Observers can use displays of human actions to recognize identity, gender, emotion, the action being carried out, and even whether a person appears vulnerable to attack.
Even when there is very little information available in a visual display, people are very efficient at using the limited information present to obtain judgments of social properties like gender.
The body structure and body-motion information are independently processed before being recombined in the posterior region of the superior temporal sulcus (pSTS).
The pSTS is a key area specialized for the perception of human activity.
Structural information from a single ‘snapshot’ is sufficient to inform the recognition of many properties of point-light displays.
Motion is still important to enhance the perception of human activity, but the processing of static information is a vital first step.
There is an occipitotemporal brain region known as the extrastriate body area (EBA) which represents body postures.