Article summary of Hierarchical models of object recognition in cortex by Riesenhuber & Poggio - Chapter

Recognition of visual objects

Recognizing visual objects is a fundamental, frequently performed cognitive task with two essential requirements: invariance and specificity. Cells in the inferotemporal cortex (IT, the highest visual area in the ventral visual pathway) appear to play a key role in object recognition. These cells respond to views of complex objects such as faces. Certain neurons respond specifically to certain faces and not to other faces. The question remains: how can they respond differently to different faces when the stimulus pattern on the retina is practically the same?

A similar question arises in the striate cortex of cats. Both simple and complex cells respond to a presented bar. Simple cells have small receptive fields and are strongly position-dependent, whereas complex cells have large receptive fields and are not position-dependent. Hubel and Wiesel proposed a model in which simple cells with neighboring receptive fields work together: cells that sit next to each other also see neighboring parts of the visual field, so a group of such cells often fires together. A direct extension of this model leads to a scheme with higher-order complex cells.

Cells in V4 can be modulated by attention, and their receptive fields can adapt accordingly. There is little evidence, however, that this mechanism is used for translation-invariant object recognition. Instead, invariance to any transformation can be built up by pooling over afferent cells tuned to different variations of the same stimulus. Evidence has now been found that cells responding to full or partial views of an object are formed through a learning process. The view-invariant output can then be represented by a small number of neurons. This idea leaves us with two problems.

Problem 1

Monkeys can learn stimuli that are unfamiliar to them (such as faces) and acquire part of the invariance from just one view of the object. If the object is presented with many distractor objects around it, it can be learned in combination with these objects. The cells nevertheless become invariant to other positions.

Problem 2

The model does indicate how view-tuned units (VTUs, groups of cells that fire for a specific object) are structured, but not how they arise.


The model is based on a simple hierarchical feedforward architecture. Its structure reflects the assumption that invariance and feature specificity must be built up by different mechanisms. The pooling mechanism should provide robust feature detectors: it must allow specific features to be detected without being confused by clutter or context in the receptive field.

There are two alternative pooling mechanisms.

Linear addition = SUM.

All afferent cells are weighted equally. The response of a complex cell is invariant as long as the stimulus remains in the cell's receptive field. However, the response does not indicate whether there actually is a bar in the receptive field: the output signal is the sum over the afferent cells, so feature specificity is lost.

Non-linear maximum operation = MAX.

The strongest afferent cell determines the postsynaptic response. With MAX, the response equals that of the most active afferent cell, and this signal is taken as the best match for a portion of the stimulus. This makes the MAX response more robust than the SUM response.

In both cases, the response of a complex cell is invariant to the position of the bar in the receptive field. A non-linear MAX function is a sensible way to pool responses when invariance is the goal. It amounts to implicitly scanning over afferent cells of the same type and selecting the one that responds most strongly, which is the best match. Pooling by combining the afferent cells, in contrast, produces a mixed signal caused by different stimuli.
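The difference between the two pooling rules can be made concrete with a toy calculation. The sketch below is only an illustration: the function names `sum_pool` and `max_pool`, the four afferents, and all response values are assumptions made up for this example, not values from the article.

```python
# Toy illustration of the two pooling rules. All numbers are made up for
# this sketch; they are not taken from the article.

def sum_pool(afferents):
    """SUM: linear addition of all afferent responses with equal weights."""
    return sum(afferents)


def max_pool(afferents):
    """MAX: the most active afferent alone sets the pooled response."""
    return max(afferents)


# Hypothetical responses of four bar-detecting afferents whose receptive
# fields cover neighboring positions inside one complex cell's field.
bar_left = [0.9, 0.1, 0.0, 0.0]            # bar at the left position
bar_right = [0.0, 0.0, 0.1, 0.9]           # same bar at the right position
bar_left_cluttered = [0.9, 0.1, 0.4, 0.3]  # bar at the left plus clutter elsewhere

for label, afferents in [
    ("bar left", bar_left),
    ("bar right", bar_right),
    ("bar left + clutter", bar_left_cluttered),
]:
    print(f"{label:20s} SUM={sum_pool(afferents):.2f}  MAX={max_pool(afferents):.2f}")
```

With these assumed numbers, both rules give the same output whether the bar is on the left or on the right, so both are position-invariant. Only the MAX output stays the same when clutter is added; the SUM output rises, so it no longer tells the bar apart from the clutter.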

The MAX mechanism is consistent with neurophysiological data in several respects. For example, if two stimuli are presented in the receptive field of an IT neuron, the neuron's response is dominated by the stimulus that evokes the stronger response when presented alone. This is what the MAX model predicts for a neuron pooling over its afferent neurons. A number of studies provide support for the MAX model. These studies often find highly non-linear tuning of IT cells, which corresponds to the MAX response function. A linear model cannot produce such strong changes in response from a small change in input.
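The two-stimulus observation can be written as a small prediction rule. This is a hedged sketch: the helper `predicted_pair_response` and the two response values are hypothetical and serve only to contrast what a MAX rule and a SUM rule would each predict.

```python
def predicted_pair_response(response_a, response_b, rule="max"):
    """Predict a neuron's response to two stimuli shown together.

    response_a and response_b are the (hypothetical) responses to each
    stimulus when it is shown alone.
    """
    if rule == "max":
        # MAX: the stimulus with the stronger individual response dominates.
        return max(response_a, response_b)
    if rule == "sum":
        # SUM: the individual responses simply add up.
        return response_a + response_b
    raise ValueError(f"unknown pooling rule: {rule}")


# Made-up example values, not data from the article.
preferred_alone = 0.8
distractor_alone = 0.3

print("MAX prediction:", predicted_pair_response(preferred_alone, distractor_alone, "max"))
print("SUM prediction:", predicted_pair_response(preferred_alone, distractor_alone, "sum"))
```

Under the MAX rule the paired response equals the response to the preferred stimulus alone, which is the pattern described for IT neurons. Under the SUM rule the paired response exceeds both individual responses.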

In some cases, clutter can change the output of the MAX function. The quality of the match at the final stage then changes, so the strength of the VTU response changes as well. One solution is to use more specific features. Simulations have shown that the model is able to recognize objects in context.

The MAX model is well suited to describing these brain processes. MAX responses probably arise from cortical microcircuits based on lateral inhibition between neurons in a cortical layer. In addition, the MAX response is important for object recognition.
