We address the problem of integrating information about multiple objects and their positions in a visual scene. The primate visual system has little difficulty in rapidly achieving this integration when only a few objects are present. Computer vision, by contrast, still has great difficulty achieving comparable performance. It has been hypothesized that temporal binding or temporal separation could serve as a crucial mechanism for handling information about objects and their positions in parallel. Elaborating on this idea, we propose a neurally plausible mechanism that leads from local decision-making on "what" and "where" information to global multi-object recognition.
The model we propose here is inspired by binding-by-synchrony as well as by the dynamic link architecture. Decision-making is carried out by so-called control (C) macrocolumn units, which are responsible not only for synchronizing or desynchronizing selected feature macrocolumns, but also for signaling the position of the object in the scene. The feature macrocolumns are placed in two distinct domains. The input (I) domain contains the sensory data from the scene, while the gallery (G) domain stores the reference objects to be recognized. Each macrocolumn consists of subunits called minicolumns, which are bound together by common afferents and by lateral inhibition that is modulated by an autonomous oscillator of the integrate-and-fire (IF) type. This architecture is a further development of a previous modeling approach to the cortical macrocolumn. Binding-by-synchrony, which establishes the related dynamic links, is achieved by computing the similarity between feature columns and by modulating the time constant and weight of the IF synaptic couplings according to that similarity, under the influence of the C column subunits.
Figure 1 demonstrates that binding-by-synchrony in our system is achieved rapidly, within a few hundred milliseconds. More precisely, the IF neural oscillators in the feature macrocolumns of I and G with higher similarity become synchronized with zero lag, whereas the IF oscillators of feature macrocolumns with lower similarity remain asynchronous. The transition between synchrony and asynchrony occurs by modulating the time constant and weight of the IF synaptic couplings, under the influence of subunit activities in C. The zero-lag synchronization between the IF oscillators constitutes global object recognition: each object is assigned its corresponding position in the scene, which is signaled by the activities of the C column units.
Figure 1. Synchronization process between the integrate-and-fire (IF) oscillators in feature columns that represent the same object across different domains.
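The similarity-gated synchronization described above can be illustrated with a deliberately reduced toy sketch. It is not the macrocolumn model itself: it assumes two identical leaky IF units with excitatory pulse coupling, cosine similarity as the similarity measure, and a hypothetical linear gate (`coupling_weight`) that maps similarity to coupling weight. Only the weight is modulated here, not the time constant, and the C units are not modeled. All function names and parameter values are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def coupling_weight(similarity, threshold=0.5, w_max=0.2):
    """Hypothetical gate: zero below `threshold`, rising linearly to `w_max` at 1."""
    return w_max * max(0.0, similarity - threshold) / (1.0 - threshold)

def last_spike_lag(weight, v0=(0.0, 0.5), drive=1.5, tau=10.0, dt=0.01, steps=30000):
    """Simulate two identical leaky IF units with excitatory pulse coupling.

    Returns the absolute difference between the units' last spike times.
    Assumes drive > 1 (threshold is 1.0) so that both units fire repeatedly.
    """
    v = list(v0)
    last = [None, None]
    for k in range(steps):
        # Leaky integration toward the constant drive (forward Euler step).
        for i in range(2):
            v[i] += dt * (drive - v[i]) / tau
        fired = [v[0] >= 1.0, v[1] >= 1.0]
        # A spike kicks the partner, which may then fire within the same step.
        for i in range(2):
            j = 1 - i
            if fired[i] and not fired[j]:
                v[j] += weight
                fired[j] = v[j] >= 1.0
        # Reset every unit that fired and record its spike time.
        for i in range(2):
            if fired[i]:
                v[i] = 0.0
                last[i] = k * dt
    return abs(last[0] - last[1])

# Matched features (similarity 1) versus orthogonal features (similarity 0).
same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
diff = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print("lag, matched pair:   ", last_spike_lag(coupling_weight(same)))
print("lag, mismatched pair:", last_spike_lag(coupling_weight(diff)))
```

In this toy setting the matched pair should lock to zero lag after a few cycles, because a kicked unit that crosses threshold fires together with its partner and the two then evolve identically. The mismatched pair receives no coupling and keeps its initial phase offset.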
This work was supported by the Hertie Foundation, by the EU project "Daisy", FP6-2005-015803 and by the German Federal Ministry of Education and Research (BMBF) within the "Bernstein Focus: Neurotechnology" through research grant 01GQ0840.