In the visual system, attending to important objects in the visual field relies on the transfer of top-down, object-based task information to the spatially organized areas of cortex. How this transfer occurs, and how the information then influences the dorsal stream to redirect gaze, are not well understood. Current models of the ventral stream mostly focus on the feed-forward mechanisms involved, and current feedback models do not seem to address the issue of object-space binding in a comprehensive and plausible manner.
We investigated these questions with a modeling framework that combines a bidirectional ventral stream object recognition hierarchy, running from primary visual cortex (V1) to anterior inferior temporal cortex (AIT), with a model of the dorsal stream to the frontal eye fields (FEF) and our previously developed oculomotor system. Selection is performed by basal ganglia loops in both the object-based mapping of AIT and the spatial mapping of FEF. The ventral stream is modeled as a hierarchy of increasingly spatially invariant cortical areas linked by both feed-forward excitatory and feedback connections. Within each receptive field, units compete to represent the strongest, and thus most likely, interpretation of that region, and this competition can be biased by feedback from higher visual areas. Three feedback attention mechanisms were tested: additive feedback, shunting (multiplicative) feedback, and a shunt "gating" of feedback by feed-forward activity. The model was tested using a simple visual world (colored "flags") that nevertheless challenged all the main competencies being investigated. Performance was measured by (i) eliciting saccadic "behavior" in simulated visual search with different numbers of distractors, and (ii) target segmentation in cluttered scenes within a fixed time window.
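The three feedback mechanisms can be contrasted in a minimal sketch. The abstract does not give the model equations, so the functional forms, the function names, the weight `w`, the gate `threshold`, and the toy activity values below are all assumptions chosen for illustration. The sketch treats feedback from a spatially invariant area as a feature bias broadcast to every location, which is one plausible reading of the object-space binding problem.

```python
import numpy as np


def additive_feedback(ff, fb, w=1.0):
    """Assumed form: feedback is simply added to feed-forward drive."""
    return np.maximum(ff + w * fb, 0.0)


def shunting_feedback(ff, fb, w=1.0):
    """Assumed form: feedback multiplicatively scales feed-forward drive."""
    return ff * (1.0 + w * fb)


def gated_feedback(ff, fb, w=1.0, threshold=0.1):
    """Assumed form: feedback is added only where feed-forward drive
    exceeds a threshold, so feed-forward activity gates the feedback."""
    gate = (ff > threshold).astype(float)
    return ff + w * fb * gate


# Toy feed-forward activity: rows are locations, columns are features
# [target feature, distractor feature]. Values are illustrative only.
ff = np.array([
    [0.0, 0.0],   # location 0: empty
    [0.0, 0.6],   # location 1: distractor
    [0.5, 0.0],   # location 2: target
])

# Top-down bias for the target feature, broadcast to all locations
# because the higher area is assumed to carry no spatial information.
fb = np.array([0.8, 0.0])

additive = additive_feedback(ff, fb)
shunting = shunting_feedback(ff, fb)
gated = gated_feedback(ff, fb)

for name, response in [("additive", additive),
                       ("shunting", shunting),
                       ("gated", gated)]:
    print(name)
    print(response)
```

Under these assumptions, additive feedback drives the target feature at every location, including the empty one, whereas the shunting and gated forms leave locations without matching feed-forward input silent. This is only a caricature of why additive feedback might fail to bind an object to its location; it says nothing about the competition dynamics or the basal ganglia selection used in the full model.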
In the target segmentation task, the additive feedback model consistently failed to bind the AIT representation of the object to the correct location in the visual field. The shunting model segmented 58% of scenes, while the gating model was the most successful (83%) (Figure 1). We then challenged the most successful (gating) model with a conjunction visual search task. By simulating models that were either trained on or naïve to the target stimulus, we showed that subsequent learning of a combined representation of an untrained target stimulus can explain the experimentally observed decrease in the slope of reaction time against number of distractors for that target (Figure 2).
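The search slope referred to above is conventionally the gradient of a straight-line fit of reaction time against the number of distractors. A minimal sketch of that calculation follows; the function name `search_slope` is hypothetical, and the numbers in the usage example are synthetic placeholders generated from an exact line, not data from this study.

```python
import numpy as np


def search_slope(n_distractors, reaction_times_ms):
    """Least-squares line fit of reaction time against distractor count.

    Returns (slope in ms per distractor, intercept in ms).
    """
    slope, intercept = np.polyfit(n_distractors, reaction_times_ms, 1)
    return slope, intercept


# Synthetic placeholder values lying exactly on RT = 400 + 20 * n.
# These are NOT results from the model or from any experiment.
n = np.array([2, 4, 8, 16])
rt = 400.0 + 20.0 * n

slope, intercept = search_slope(n, rt)
print(f"slope = {slope:.1f} ms/item, intercept = {intercept:.1f} ms")
```

A shallower slope for a target after training, compared with the naïve case, would correspond to the decrease described in the text.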