A problem with many traditional neural network learning algorithms is that they lack a biologically plausible method for assigning credit to the nodes in the earlier levels of the network that played a decisive role in the stimulus-response mapping. In the Attention-Gated Reinforcement Learning model (AGREL), the credit assignment problem is overcome by the focal influence of attention, exerted through the network's feedback connections. On each trial, a global reward signal (conveyed by neuromodulators) is computed and serves to increase the likelihood that successful behavior is repeated. Activity in the output level of the network, which is constrained so that only a single node can be active (attended) at a time, is propagated back through the network via feedback connections. Because these feedback connections match the feedforward connections in strength, they allow the network to selectively target the lower-level nodes that drove the current output. Combined with the global reward signal, this lets the network strengthen its connections with the nodes that caused successful behavior (see Figure 1). This attentional gating of the reward signal offers a biologically realistic solution to the credit assignment problem and thereby provides a coherent, unifying framework for learning with attentional selection at its core. AGREL has previously been shown to closely replicate the changes in tuning curves observed in two physiological categorization tasks. In the current study, we first use AGREL to replicate the findings of two further physiological tasks, and second, examine the effect of removing the feedback connections on the model's performance in these and the previous tasks.
Figure 1. AGREL network architecture and learning rules for a three-layer network. δ = reward signal.
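The learning rules of Figure 1 are not reproduced in this text, so the following Python sketch only illustrates the mechanism described above under stated assumptions: a single output node is selected by a softmax over the output drives, the global reward signal δ is taken to be the obtained reward minus the selected node's choice probability, the feedback weights are equal to the feedforward weights, and there are no bias terms. The class name `AgrelSketch` and its parameters are hypothetical. Setting `use_feedback=False` is one possible way to model the lesion studied here (also an assumption): the hidden level then receives no gated learning signal and only the output weights change.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class AgrelSketch:
    """Minimal three-layer, AGREL-style network (sketch; all rules are assumptions)."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.1, use_feedback=True, seed=0):
        self.rng = np.random.default_rng(seed)
        # Feedforward weights; the feedback pathway is assumed to reuse these values.
        self.V = self.rng.normal(0.0, 0.5, size=(n_in, n_hidden))   # input -> hidden
        self.W = self.rng.normal(0.0, 0.5, size=(n_hidden, n_out))  # hidden -> output
        self.lr = lr
        self.use_feedback = use_feedback  # False = hypothetical "lesioned" network

    def trial(self, x, target):
        """Run one stimulus-response trial, learn from it, and return the reward."""
        y = sigmoid(x @ self.V)                # hidden-level activity
        z = y @ self.W                         # drive to each output node
        p = np.exp(z - z.max())
        p /= p.sum()                           # softmax choice probabilities (assumed)
        s = int(self.rng.choice(len(p), p=p))  # the single selected (attended) output node
        r = 1.0 if s == target else 0.0
        delta = r - p[s]                       # global reward signal: outcome minus expectation
        # Only the selected node s sends feedback, through weights equal to W[:, s].
        feedback = self.W[:, s].copy()
        # Output level: only the connections onto the selected node change.
        self.W[:, s] += self.lr * delta * y
        if self.use_feedback:
            # Hidden level: each unit's update is gated by the feedback it receives,
            # so credit goes to the units that drove the chosen output.
            gate = feedback * y * (1.0 - y)
            self.V += self.lr * delta * np.outer(x, gate)
        return r
```

Because only the selected output node sends feedback, each hidden unit's update is scaled by its connection to that node (`W[:, s]`). This is what restricts credit to the units that drove the current response, in contrast to a reward signal broadcast uniformly to every synapse.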
One of the modeled tasks was conducted by Freedman and Assad, in which monkeys were trained to group 12 directions of coherently moving dots into 2 arbitrary categories. They found that, after training, neurons in area LIP responded selectively to a particular category (see Figure 2). This finding was replicated in AGREL using a three-layer network whose hidden-layer units were shown to exhibit similar category-dependent tuning properties (see Figure 2). In general, when the feedback connections were removed, the performance of the network was severely impaired and the physiologically observed changes in tuning curves were no longer seen (see Figure 2). However, depending on the complexity of the task, the network could still converge to a solution, although convergence took far longer.
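Category selectivity of the kind described above can be quantified by comparing how much a unit's response changes between neighboring directions in the same category with how much it changes between neighboring directions that straddle the category boundary. The Python sketch below assumes 12 directions spaced 30° apart, a hypothetical boundary along 45°/225° that splits them into two groups of six, and an index (BCD - WCD)/(BCD + WCD) computed over adjacent direction pairs. The abstract does not state which measure underlies Figure 2, so the function names and the index definition here are illustrative assumptions only.

```python
import numpy as np

N_DIRECTIONS = 12
# Motion directions 0, 30, ..., 330 degrees
DIRECTIONS = np.arange(N_DIRECTIONS) * (360.0 / N_DIRECTIONS)


def category_of(direction_deg, boundary_deg=45.0):
    """Hypothetical rule: category 0 is the 180-degree arc counter-clockwise
    from the boundary, category 1 is the other half."""
    return 0 if (direction_deg - boundary_deg) % 360.0 < 180.0 else 1


def category_index(responses, boundary_deg=45.0):
    """Compare response differences between neighboring directions (30 degrees
    apart) that straddle the boundary (BCD) with those that do not (WCD).
    Returns (BCD - WCD) / (BCD + WCD): 1 for purely categorical responses,
    lower when responses change more within categories than across the boundary."""
    responses = np.asarray(responses, dtype=float)
    cats = [category_of(d, boundary_deg) for d in DIRECTIONS]
    within, between = [], []
    for i in range(N_DIRECTIONS):
        j = (i + 1) % N_DIRECTIONS  # neighboring direction, wrapping 330 -> 0
        diff = abs(responses[i] - responses[j])
        (within if cats[i] == cats[j] else between).append(diff)
    wcd, bcd = float(np.mean(within)), float(np.mean(between))
    if wcd + bcd == 0.0:
        return 0.0  # flat tuning curve: no selectivity of either kind
    return (bcd - wcd) / (bcd + wcd)


# A unit that responds only to category 0 is perfectly categorical.
categorical = [1.0 if category_of(d) == 0 else 0.0 for d in DIRECTIONS]
print(category_index(categorical))  # 1.0
```

Restricting the comparison to adjacent directions keeps the angular distance the same for within- and between-category pairs, so a smooth direction-tuned curve centered on a category scores only modestly above zero, and one peaking on the boundary scores near -1.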