Hebbian learning has been implicated as a possible mechanism in a wide range of learning and memory functions in the brain. A large body of theoretical studies and simulations has investigated its implications for the dynamics of single-neuron as well as network models. For example, neural network models have been found to produce meaningful internal states when driven by structured external stimulation. These studies, however, typically lack a notion of a "desired output" in the form of a well-specified pattern of network activity corresponding to a relevant functional output. To impose a desired input-output relation, various forms of supervised learning (or at least reinforcement in the form of an external cue) are often invoked. Recently there has been increasing interest in computational models that involve a separation of time scales between relatively fast plasticity rules and considerably slower reinforcement mechanisms. A large majority of these studies focus on the role of neuromodulators, such as dopamine. Here, we study a training protocol within such a closed-loop setup, with the separation of time scales appearing between a fast learning rule and slower synaptic fatigue.
Our model is motivated in part by a series of experiments on ex-vivo cultures of neuronal networks [1,2]. Such self-assembled networks are perhaps closest in their topology to the random, recurrent networks underlying typical neural network simulation models, and they lack the complexity of a whole brain, or even a slice. It is an open question whether ex-vivo cultures of neurons and glia can support learning and, if so, what their capacity is and what mechanisms underlie such phenomena. Here, we study a recurrent network of integrate-and-fire neurons with competitive Hebbian learning (STDP), subject to a learning protocol in which stimulation is suppressed in response to the onset of a desired output. A local activity-dependent second messenger is used to modulate the level of plasticity. The activity of the network (mediated by external stimulation and reinforcement) directly regulates the second messenger, thus effectively closing the loop. We show how successful learning in these networks depends on the interplay between the network's ability, first, to explore its space of configurations to obtain a desired output and, second, to converge reliably to that configuration in response to the external cues. These results extend the traditional competitive view of Hebbian learning by refining the dependence of the rule on slow (or long-term) input patterns. When the network is explicitly subjected to (i) competitive learning, (ii) explicit reinforcement and (iii) activity-dependent plasticity modulation, it can learn meaningful patterns of input-output relations.
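The closed-loop protocol described above can be illustrated with a minimal sketch. It is not the model used in this work: the function name `run_closed_loop`, the network size, all numerical constants, the trace-based additive STDP with hard bounds (standing in for the competitive rule) and the messenger dynamics are assumptions chosen only to make the three ingredients concrete. Those ingredients are Hebbian updates, suppression of stimulation when a designated output neuron fires, and a local activity-dependent messenger that scales plasticity.

```python
import random


def run_closed_loop(n=12, steps=3000, seed=0):
    """Toy closed-loop protocol; every constant here is an illustrative assumption."""
    rng = random.Random(seed)
    w_max, lr = 0.5, 0.01
    # w[i][j] is the synapse from neuron j onto neuron i (no self-connections).
    w = [[0.0 if i == j else rng.uniform(0.0, 0.3) for j in range(n)]
         for i in range(n)]
    v = [0.0] * n       # membrane potentials
    pre = [0.0] * n     # presynaptic STDP traces
    post = [0.0] * n    # postsynaptic STDP traces
    m = [0.0] * n       # local second messenger in [0, 1], gates plasticity
    spikes = [False] * n
    inputs, output = range(3), n - 1   # hypothetical input and output neurons
    rest = 0            # steps of suppressed stimulation still to go
    suppressions = 0

    for _ in range(steps):
        stim_on = rest == 0
        if rest > 0:
            rest -= 1

        # Leaky integrate-and-fire update driven by last step's spikes.
        fired = []
        for i in range(n):
            drive = sum(w[i][j] for j in range(n) if spikes[j])
            if stim_on and i in inputs and rng.random() < 0.3:
                drive += 1.0  # external stimulation of the input neurons
            v[i] = 0.9 * v[i] + drive
            fired.append(v[i] >= 1.0)
            if fired[i]:
                v[i] = 0.0

        # Trace-based STDP, scaled by the postsynaptic messenger level.
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dw = 0.0
                if fired[i]:
                    dw += lr * m[i] * pre[j]    # pre before post: potentiate
                if fired[j]:
                    dw -= lr * m[i] * post[i]   # post before pre: depress
                w[i][j] = min(w_max, max(0.0, w[i][j] + dw))

        for i in range(n):
            s = 1.0 if fired[i] else 0.0
            pre[i] = 0.8 * pre[i] + s
            post[i] = 0.8 * post[i] + s
            # Messenger rises with local activity and decays slowly.
            m[i] = min(1.0, 0.999 * m[i] + 0.05 * s)

        # Closing the loop: the desired output switches stimulation off.
        if stim_on and fired[output]:
            rest = 50
            suppressions += 1
        spikes = fired

    return {"weights": w, "messenger": m, "suppressions": suppressions}
```

Calling `run_closed_loop(steps=500)` returns the final weight matrix, the messenger levels and the number of times stimulation was suppressed. The sketch makes no claim about whether learning succeeds for these parameter values.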
The authors acknowledge funding from the EPSRC grant EP/D00232X/1.