
This article is part of the supplement: Eighteenth Annual Computational Neuroscience Meeting: CNS*2009

Open Access Poster presentation

A spiking temporal-difference learning model based on dopamine-modulated plasticity

Wiebke Potjans1,2*, Abigail Morrison1 and Markus Diesmann1,3,4

  • * Corresponding author: Wiebke Potjans

Author Affiliations

1 Theoretical Neuroscience Group, RIKEN Brain Science Institute, Wako City, Saitama, Japan

2 Institute of Neurosciences and Medicine, Research Center Jülich, Jülich, Germany

3 Brain and Neural Systems Team, RIKEN Computational Science Research Program, Wako City, Saitama, Japan

4 Bernstein Center for Computational Neuroscience, Albert-Ludwigs-University, Freiburg, Germany


BMC Neuroscience 2009, 10(Suppl 1):P140  doi:10.1186/1471-2202-10-S1-P140


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2202/10/S1/P140


Published: 13 July 2009

© 2009 Potjans et al; licensee BioMed Central Ltd.

Poster presentation

Making predictions about future rewards and adapting behavior accordingly is crucial for any higher organism. One theory specialized for such prediction problems is temporal-difference (TD) learning. Experimental findings suggest that TD learning is implemented by the mammalian brain: in particular, the resemblance of dopaminergic activity to the TD error signal [1] and the modulation of corticostriatal plasticity by dopamine [2] lend support to this hypothesis. We recently proposed the first spiking neural network model to implement actor-critic TD learning [3], enabling it to solve a complex task with sparse rewards. However, that model computes an approximation of the TD error signal in each synapse, rather than utilizing a neuromodulatory system.
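For readers unfamiliar with the formalism, the TD error referred to above is the mismatch between received and predicted reward, delta = r + gamma*V(s') - V(s). A minimal sketch, purely illustrative and not part of the model itself:

```python
def td_error(reward, value_current, value_next, gamma=0.9):
    """Temporal-difference error: delta = r + gamma*V(s') - V(s).

    A positive delta signals an outcome better than predicted, mirroring
    a phasic increase in dopaminergic firing; a negative delta mirrors a
    dip below baseline firing.
    """
    return reward + gamma * value_next - value_current

# An unpredicted reward (prediction 0, reward 1) yields a positive error.
print(td_error(reward=1.0, value_current=0.0, value_next=0.0))  # 1.0
# An omitted but predicted reward yields a negative error.
print(td_error(reward=0.0, value_current=1.0, value_next=0.0))  # -1.0
```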

Here, we propose a spiking neural network model that dynamically generates a dopamine signal based on the actor-critic architecture proposed by Houk et al. [4]. This signal acts as a third factor modulating the plasticity of the synapses that encode the value function and the policy. The proposed model simultaneously accounts for multiple experimental results, such as the generation of a TD-like dopaminergic signal with realistic firing rates in conditioning protocols [1] and the roles of presynaptic activity, postsynaptic activity and dopamine in the plasticity of corticostriatal synapses [5]. The excellent agreement between the predictions of our synaptic plasticity rules and the experimental findings is particularly noteworthy, as the update rules were derived in a purely top-down fashion.
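The general shape of such a three-factor rule can be sketched in discrete time as follows. The function name, the trace dynamics and all parameter values are illustrative assumptions, not the model's actual plasticity equations:

```python
def three_factor_step(w, eligibility, pre, post, dopamine, eta=0.05, decay=0.9):
    """One discrete-time step of a generic three-factor plasticity rule
    (a sketch only, not the authors' equations).

    The pre/post coincidence is stored in a decaying eligibility trace;
    the weight changes only when the dopaminergic third factor (here,
    the TD error) is non-zero, so correlated pre/post activity alone
    leaves the weight untouched.
    """
    eligibility = decay * eligibility + pre * post
    w = w + eta * dopamine * eligibility
    return w, eligibility

# Coincident activity without dopamine: the trace builds, the weight is frozen.
w, e = three_factor_step(0.5, 0.0, pre=1.0, post=1.0, dopamine=0.0)
# A later dopamine burst converts the stored trace into a weight change.
w, e = three_factor_step(w, e, pre=0.0, post=0.0, dopamine=1.0)
```

The eligibility trace is what bridges the gap between a pre/post coincidence and a dopamine burst arriving some time later, which is essential when rewards are sparse.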

We performed simulations in NEST [6] to test the learning behavior of the model in a two-dimensional grid-world task with a single rewarded state. The network learns to evaluate states with respect to their proximity to the reward and adapts its policy accordingly. Learning speed and equilibrium performance are comparable to those of a discrete-time algorithmic implementation of TD learning.
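An algorithmic, non-spiking analogue of such a grid-world experiment can be sketched with tabular actor-critic TD learning. Grid size, learning rates and the discount factor below are illustrative assumptions and do not correspond to the NEST simulation:

```python
import numpy as np

def run_gridworld(size=5, episodes=300, gamma=0.95, alpha=0.1, beta=0.1, seed=0):
    """Tabular actor-critic TD learning on a size x size grid with a
    single rewarded terminal state in the far corner (a sketch of the
    task structure, not the spiking model)."""
    rng = np.random.default_rng(seed)
    n_states, goal = size * size, size * size - 1
    V = np.zeros(n_states)            # critic: state-value estimates
    prefs = np.zeros((n_states, 4))   # actor: action preferences
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = 0, 0
        while s != goal and steps < 500:
            # Softmax policy over the four actions.
            p = np.exp(prefs[s] - prefs[s].max())
            p /= p.sum()
            a = rng.choice(4, p=p)
            r, c = divmod(s, size)
            r2 = min(max(r + moves[a][0], 0), size - 1)
            c2 = min(max(c + moves[a][1], 0), size - 1)
            s2 = r2 * size + c2
            reward = 1.0 if s2 == goal else 0.0
            # TD error; the terminal state's value is fixed at zero.
            delta = reward + (gamma * V[s2] if s2 != goal else 0.0) - V[s]
            V[s] += alpha * delta         # critic update
            prefs[s, a] += beta * delta   # actor update
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

steps = run_gridworld()
# Episodes shorten as the policy improves; the shortest path needs 8 steps.
```

Plotting steps per episode for such a run shows the usual signature of TD learning: long random-walk episodes at first, converging towards the shortest path as the value estimates propagate back from the rewarded state.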

The proposed model paves the way for investigating the role of the dynamics of the dopaminergic system in reward-based learning. For example, simulated lesion studies can be used to analyze the effects of dopamine treatment in Parkinson's patients. Finally, the experimentally constrained model can serve as the centerpiece of closed-loop functional models.

Acknowledgements

Partially funded by EU Grant 15879 (FACETS), BMBF Grant 01GQ0420 to BCCN Freiburg, Next-Generation Supercomputer Project of MEXT, Japan, and the Helmholtz Alliance on Systems Biology.

References

  1. Schultz W, Dayan P, Montague PR: A neural substrate of prediction and reward. Science 1997, 275:1593-1599.

  2. Reynolds JN, Hyland BI, Wickens JR: A cellular mechanism of reward-related learning. Nature 2001, 413:67-70.

  3. Potjans W, Morrison A, Diesmann M: A spiking neural network model of an actor-critic learning agent. Neural Computation 2009, 21:301-339.

  4. Houk JC, Adams JL, Barto AG: A model of how the basal ganglia generate and use neural signals that predict reinforcement. MIT Press, Cambridge, MA; 1995.

  5. Reynolds JN, Hyland BI, Wickens JR: Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks 2002, 15:507-521.

  6. Gewaltig M-O, Diesmann M: NEST (neural simulation tool). Scholarpedia 2007, 2:1430.