This article is part of the supplement: Seventeenth Annual Computational Neuroscience Meeting: CNS*2008

Open Access Poster presentation

Properties of synaptic plasticity rules implementing actor-critic temporal-difference learning

Wiebke Potjans1*, Abigail Morrison1 and Markus Diesmann1,2

Author Affiliations

1 Computational Neuroscience Group, RIKEN Brain Science Institute, Wako-shi, Saitama, 351-0198 Japan

2 Bernstein Center for Computational Neuroscience, Albert-Ludwigs-University, 79104 Freiburg, Germany

BMC Neuroscience 2008, 9(Suppl 1):P69  doi:10.1186/1471-2202-9-S1-P69

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2202/9/S1/P69


Published: 11 July 2008

© 2008 Potjans et al; licensee BioMed Central Ltd.

Poster presentation

There is considerable interest in establishing a link between system-level learning and synaptic plasticity [1-3]. In a previous study [4] we presented a specific set of biologically plausible synaptic plasticity rules implementing temporal-difference (TD) learning in a spiking neuronal network inspired by the actor-critic architecture [5]. We showed the equivalence between the plasticity rules and the traditional discrete-time TD(0) algorithm and demonstrated that the network learns a complex task at a speed similar to that of its discrete-time counterpart and attains the same equilibrium performance. However, this set of learning rules represents only one possible way in which actor-critic TD learning could be implemented in the brain, and so the model has only limited predictive power for experimental work.
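
For orientation, the discrete-time actor-critic TD(0) updates can be written as follows. This is a sketch in the textbook notation of [5], not a formula taken from this abstract: the symbols (state value V, action preference p, step sizes \alpha and \beta, discount factor \gamma) are assumptions introduced here for illustration.

```latex
\begin{align}
\delta_t &= r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
  && \text{(TD error)} \\
V(s_t) &\leftarrow V(s_t) + \alpha\,\delta_t
  && \text{(critic: value update)} \\
p(s_t, a_t) &\leftarrow p(s_t, a_t) + \beta\,\delta_t
  && \text{(actor: policy update)} \\
\pi(s, a) &= \frac{e^{p(s,a)}}{\sum_b e^{p(s,b)}}
  && \text{(softmax action selection)}
\end{align}
```

A positive TD error raises both the value of the state just left and the preference for the action just taken. A negative one lowers both.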

Here, we extract properties of synaptic plasticity rules that suffice to implement actor-critic TD(0) learning, under the assumption that states are represented by elevated rates in disjoint sets of neurons. On this basis we define generalized classes of continuous-time synaptic plasticity rules that implement value function and policy updates. The main property is that the amount and sign of the weight update depend on a characteristic change in the activity of the critic module combined with a global reward signal. We present concrete examples belonging to the defined class and demonstrate that they are able to solve a non-trivial task. We further analyze to what extent the defined class of plasticity rules is compatible with experimental findings on synaptic plasticity [6,7].
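
The main property can be illustrated with a minimal rate-based sketch in Python. It is not the plasticity rules of the poster. It assumes one disjoint population per state, a critic whose rate equals the weight from the currently active population, and a hypothetical five-state chain task with a reward at the right end. The names `train`, `w_critic`, `w_actor` and all parameter values are illustrative assumptions. Only the weights of the population that was just active change, and the sign and size of the change follow the change in critic rate plus the reward.

```python
import math
import random


def softmax(weights):
    """Turn actor weights into action probabilities."""
    top = max(weights)
    exps = [math.exp(w - top) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def train(n_states=5, episodes=500, eta_critic=0.1, eta_actor=0.1,
          gamma=0.9, max_steps=1000, seed=0):
    """Learn a chain task with a rate-based, TD(0)-like weight update.

    Assumption: w_critic[s] stands in for the critic rate while the
    population coding state s is active (no spikes are simulated).
    """
    rng = random.Random(seed)
    w_critic = [0.0] * n_states
    # w_actor[s][a]: weight from population s to action a (0 = left, 1 = right)
    w_actor = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            probs = softmax(w_actor[s])
            a = 0 if rng.random() < probs[0] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            rate_before = w_critic[s]
            rate_after = 0.0 if s_next == goal else w_critic[s_next]
            # Change in critic activity combined with the global reward signal
            signal = reward + gamma * rate_after - rate_before
            # Only synapses from the previously active population are updated
            w_critic[s] += eta_critic * signal
            w_actor[s][a] += eta_actor * signal
            if s_next == goal:
                break
            s = s_next
    return w_critic, w_actor
```

Calling `w_critic, w_actor = train()` returns the learned critic and actor weights. Under these assumptions the update coincides with tabular TD(0), which is why the sketch says nothing about how a spiking network would compute the signal in continuous time.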

Acknowledgements

Partially funded by DIP F1.2, BMBF Grant 01GQ0420 to the Bernstein Center for Computational Neuroscience Freiburg, and EU Grant 15879 (FACETS).

References

  1. Izhikevich EM: Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex 2007, 17(10):2443-2452.

  2. Baras D, Meir R: Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule. Neural Computation 2007, 19:2245-2279.

  3. Florian RV: Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity. Neural Computation 2007, 19:1468-1502.

  4. Potjans W, Morrison A, Diesmann M: A spiking neural network model for the actor-critic temporal-difference learning algorithm. 342.6. 37th SFN meeting, San Diego, USA.

  5. Sutton RS, Barto AG: Reinforcement Learning: An Introduction. The MIT Press; 1998.

  6. Kirkwood A, Rioult MG, Bear MF: Experience-dependent modification of synaptic plasticity in visual cortex. Nature 1996, 381:526-528.

  7. Reynolds JNJ, Wickens JR: Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks 2002, 15:507-521.