Direct reinforcement learning, spike time dependent plasticity and the BCM rule

Barash, Dorit; Meir, Ron

doi:10.1186/1471-2202-8-S2-P197

Volume 8 Supplement 2

Sixteenth Annual Computational Neuroscience Meeting: CNS*2007

Poster presentation
Open access
Published: 06 July 2007


BMC Neuroscience volume 8, Article number: P197 (2007)

Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is guided by an environmental signal, termed a reward, which steers the changes in appropriate directions. We model a network of spiking neurons as a Partially Observable Markov Decision Process (POMDP) and apply a recently introduced policy learning algorithm from machine learning to the network [1]. By computing a stochastic gradient approximation of the average reward, we derive a plasticity rule that falls in the class of Spike Time Dependent Plasticity (STDP) rules and ensures convergence to a local maximum of the average reward. The approach is applicable to a broad class of neuronal models, including the Hodgkin-Huxley model. The resulting update rule is based on the correlation between the reward signal and local data available at the synaptic site. These data depend on local activity (e.g., pre- and postsynaptic spikes) and require only mechanisms that are available at the cellular level. Simulations on several toy problems demonstrate the utility of the approach. As with most stochastic-gradient-based methods, convergence is slow, although the percentage of runs that converge to a global maximum is high. Additionally, through statistical analysis we show that the derived synaptic plasticity rule is closely related to the widely used BCM rule [2], for which good biological evidence exists. The relation to the BCM rule captures the nature of the dependence between pre- and postsynaptic spiking rates, and in particular the self-regularizing nature of the BCM rule. Compared to previous work in this field, our model is more realistic than the one used in [3], and the derivation of the update rule applies to a broad class of voltage-based neuronal models, eliminating some of the additional statistical assumptions required in [4]. Finally, the connection between reinforcement learning and the BCM rule is, to the best of our knowledge, new.
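
The general idea of a reward-modulated rule built from local quantities can be illustrated with a minimal sketch. It is not the authors' derivation or their spiking model: it assumes a single discrete-time neuron that spikes with probability given by a logistic function of its weighted input, and it uses the online form of the policy-gradient estimator of [1], in which each synapse keeps an eligibility trace of the score function and the weight change is the product of reward and trace. The function names, the two input patterns, the reward scheme, and the values of `eta` and `beta` are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(u):
    """Logistic function mapping a membrane-like drive to a spike probability."""
    return 1.0 / (1.0 + math.exp(-u))


def train(steps=20000, eta=0.05, beta=0.5, seed=0):
    """Online policy-gradient learning for one stochastic (Bernoulli) neuron.

    Assumed toy task: the neuron is rewarded for spiking on pattern A
    and for staying silent on pattern B.
    """
    rng = random.Random(seed)
    # Hypothetical presynaptic patterns; the last component is a constant bias input.
    patterns = [([1.0, 0.0, 1.0], 1),   # pattern A: spiking is rewarded
                ([0.0, 1.0, 1.0], 0)]   # pattern B: silence is rewarded
    w = [0.0, 0.0, 0.0]   # synaptic weights
    z = [0.0, 0.0, 0.0]   # per-synapse eligibility traces
    avg_reward = 0.0
    for t in range(steps):
        x, target = patterns[rng.randrange(2)]
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        s = 1 if rng.random() < p else 0      # stochastic postsynaptic spike
        r = 1.0 if s == target else 0.0       # global reward signal
        # Score function of a Bernoulli neuron: d/dw log P(s | x, w) = (s - p) * x.
        # It only involves presynaptic input x and postsynaptic activity s, p.
        z = [beta * zi + (s - p) * xi for zi, xi in zip(z, x)]
        # Weight change = learning rate * reward * local eligibility trace.
        w = [wi + eta * r * zi for wi, zi in zip(w, z)]
        avg_reward += (r - avg_reward) / (t + 1)   # running mean of the reward
    return w, avg_reward


if __name__ == "__main__":
    w, avg_reward = train()
    print("weights:", [round(wi, 2) for wi in w])
    print("P(spike | A):", round(sigmoid(w[0] + w[2]), 3))
    print("P(spike | B):", round(sigmoid(w[1] + w[2]), 3))
    print("average reward:", round(avg_reward, 3))
```

In this sketch the reward is immediate, so the trace discount `beta` mainly adds variance and `beta=0` would also work; a nonzero value is kept only to show where delayed rewards would enter. The update at each synapse uses the reward and quantities available at that synapse, which is the property the abstract emphasizes.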

References

1. Baxter J, Bartlett PL: Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research. 2001, 15: 319-350. 10.1016/S0954-1810(01)00028-0.
2. Bienenstock EL, Cooper LN, Munro PW: Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. J Neurosci. 1982, 2 (1): 32-48.
3. Bartlett PL, Baxter J: Hebbian synaptic modifications in spiking neurons that learn. Technical report, Research School of Information Sciences and Engineering. 1999, Australian National University.
4. Xie X, Seung HS: Learning in neural networks by reinforcement of irregular spiking. Physical Review E. 2004, 69: 041909. 10.1103/PhysRevE.69.041909.

Author information

Authors and Affiliations

IBM Haifa Research Lab, Mount Carmel, Haifa, 31905, Israel
Dorit Barash
Department of Electrical Engineering, Technion, Haifa, 32000, Israel
Ron Meir

Corresponding author

Correspondence to Ron Meir.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Barash, D., Meir, R. Direct reinforcement learning, spike time dependent plasticity and the BCM rule. BMC Neurosci 8 (Suppl 2), P197 (2007). https://doi.org/10.1186/1471-2202-8-S2-P197

