An abstract model of the basal ganglia, reward learning and action selection

Berthet, Pierre; Lansner, Anders

doi:10.1186/1471-2202-12-S1-P189

Volume 12 Supplement 1

Twentieth Annual Computational Neuroscience Meeting: CNS*2011

Poster presentation
Open access
Published: 18 July 2011

Pierre Berthet^1,2 &
Anders Lansner^1,2,3

BMC Neuroscience volume 12, Article number: P189 (2011)

Learning is of major interest in many fields, from human pathology to artificial intelligence. Animal and other biological experiments have provided a large amount of data in this area, especially on classical and operant conditioning. Reinforcement learning is widely used in computational models to reproduce and explain these observations. It makes it possible to model a wide range of phenomena, from the neuronal level to the behavior of a whole organism. Studying how information about reward or punishment is processed in the brain is crucial to understanding action selection and decision-making in normal and pathological conditions. The basal ganglia are a group of nuclei that play a major role in processing motor, associative and limbic information [1] and could be specialized in resolving conflicts between these sub-systems, which compete for access to limited cognitive resources [2]. The activity of dopaminergic neurons is believed to be related to the reward prediction error and is involved in long-term potentiation (LTP) and depression (LTD) in the striatum [3].

We present here an abstract computational model of the basal ganglia that uses reinforcement learning and Bayesian inference. It is based on a dual-pathway architecture similar to the direct (Go) and indirect (NoGo) pathways found in biology [4]. The units can be seen as grandmother cells, with inputs representing states and outputs representing actions. As a result of the activity in both pathways, an action is selected based on the current state and on the previous outcomes of the different actions in that state, that is, whether or not a reward was obtained. One aim of our model is biological plausibility: weights can be updated via a triple-activation Hebbian learning rule whose three factors are analogous to pre-synaptic activity, post-synaptic depolarisation and dopamine level [5]. The update equation is based on the probability of co-activity of the active units. In its current form, the model uses trace activation and a simple delay mechanism. When a reward occurs, the weight between the active units is increased in the Go projection and decreased in the NoGo projection.
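The three-factor update and the trace mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' update equation, which is based on co-activity probabilities and is not given in this abstract; a plain multiplicative rule is swapped in for illustration. The function names, the `decay` and `rate` parameters, and the use of a signed `dopamine` value (so that a negative value reverses the change) are all assumptions.

```python
def decay_trace(trace, activity, decay=0.8):
    """Hypothetical activity trace: it jumps to the unit's current activity
    and otherwise fades by a factor `decay` per time step."""
    return max(activity, decay * trace)


def three_factor_update(go, nogo, pre_trace, post_trace, dopamine, rate=0.1):
    """Change one Go/NoGo weight pair by rate * pre * post * dopamine.

    A positive `dopamine` (reward) raises the Go weight and lowers the NoGo
    weight, as in the text. Reversing the change for a negative value is an
    assumption of this sketch.
    """
    delta = rate * pre_trace * post_trace * dopamine
    return go + delta, nogo - delta


# The state and action units fire once, then the reward arrives two steps later.
pre, post = 1.0, 1.0
for _ in range(2):
    pre = decay_trace(pre, 0.0)
    post = decay_trace(post, 0.0)

# The decayed traces still allow a (smaller) weight change at reward time.
go, nogo = three_factor_update(0.5, 0.5, pre, post, dopamine=1.0)
```

In this sketch the trace is what lets a delayed reward still be credited to the state and action units that were active earlier; if the trace has decayed to zero, the weights do not change.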

In a one-to-one mapping learning scheme (only one specific action for a given state triggers a reward), the simulation shows good results both in learning and in re-learning, i.e. when the mapping is subsequently shuffled.
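The learning and re-learning scheme can be sketched as a toy Python simulation. Everything specific here is an assumption made for illustration rather than a detail of the model: the softmax selection over Go minus NoGo weights, the clipping of weights to [0, 1], the fixed step size, the use of a rotation as a deterministic stand-in for shuffling, and all function and parameter names.

```python
import math
import random


def softmax_choice(values, temperature, rng):
    """Sample an index with probability proportional to exp(value / temperature)."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    threshold = rng.random() * sum(exps)
    cumulative = 0.0
    for index, weight in enumerate(exps):
        cumulative += weight
        if threshold < cumulative:
            return index
    return len(exps) - 1


def clip(x):
    """Keep a weight inside [0, 1] (an assumption that keeps re-learning fast)."""
    return min(1.0, max(0.0, x))


def run_phase(go, nogo, mapping, trials, rng, rate=0.1, temperature=0.1):
    """Train on one state-to-action mapping; return accuracy over the last 100 trials."""
    hits = []
    for _ in range(trials):
        state = rng.randrange(len(mapping))
        net = [g - n for g, n in zip(go[state], nogo[state])]
        action = softmax_choice(net, temperature, rng)
        rewarded = action == mapping[state]
        step = rate if rewarded else -rate
        go[state][action] = clip(go[state][action] + step)
        nogo[state][action] = clip(nogo[state][action] - step)
        hits.append(rewarded)
    tail = hits[-100:]
    return sum(tail) / len(tail)


def simulate(n=3, trials=600, seed=0):
    """Learn a one-to-one mapping, then re-learn after the mapping is changed."""
    rng = random.Random(seed)
    go = [[0.5] * n for _ in range(n)]
    nogo = [[0.5] * n for _ in range(n)]
    first = list(range(n))
    second = [(a + 1) % n for a in first]  # every state gets a new rewarded action
    acc_learn = run_phase(go, nogo, first, trials, rng)
    acc_relearn = run_phase(go, nogo, second, trials, rng)
    return acc_learn, acc_relearn


acc_learn, acc_relearn = simulate()
```

Bounding the weights is what lets the previously rewarded action be unlearned within a few unrewarded trials; with unbounded weights the old mapping would take much longer to overwrite.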

One interesting feature is the homeostatic property of the units: the sum of a unit's incoming and outgoing weights and its bias varies very little. In a constrained setup (small numbers of possible actions and states), similar to an experimental design, the model handles learning under stochastic reward delivery, and the results are comparable to the experimental data. The response of midbrain dopamine neurons is positively correlated with the number of unrewarded trials [6], and our model produces a similar result with its prediction-error value. Future work will focus on reproducing conditioning phenomena and implementing spiking neurons.
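The link between unrewarded trials and prediction error can be illustrated with a standard delta-rule value estimate. This is a generic stand-in, not the model's own prediction-error computation, which the abstract does not specify; the function name, the starting value and the learning rate are assumptions.

```python
def prediction_error_after_streak(unrewarded, value=0.5, rate=0.2):
    """Prediction error on a rewarded trial that follows `unrewarded` misses.

    Each unrewarded trial moves the expected reward toward 0 with a delta
    rule, so the error (1 - value) on the next reward is larger.
    """
    for _ in range(unrewarded):
        value += rate * (0.0 - value)
    return 1.0 - value


# Errors for streaks of 0 to 4 unrewarded trials; they increase with the streak length.
errors = [prediction_error_after_streak(k) for k in range(5)]
```

Under these assumptions the expected reward shrinks geometrically during an unrewarded streak, so the prediction error at the next reward grows monotonically with the length of the streak.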

References

1. Mink JW: The basal ganglia: focused selection and inhibition of competing motor programs. Progress in Neurobiology. 1996, 50 (4): 381-425. 10.1016/S0301-0082(96)00042-1.
2. Redgrave P, Prescott T, Gurney K: The basal ganglia: a vertebrate solution to the selection problem? Neuroscience. 1999, 89 (4): 1009-1023. 10.1016/S0306-4522(98)00319-4.
3. Cohen MX, Frank MJ: Neurocomputational models of basal ganglia function in learning, memory and choice. Behavioural Brain Research. 2009, 199 (1): 141-156. 10.1016/j.bbr.2008.09.029.
4. Surmeier DJ, Ding J, Day M, Wang Z, Shen W: D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatal medium spiny neurons. Trends in Neurosciences. 2007, 30 (5): 228-235. 10.1016/j.tins.2007.03.008.
5. Reynolds JNJ, Wickens JR: Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks. 2002, 15 (4-6): 507-521. 10.1016/S0893-6080(02)00045-X.
6. Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O: Dopamine neurons can represent context-dependent prediction error. Neuron. 2004, 41 (2): 269-280. 10.1016/S0896-6273(03)00869-9.

Acknowledgements

FACETS-ITN 237955, Swedish Scientific Counsel

Author information

Authors and Affiliations

Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, 106 91, Sweden
Pierre Berthet & Anders Lansner
Stockholm Brain Institute, Karolinska Institutet, Stockholm, 171 77, Sweden
Pierre Berthet & Anders Lansner
Department of Computational Biology, Royal Institute of Technology (KTH), Stockholm, 106 91, Sweden
Anders Lansner

Corresponding author

Correspondence to Pierre Berthet.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Berthet, P., Lansner, A. An abstract model of the basal ganglia, reward learning and action selection. BMC Neurosci 12 (Suppl 1), P189 (2011). https://doi.org/10.1186/1471-2202-12-S1-P189
