We recently proposed that short-latency, sensory-evoked dopamine release is critical for learning action-outcome causality. If an action causes an unexpected outcome associated with a phasic visual event, the result is a phasic burst of dopamine in the striatum. Subsequent reinforcement of the striatal response to the cortical representation of that action makes its selection (and hence its outcome) more likely; that is, action selection acquires a "repetition bias". This, in turn, facilitates associative learning of the action-outcome pairing elsewhere in the brain. Here, we present a model of cortico-striatal plasticity in medium spiny neurons (MSNs) that could form the basis of a quantitative account of action-outcome learning in the basal ganglia.
We used an Izhikevich-style spiking MSN model with 200 synapses and constructed new cortico-striatal learning rules based on a recent in vitro study by Shen et al. This study provided, for the first time, comprehensive data on plasticity in the D1- and D2-receptor-dominated MSN subpopulations. For each population (D1/D2), we ascribed STDP-like kernel "templates" to very high and very low levels of dopamine, consistent with these data. At intermediate dopamine levels, the kernel was formed as a linear superposition of the two templates. These kernels gave rise to an eligibility trace that induced plasticity when dopamine was subsequently delivered. We refer to this mechanism as spike-timing dependent eligibility (STDE). We then mimicked the cortical and dopaminergic signals that an MSN might receive during action-outcome learning. Each cortical input (action) comprised 50 highly active afferents, with the remaining 150 afferents firing at a background rate. The set of active afferents was fixed for the causal action and chosen randomly at each trial for the other actions (see Figure 1a).
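The STDE mechanism can be sketched as below. This is a minimal illustration under stated assumptions, not the model used here: the exponential shape of each template, all amplitudes and time constants, the normalisation of dopamine to [0, 1], and the names `Template`, `blended_kernel` and `STDESynapse` are hypothetical. The rule that turns eligibility into a weight change (change proportional to eligibility times a phasic dopamine signal, positive for a burst and negative for a dip) is borrowed from Izhikevich's (2007) distal-reward scheme and may differ from the rule in the full model.

```python
import math
from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical STDP-like kernel template for one dopamine extreme.

    dt = t_post - t_pre (ms). Pre-before-post (dt >= 0) and post-before-pre
    (dt < 0) each have their own amplitude and decay with time constant tau.
    """
    a_pre_post: float
    a_post_pre: float
    tau: float = 20.0  # ms, assumed

    def __call__(self, dt: float) -> float:
        if dt >= 0:
            return self.a_pre_post * math.exp(-dt / self.tau)
        return self.a_post_pre * math.exp(dt / self.tau)


def blended_kernel(low: Template, high: Template, da: float, dt: float) -> float:
    """Linear superposition of the low- and high-dopamine templates.

    da is the dopamine level normalised to [0, 1] (an assumption):
    0 gives the low template, 1 gives the high template.
    """
    da = min(max(da, 0.0), 1.0)
    return (1.0 - da) * low(dt) + da * high(dt)


class STDESynapse:
    """Spike-timing dependent eligibility for a single synapse.

    Spike pairings only tag the synapse via an eligibility trace; the weight
    changes only when a phasic dopamine signal arrives.
    """

    def __init__(self, w=0.5, tau_e=500.0, lr=1.0, w_max=1.0):
        self.w = w          # synaptic weight (conductance scale)
        self.e = 0.0        # eligibility trace
        self.tau_e = tau_e  # eligibility decay time constant, ms (assumed)
        self.lr = lr        # learning rate (assumed)
        self.w_max = w_max  # upper weight bound (assumed)

    def pairing(self, low: Template, high: Template, da: float, dt: float) -> None:
        # The kernel value at the current dopamine level is added to the trace.
        self.e += blended_kernel(low, high, da, dt)

    def step(self, dt_ms: float, da_phasic: float = 0.0) -> None:
        # Weight change = lr * phasic dopamine * eligibility (burst > 0, dip < 0),
        # clipped to [0, w_max]; borrowed from Izhikevich (2007), assumed here.
        self.w = min(max(self.w + self.lr * da_phasic * self.e, 0.0), self.w_max)
        # The eligibility trace then decays exponentially over dt_ms.
        self.e *= math.exp(-dt_ms / self.tau_e)


# Illustrative templates only: NOT fitted to the Shen et al. data.
low = Template(a_pre_post=-0.5, a_post_pre=-0.5)
high = Template(a_pre_post=1.0, a_post_pre=-0.5)

syn = STDESynapse(w=0.5)
syn.pairing(low, high, da=1.0, dt=10.0)  # pre leads post by 10 ms
syn.step(dt_ms=200.0)                    # no dopamine yet: weight unchanged
syn.step(dt_ms=1.0, da_phasic=0.1)       # phasic burst: weight increases
print(round(syn.w, 3))                   # prints 0.541
```

Setting `da` between 0 and 1 interpolates between the two templates, mirroring the linear superposition described above; a separate pair of templates would be needed for each of the D1 and D2 populations.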
Figure 1. a. MSN spike count in response to the causal action and to other actions across trials. The phasic dopamine dip occurs after a noxious value is assigned to the outcome of the causal action. b. Synaptic conductances at key trials. Only the first 50 synapses contribute to the causal action's representation.
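The cortical input protocol can be sketched in the same hedged spirit. The 200 synapses, the 50 highly active afferents, the fixed pattern on the first 50 synapses for the causal action and the per-trial random pattern for other actions follow the text and Figure 1; the firing rates, the Bernoulli-per-step approximation to Poisson spiking, and the names `input_rates` and `poisson_spikes` are assumptions for illustration.

```python
import numpy as np

N_SYN = 200    # synapses on the model MSN (from the text)
N_ACTIVE = 50  # highly active afferents per action (from the text)


def input_rates(causal: bool, rng: np.random.Generator,
                r_active: float = 20.0, r_background: float = 2.0) -> np.ndarray:
    """Per-synapse firing rates (Hz) for one trial.

    The causal action always drives the first N_ACTIVE synapses; any other
    action drives a fresh random subset each trial. r_active and r_background
    are hypothetical placeholder rates.
    """
    rates = np.full(N_SYN, r_background)
    if causal:
        active = np.arange(N_ACTIVE)
    else:
        active = rng.choice(N_SYN, size=N_ACTIVE, replace=False)
    rates[active] = r_active
    return rates


def poisson_spikes(rates: np.ndarray, duration_ms: float, dt_ms: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Boolean spike raster with shape (time steps, synapses).

    Bernoulli approximation to a Poisson process: in each step of dt_ms a
    synapse spikes with probability rate * dt (valid while that is << 1).
    """
    steps = int(round(duration_ms / dt_ms))
    p_spike = rates * dt_ms / 1000.0
    return rng.random((steps, rates.size)) < p_spike


rng = np.random.default_rng(0)
causal = input_rates(True, rng)
other = input_rates(False, rng)
raster = poisson_spikes(causal, duration_ms=500.0, dt_ms=1.0, rng=rng)
print(raster.shape)  # (500, 200)
```

Because the causal pattern never changes while the other patterns are redrawn every trial, only the first 50 synapses see consistently correlated input, which is what allows synaptic pattern matching to emerge.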
When phasic dopamine is elicited by the causal action, it induces a rapid increase in the MSN response, which could form the basis of a repetition bias in action selection. Further, the MSN becomes receptive to the action request through synaptic pattern matching (Fig. 1b, top panel). Subsequent trials induce more selective synaptic patterning (reduced response at trial 1200). Dopamine dips, caused by assigning a noxious value to the outcome, reduce the response. We conclude that the recently discovered complex, dopamine-receptor-dependent forms of STDP can give rise to cortico-striatal plasticity that supports action-outcome learning.
This work was part-funded by EPSRC grant EP/C516303/1.