Perceptual discrimination may be interpreted as a decision between alternatives based on the available sensory evidence. In many experiments, the alternatives are encoded by distinct neuronal groups. For this case, proposed neural models treat the decision as the outcome of a competition between decision-specific neuronal groups, each integrating distinct sensory evidence. Alternatively, evidence may be presented sequentially, and the different stimuli may be encoded by the same neuronal group, as in experiments where monkeys perform a vibrotactile discrimination task. To discriminate in this case, the nervous system must keep a trace of the previously presented stimuli. How the correct discrimination can be learned and implemented is poorly understood.
To address this question, we use a modeling approach. We concentrate on the particular case of the vibrotactile discrimination task, for which experimental insight has accumulated over recent years. The partial differential (PD) neurons in monkey area VPC encode both sequentially presented vibrotactile stimuli (with frequencies f1 and f2) by keeping a memory of the first one during a delay period; they have been reproduced in a spiking neuron network model with short-term facilitating synapses. We explore how these PD neurons may be used to discriminate between the two stimulus configurations: f1 > f2 or f1 < f2. Based on the experimental evidence, we model a heterogeneous PD neuronal population that encodes both frequencies in multiple ways. Downstream of this first network, we add a competition-based decision-making spiking neuron network. To obtain the desired neural dynamics of the coupled two-network model, we choose the model parameters using a simplified mean-field description of the model. To make the best possible decisions, the strengths of the synapses projecting from the PD neurons to the decision neurons must be learned. Based on reinforcement learning theory, we use a learning rule that maximizes reward. The rule depends on a reward prediction error, evaluated using the reward history. We instantiate this rule using reward-based spike-timing-dependent plasticity. Learning takes place after the second stimulus presentation. We find that the task can be learned efficiently for any number of PD neuron types, even when their stimulus encoding function is nonlinear and noisy.
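The learning principle described above can be illustrated with a minimal rate-based sketch. It is not the spiking model of this work: the PD units are assumed here to respond linearly to f1 and f2 with opposite-signed coefficients plus Gaussian noise, the competition network is replaced by a single stochastic (logistic) choice, and the reward prediction error is taken as the reward minus a running average of past rewards. All names and parameter values (make_population, pd_rates, eta, tau, the frequency pairs) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_population(n_types, rng):
    """Hypothetical heterogeneous PD population: each unit has rate a*f1 + b*f2
    with opposite-signed coefficients of different magnitudes (an assumption)."""
    pop = []
    for _ in range(n_types):
        sign = rng.choice([-1.0, 1.0])
        a = sign * rng.uniform(0.7, 1.3)
        b = -sign * rng.uniform(0.7, 1.3)
        pop.append((a, b))
    return pop


def pd_rates(pop, f1, f2, rng, noise=0.05):
    """Noisy PD responses after the second stimulus (frequencies scaled by 40 Hz)."""
    return [a * f1 / 40.0 + b * f2 / 40.0 + rng.gauss(0.0, noise) for a, b in pop]


def sample_trial(rng):
    """Draw a frequency pair (in Hz) with f1 > f2 or f1 < f2, equally likely."""
    base = rng.choice([10.0, 14.0, 18.0, 22.0, 26.0])
    if rng.random() < 0.5:
        return base + 8.0, base
    return base, base + 8.0


def p_choose_greater(w, r):
    """Probability of deciding 'f1 > f2'; a logistic stand-in for the competition."""
    d = sum(wi * ri for wi, ri in zip(w, r))
    d = max(-30.0, min(30.0, d))  # clip to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-d))


def train_and_evaluate(n_types=8, n_trials=4000, n_test=500,
                       eta=0.5, tau=20.0, seed=0):
    rng = random.Random(seed)
    pop = make_population(n_types, rng)
    w = [0.0] * n_types  # PD -> decision weights, learned from reward only
    r_bar = 0.5          # running average of past rewards (reward history)
    for _ in range(n_trials):
        f1, f2 = sample_trial(rng)
        r = pd_rates(pop, f1, f2, rng)
        p = p_choose_greater(w, r)
        choice = 1.0 if rng.random() < p else 0.0
        reward = 1.0 if (choice == 1.0) == (f1 > f2) else 0.0
        delta = reward - r_bar  # reward prediction error
        for i in range(n_types):
            # Reward-modulated update: prediction error times an eligibility term.
            w[i] += eta * delta * (choice - p) * r[i]
        r_bar += (reward - r_bar) / tau
    # Evaluate with the most probable choice on fresh noisy trials.
    correct = 0
    for _ in range(n_test):
        f1, f2 = sample_trial(rng)
        r = pd_rates(pop, f1, f2, rng)
        correct += int((p_choose_greater(w, r) > 0.5) == (f1 > f2))
    return correct / n_test
```

Calling train_and_evaluate() returns the fraction of correct decisions on test trials. Subtracting the running reward average does not change the expected weight update but reduces its variance, which is one common motivation for defining the prediction error relative to the reward history.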
In conclusion, the present biologically plausible two-network model is able to solve a perceptual discrimination task in which the sensory evidence is presented sequentially.
This work has been supported by the European Community's Seventh Framework Program (grant agreement 269921, BrainScaleS).