The representation of acoustic stimuli at the level of the brainstem forms the basis for further auditory processing. While some simple characteristics of this representation are widely accepted, it remains a challenge to predict the firing rate at high temporal resolution in response to arbitrary stimuli. Such predictive models would be helpful tools for further investigations, in particular of sound localization. Devising a model involves several choices: the stimulus representation, the modeling framework, and the performance measure. In this study we explore these choices for single-cell responses from the medial nucleus of the trapezoid body (MNTB), whose neurons constitute a well-identifiable and homogeneous population. Detailed models of MNTB responses have not been studied before. We estimate a recently introduced family of models, the multilinear models (Figure 1), which encompass the classical spectrotemporal receptive field (STRF) and allow arbitrary input nonlinearities and certain multiplicative time-frequency interactions. To reliably quantify the explained variance for noisy responses, we use the predictive power as the performance measure. We find that nonlinear models and a cochlear-like (gamma-tone) stimulus representation lead to significant improvements in predictive power. On average, 75% of the explainable variance can be predicted. Since the models deliver faithful predictions, a meaningful interpretation of the estimated model structures becomes possible. Including multiplicative interactions strongly reduces the inhibitory fields in the linear kernels. Together with their spectrotemporal location, this suggests cochlear suppression as their source. Similar improvements in predictive power are obtained for input and output nonlinearities, with the best performance for the combination of both. In conclusion, the context model provides a rich and still interpretable extension over other nonparametric models for modeling responses in the MNTB.
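The abstract does not spell out how predictive power is computed. As a minimal sketch, the code below assumes the commonly used signal-power formulation: the stimulus-driven (explainable) variance is estimated from repeated trials, and the variance captured by a prediction is expressed as a fraction of it. The function names and the `trials` array layout (repeats by time bins) are illustrative assumptions, not taken from the study.

```python
import numpy as np

def signal_power(trials):
    """Estimate stimulus-driven response variance from repeated trials.

    trials: array of shape (n_repeats, n_time_bins), responses to the same stimulus.
    Assumed estimator: (N * var(mean response) - mean(single-trial var)) / (N - 1).
    """
    n = trials.shape[0]
    power_of_mean = trials.mean(axis=0).var()   # variance over time of the trial average
    mean_of_power = trials.var(axis=1).mean()   # average single-trial variance over time
    return (n * power_of_mean - mean_of_power) / (n - 1)

def predictive_power(trials, prediction):
    """Fraction of the explainable variance captured by `prediction`."""
    mean_resp = trials.mean(axis=0)
    # Variance of the mean response minus variance of the residual.
    explained = mean_resp.var() - (mean_resp - prediction).var()
    return explained / signal_power(trials)
```

Under this definition, a prediction that matches the noise-free response scores close to 1 even when individual trials are noisy, while a constant prediction scores 0.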
Figure 1. Schematic overview of estimated models. An acoustic stimulus is created from broadband amplitude modulations. Three spectrotemporal representations of the sound are used as input for the following models: First, a multilinear model (dimensions: time, frequency, level) is estimated, e.g. a STRF, an input nonlinearity model (IN+STRF) or a Context model. Second, an estimated output nonlinearity rescales the multilinear prediction to the final firing rate prediction. We compare the performance contributed by the individual parts.
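The two-stage pipeline in Figure 1 can be illustrated with a small sketch. It assumes a discretized stimulus of shape (time bins, frequency channels), a level axis represented by integer level bins, and a piecewise-constant output nonlinearity estimated by binning. All function names, array layouts, and the binning scheme are hypothetical choices for illustration; the Context model's multiplicative interactions and the fitting procedure are omitted.

```python
import numpy as np

def lagged(stim, n_lags):
    """Stack delayed copies of stim (T, F) into (T, n_lags, F), zero-padded at the start."""
    T, F = stim.shape
    out = np.zeros((T, n_lags, F))
    for lag in range(n_lags):
        out[lag:, lag, :] = stim[:T - lag, :]
    return out

def strf_predict(stim, w_tf, bias=0.0):
    """Linear STRF: r(t) = bias + sum over (lag, f) of w_tf[lag, f] * stim[t - lag, f]."""
    return bias + np.einsum("tlf,lf->t", lagged(stim, w_tf.shape[0]), w_tf)

def in_strf_predict(level_idx, w_level, w_tf, bias=0.0):
    """IN+STRF: map each integer level bin through w_level, then apply the STRF.

    The prediction is linear in w_level and in w_tf separately (multilinear).
    """
    return strf_predict(w_level[level_idx], w_tf, bias)

def output_nonlinearity(pred, rate, n_bins=10):
    """Rescale predictions: mean observed rate within each quantile bin of pred."""
    edges = np.quantile(pred, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, pred, side="right") - 1, 0, n_bins - 1)
    table = np.array([rate[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(n_bins)])
    return table[idx]
```

Setting `w_level` to the identity mapping reduces the IN+STRF sketch to the plain STRF, which mirrors the statement that the multilinear family encompasses the classical STRF.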