Sparse coding and related theories have successfully explained the response properties of the visual system and the early stages of the auditory system, suggesting that these systems are adapted to the statistics of natural stimuli. The present study extends this line of research to a higher stage of the auditory system, focusing specifically on the harmonic structure of natural sounds. Harmony is often found in behaviourally important sounds such as animal vocalizations, and it should be captured at higher stages of the auditory system, since a peripheral auditory neuron typically responds to only one frequency component. Indeed, neurophysiological studies have revealed that monkey primary auditory cortex (A1) contains neurons with response properties related to harmony: some A1 neurons are activated or modulated by multiple frequencies, which are often harmonically related to each other, or in ratios of simple integers.
We hypothesize that such harmonic relations emerge from sparse coding of harmonic natural sounds. To test this, we first learn ‘basis’ spectra by applying a previously proposed sparse coding model and algorithm to the frequency-domain representation of a sound source, which approximates the output of the peripheral auditory system. Next, we regard the learned bases as model A1 neurons and investigate how they respond to pure-tone and two-tone stimuli, following the experimental scheme of an earlier neurophysiological study. As in the experiment, the bases can be classified into two groups (‘single-peaked’ and ‘multi-peaked’); moreover, we find a similar harmonic tendency in the distributions of ratios between their response and modulation peak frequencies. A comparison across three sound sources with different degrees of harmony shows that the strength of this harmonic tendency correlates with the degree of harmony of the sound source used for learning. Notably, the strength obtained with a highly harmonic sound, namely piano performance, is quantitatively comparable to that of the A1 neurons. Taking this together with the absence of such relations for a non-harmonic sound source, we conclude that the harmonic relations emerge as a reflection of the harmonic structure of the input sound.
This result suggests that sparse coding still holds at higher stages of sensory systems, supporting the idea of adaptation to behaviourally important stimuli. Furthermore, the modulatory behaviours can be explained by divisive interaction between model neurons whose harmonic structures partially overlap. We propose physiological experiments to confirm this interaction and predict their results.
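As an illustration of the ratio analysis mentioned above, the following is a minimal sketch of how one might test whether two peak frequencies are in a ratio of simple integers. It is not the procedure used in the study: the function names, the largest integer allowed in a ratio (`max_int`), and the tolerance in octaves (`tol_octaves`) are all assumptions chosen for illustration.

```python
from fractions import Fraction
import math


def nearest_simple_ratio(f1, f2, max_int=6):
    """Find the simple-integer ratio p/q closest to the frequency ratio.

    Hypothetical helper: `max_int` (largest integer allowed in p/q) is an
    assumption for illustration, not a value taken from the study.
    Returns (ratio, deviation), with the deviation measured in octaves.
    """
    hi, lo = max(f1, f2), min(f1, f2)
    observed = math.log2(hi / lo)
    # All distinct ratios p/q >= 1 with 1 <= q <= p <= max_int, in sorted order.
    candidates = sorted(
        {Fraction(p, q) for q in range(1, max_int + 1) for p in range(q, max_int + 1)}
    )
    best = min(candidates, key=lambda r: abs(observed - math.log2(float(r))))
    deviation = abs(observed - math.log2(float(best)))
    return best, deviation


def is_harmonic(f1, f2, max_int=6, tol_octaves=0.05):
    """Return True if f1 and f2 lie within `tol_octaves` of a simple ratio.

    The tolerance is an assumed value for illustration only.
    """
    _, deviation = nearest_simple_ratio(f1, f2, max_int)
    return deviation <= tol_octaves


if __name__ == "__main__":
    # Response peak at 440 Hz, modulation peak at 660 Hz (a 3:2 ratio).
    print(nearest_simple_ratio(440.0, 660.0))
    # A pair near the half-octave, far from any ratio with integers up to 6.
    print(is_harmonic(1000.0, 1410.0))
```

Measuring the deviation on a logarithmic (octave) scale treats a given mistuning the same way regardless of absolute frequency. The stricter the tolerance and the smaller `max_int`, the fewer frequency pairs count as harmonic, so both would need to be fixed before comparing ratio distributions across sound sources.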