Material below summarizes the article Cortical Transformation of Spatial Processing for Solving the Cocktail Party Problem: A Computational Model, published on January 13, 2016, in eNeuro and authored by Junzi Dong, H. Steven Colburn, and Kamal Sen.
The problem of following a speaker’s voice in the presence of others, the Cocktail Party Problem (CPP), remains a focus of intensive research in a diverse range of fields, including neuroscience, computer science, and speech recognition, more than 50 years after it was named. Although the problem is difficult for machines to solve, humans with normal hearing solve it with relative ease, indicating that a solution exists somewhere in the brain. Our eNeuro paper proposes a model cortical network for solving this problem based on recent physiological data.
Like humans, many animals are capable of listening to a single sound source in a mixture of sources. Thus, neural circuits for solving the CPP also likely exist in animals. The CPP is a highly complex problem with many dimensions. For example, the auditory stimulus consists of sound mixtures, the sounds can come from different spatial locations, and the listener can selectively pay attention to specific sounds.
Such complexity leads to the need for integrative processing that combines information across spectral and spatial dimensions. This suggests an important role for auditory cortex in solving the CPP, based on its position in the auditory hierarchy. Thus, studies of cortical processing of sounds under CPP-like conditions are likely to elucidate the biological solution to the CPP. Currently, data on cortical mechanisms underlying the CPP are very limited.
We discovered auditory neurons at the cortical level with surprising spatial response properties that are well suited for segregating target and masker sounds from different spatial locations. We found that auditory cortical neurons are broadly tuned to single “target” sound sources from different spatial locations. However, when the target is presented at the same time as a competing “masker” from a different location, cortical neurons sharpen their spatial tuning. Specifically, quantifying neural discrimination performance over a spatial grid of all possible combinations of target and masker locations revealed “hotspots” of high performance at particular positions on the spatial grid.
In our recent modelling study, we constructed a multi-layer network model of auditory cortex to replicate the experimentally observed responses of cortical neurons. In this network, spatially selective input “channels” are activated when auditory stimuli are presented at particular spatial locations. These input channels excite downstream channels of relay neurons and inhibitory inter-neurons, which inherit their spatial tuning. Excitatory relay neurons across spatial channels converge to excite the cortical neuron, making it broadly tuned to single sound sources from all directions. In the presence of multiple sound sources, inhibition from inter-neurons across spatial channels allows the target response to suppress the masker response, generating hotspots of performance, which can be controlled by the pattern of cross-channel inhibitory connections. Thus, the network model proposes specific cortical circuitry underlying a key component of the solution to the CPP.
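The circuit logic described above can be illustrated with a toy rate-based sketch. This is not the implementation from the paper: the subtractive, rectified inhibition, the unit-strength stimulus drives, the four-channel layout, and all function names (`relay_responses`, `cortical_output`, `target_fraction`) are assumptions made purely for illustration. The "target fraction" used here is a crude stand-in for neural discrimination performance, not the measure used in the study.

```python
def relay_responses(drive, inhibition):
    """Relay-neuron output for each spatial channel.

    drive[i]         : input to spatial channel i (assumed rate, arbitrary units)
    inhibition[i][j] : assumed weight of the channel-j interneuron onto the
                       channel-i relay neuron (cross-channel inhibition)
    """
    n = len(drive)
    relay = []
    for i in range(n):
        # Interneurons inherit the tuning of their input channel.
        inhibited = sum(inhibition[i][j] * drive[j] for j in range(n))
        # Subtractive inhibition with rectification (an assumption).
        relay.append(max(0.0, drive[i] - inhibited))
    return relay


def cortical_output(drive, inhibition):
    """The cortical neuron sums the relay neurons of all spatial channels."""
    return sum(relay_responses(drive, inhibition))


def target_fraction(target, masker, inhibition):
    """Share of the cortical output carried by the target channel.

    Only meaningful when target and masker occupy different channels.
    """
    n = len(inhibition)
    drive = [0.0] * n
    drive[target] += 1.0
    drive[masker] += 1.0
    relay = relay_responses(drive, inhibition)
    total = sum(relay)
    return relay[target] / total if total > 0 else 0.0


# Four hypothetical spatial channels, with a single cross-channel connection:
# the channel-0 interneuron inhibits the channel-3 relay neuron.
N = 4
inhibition = [[0.0] * N for _ in range(N)]
inhibition[3][0] = 1.0

# Single sources: the output is the same at every location (broad tuning).
single = []
for loc in range(N):
    drive = [0.0] * N
    drive[loc] = 1.0
    single.append(cortical_output(drive, inhibition))

# Target/masker grid; co-located pairs are left as None.
grid = [[None if t == m else target_fraction(t, m, inhibition)
         for m in range(N)] for t in range(N)]
```

In this sketch the only grid cell where the target fully dominates is target in channel 0 with masker in channel 3, which is the one pair covered by the inhibitory connection. Changing which entries of `inhibition` are non-zero moves that cell around the grid, mirroring the idea that the pattern of cross-channel inhibitory connections controls where hotspots appear.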
The model makes specific predictions that can be tested experimentally. It can also be extended to design cortically inspired engineering solutions that may help hearing assistive devices, such as hearing aids and cochlear implants, cope with the CPP.
Junzi Dong, H. Steven Colburn, Kamal Sen. Cortical Transformation of Spatial Processing for Solving the Cocktail Party Problem: A Computational Model. eNeuro, January 2016, 3(1). DOI: 10.1523/ENEURO.0086-15.2015