Material below summarizes the article The Temporal Dynamics of Scene Processing: A Multifaceted EEG Investigation, published on September 12, 2016, in eNeuro and authored by Assaf Harel, Iris I. A. Groen, Dwight J. Kravitz, Leon Y. Deouell, and Chris I. Baker.
Real-world scenes are highly complex, cluttered, and heterogeneous visual stimuli. For example, when we look at a busy street scene, we are faced with multiple types of information: different perceptual cues (such as depth and texture), information about the objects in the scene (such as to what class they belong to and their spatial arrangement), and information about the spatial layout of the scene (the global arrangement of its large-scale elements).
In spite of this computational complexity, human observers recognize scenes rapidly and with great ease; often, just a brief glance is sufficient for successful recognition. But how do we achieve this? How does our brain distinguish scenes from other types of complex visual stimuli? Does our visual system contain specialized neural mechanisms for recognizing scenes?
Neuroimaging studies have established that the ability to recognize scenes is supported by a network of dedicated brain regions which represent various types of scene information. While we have a growing understanding of how this scene-selective network operates and the kind of information its different regions compute, there is currently little understanding of how these neural processes unfold over time.
To address this gap, we conducted an Event-Related Potential (ERP) study aimed at uncovering the time course of scene processing. ERP is a form of electroencephalography (EEG) analysis used to determine how cortical processes evoked by specific sensory or motor events unfold over time. Its main advantage is its high temporal precision, with nearly millisecond-by-millisecond resolution.
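To give a flavor of how an ERP is obtained, here is a toy sketch of time-locked averaging: epochs of a continuous EEG channel are cut around stimulus onsets, baseline-corrected, and averaged so that activity not time-locked to the stimulus cancels out. All data, sampling rates, and window choices below are simulated and illustrative; this is not the authors' actual pipeline.

```python
import numpy as np

def erp_from_continuous(eeg, onsets, sfreq, tmin=-0.1, tmax=0.5):
    """Epoch a single-channel continuous EEG signal around stimulus
    onsets (given in samples), baseline-correct each epoch using the
    pre-stimulus interval, and average across trials."""
    n_pre = int(round(-tmin * sfreq))   # samples before onset
    n_post = int(round(tmax * sfreq))   # samples after onset
    epochs = []
    for onset in onsets:
        seg = eeg[onset - n_pre : onset + n_post]
        seg = seg - seg[:n_pre].mean()  # baseline correction
        epochs.append(seg)
    times = np.arange(-n_pre, n_post) / sfreq
    # Averaging cancels activity that is not time-locked to the stimulus
    return times, np.array(epochs).mean(axis=0)

# Illustrative use on simulated data: a response ~220 ms after each onset
rng = np.random.default_rng(0)
sfreq = 500.0                                   # Hz (assumed)
eeg = rng.normal(0.0, 1.0, 60_000)              # two minutes of "EEG"
onsets = np.arange(1_000, 55_000, 600)          # stimulus onsets (samples)
peak = int(0.22 * sfreq)
for o in onsets:
    eeg[o + peak - 10 : o + peak + 10] += 5.0   # simulated evoked response
times, erp = erp_from_continuous(eeg, onsets, sfreq)
print(times[np.argmax(erp)])                    # lands in the 0.20-0.26 s bump
```

The single-trial responses are buried in noise; only after averaging across the ~90 simulated trials does the stimulus-locked deflection emerge clearly.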
We used ERP in our study to answer two outstanding questions. First, how early does our brain distinguish scenes from other complex non-scene categories? Second, when do global scene properties, known to facilitate scene recognition, get processed? In other words, how early can we see diagnostic scene information being utilized?
We conducted two experiments recording ERPs from healthy volunteers while they viewed images of scenes and other complex visual categories. In the first experiment, we presented participants with images of scenes, faces, and everyday objects. Each category spanned a range of images from several subcategories (for example, the face category included male and female, Asian, and Caucasian faces) to ensure that any differences in scalp-recorded electrical activity would not reflect a narrow choice of stimuli. We inspected the ERP waveform following image presentation, searching for a component with a higher response to scenes than to both faces and objects. We found such a component, peaking at 220 milliseconds after stimulus onset: the positive-going visual P2, whose amplitude is most pronounced at posterior electrode sites.
Interestingly, the P2 trails a better-known ERP component, the face-selective N170, which showed the opposite pattern: the highest response to faces relative to either scenes or objects. The results of our first experiment suggested that the P2 might serve as an ERP correlate of scene-sensitive processing. To test this idea further, we examined the extent to which P2 amplitude would vary as a function of the type of scenes participants viewed. We recorded ERPs from the same participants in a second experiment, in which they viewed a different, diverse set of naturalistic scene images spanning three global scene properties previously shown to be essential for scene categorization: spatial expanse (open/closed), naturalness (manmade/natural), and distance (near/far). Our rationale was that if the 220 ms time window is essential for high-level scene perception, then the peak amplitude of the P2 should vary with these global scene properties.
Consistent with our hypothesis, we found that the amplitude of the P2 component is sensitive to global scene properties, based on two independent analyses. The first was a standard ERP analysis, averaging across the specific exemplars and categories of scenes corresponding to the eight combinations of scene properties. This analysis revealed that P2 amplitude varied as a function of both the naturalness of a scene and its spatial expanse: P2 amplitude was higher in response to closed than to open scenes, an effect restricted to natural scenes and not evident in manmade scenes. P2 amplitude was also higher in response to natural than to manmade scenes, irrespective of spatial expanse. The amplitude of the earlier N1 component was likewise modulated by naturalness, implying that naturalness might be processed earlier than spatial expanse.
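In spirit, this categorical analysis reduces to averaging trials within each combination of scene properties and reading out the peak amplitude in a P2 time window. The sketch below uses simulated single-trial data and made-up window boundaries and effect sizes; none of it reproduces the paper's actual parameters.

```python
import numpy as np

def p2_amplitude(erp, times, window=(0.18, 0.26)):
    """Peak amplitude of a condition-average ERP within a P2 window
    (window boundaries here are illustrative, not the paper's)."""
    mask = (times >= window[0]) & (times <= window[1])
    return erp[mask].max()

# Simulated trials (trials x time points) with property labels per trial
rng = np.random.default_rng(1)
times = np.arange(-0.1, 0.5, 0.002)                 # 2 ms steps
n_trials = 160
naturalness = rng.choice(["natural", "manmade"], n_trials)
expanse = rng.choice(["open", "closed"], n_trials)
trials = rng.normal(0.0, 1.0, (n_trials, times.size))

# Inject a larger 220 ms response for natural and for closed scenes
bump = np.exp(-((times - 0.22) ** 2) / (2 * 0.02 ** 2))
gain = 3.0 + (naturalness == "natural") + (expanse == "closed")
trials += gain[:, None] * bump[None, :]

# Condition mean for each property combination, then P2 peak amplitude
for nat in ("natural", "manmade"):
    for ex in ("open", "closed"):
        sel = (naturalness == nat) & (expanse == ex)
        erp = trials[sel].mean(axis=0)
        print(nat, ex, round(p2_amplitude(erp, times), 2))
```

Comparing the four condition amplitudes in this way mirrors the factorial logic of the analysis: a naturalness effect, an expanse effect, and their interaction can each be read off the condition means.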
Increasing the granularity of our investigation, in the second analysis we examined the extent to which the P2 is sensitive to diagnostic scene information conveyed by single images. This analysis complemented the standard ERP analysis, which, due to the averaging involved, glosses over differences between individual images. Using summary image statistics, we quantified the distribution of contrast and spatial frequency information in each image, and then ran a multilinear regression of single-image ERP amplitude on these statistics. We found that both contrast and spatial frequency statistics predicted the P2 amplitude elicited by single scene images.
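The shape of such a single-image analysis can be sketched as an ordinary least-squares regression of per-image P2 amplitudes on per-image summary statistics. The data, coefficients, and statistic names below are synthetic and purely illustrative; the paper's actual image statistics and fitting procedure are described in the article itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n_images = 96

# Hypothetical per-image summary statistics (design-matrix columns)
contrast_stat = rng.normal(0.0, 1.0, n_images)
spatial_freq_stat = rng.normal(0.0, 1.0, n_images)
X = np.column_stack([np.ones(n_images), contrast_stat, spatial_freq_stat])

# Simulated single-image P2 amplitudes partly driven by both statistics
p2_amp = (4.0 + 1.5 * contrast_stat - 0.8 * spatial_freq_stat
          + rng.normal(0.0, 0.5, n_images))

# Ordinary least squares: beta = argmin ||X b - y||^2
beta, *_ = np.linalg.lstsq(X, p2_amp, rcond=None)
pred = X @ beta
ss_res = ((p2_amp - pred) ** 2).sum()
ss_tot = ((p2_amp - p2_amp.mean()) ** 2).sum()
r_squared = 1.0 - ss_res / ss_tot
print(beta, r_squared)   # recovered coefficients and variance explained
```

The fitted coefficients quantify how strongly each image statistic predicts the single-image response, and the explained variance (R²) can then be compared or partitioned against other predictors, such as behavioral ratings.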
Critically, the variance explained by these factors was partly shared with behavioral ratings of naturalness and spatial expanse of the scenes, highlighting that diagnostic scene properties found to impact the ERPs at the categorical level can be mapped onto natural image statistics.
In summary, across two experiments we identified the visual P2 as the earliest ERP marker of scene selectivity, peaking 220 milliseconds after stimulus onset. The scene-selective P2 effect reflects the processing of diagnostic scene information, as it was modulated by two global scene properties: spatial expanse and naturalness.
Further, image statistics diagnostic of the global scene properties, as well as behavioral ratings of individual images, were predictive of the P2 response. Together, these results suggest that higher-order scene properties become maximally represented around 220 ms after stimulus onset, and establish the P2 as an ERP marker of scene processing.
The Temporal Dynamics of Scene Processing: A Multifaceted EEG Investigation. Assaf Harel, Iris I. A. Groen, Dwight J. Kravitz, Leon Y. Deouell, Chris I. Baker. eNeuro, September 2016. DOI: 10.1523/ENEURO.0139-16.2016