INSERM, Cognitive Neuroimaging Unit, Gif sur Yvette 91191, France

Commissariat à l’Energie Atomique, Direction des Sciences du Vivant, I2BM, NeuroSpin center, Gif sur Yvette 91191, France

Université Paris-Sud 11, Orsay 91405, France

Collège de France, 11 Place Marcelin Berthelot, Paris 75005, France

Abstract

Background

Multi-sensor technologies such as EEG, MEG, and ECoG result in high-dimensional data sets. Given the high temporal resolution of such techniques, scientific questions very often focus on the time-course of an experimental effect. In many studies, researchers focus on a single sensor or the average over a subset of sensors covering a “region of interest” (ROI). However, single-sensor or ROI analyses ignore the fact that the spatial focus of activity is constantly changing, and fail to make full use of the information distributed over the sensor array.

Methods

We describe a technique that exploits the optimality and simplicity of matched spatial filters in order to reduce experimental effects in multivariate time series data to a single time course. Each (multi-sensor) time sample of each trial is replaced with its projection onto a spatial filter that is matched to an observed experimental effect, estimated from the remaining trials (Effect-Matched Spatial filtering, or EMS filtering). The resulting set of time courses (one per trial) can be used to reveal the temporal evolution of an experimental effect, which distinguishes this approach from techniques that reveal the temporal evolution of an anatomical source or region of interest.

Results

We illustrate the technique with data from a dual-task experiment and use it to track the temporal evolution of brain activity during the psychological refractory period. We demonstrate its effectiveness in separating the means of two experimental conditions, and in significantly improving the signal-to-noise ratio at the single-trial level. It is fast to compute and results in readily-interpretable time courses and topographies. The technique can be applied to any data-analysis question that can be posed independently at each sensor, and we provide one example, using linear regression, that highlights the versatility of the technique.

Conclusion

The approach described here combines established techniques in a way that strikes a balance between power, simplicity, speed of processing, and interpretability. We have used it to provide a direct view of parallel and serial processes in the human brain that previously could only be measured indirectly. An implementation of the technique in MatLab is freely available via the internet.

Background

Many techniques for the measurement of neural activity result in multivariate time series. Methods such as electroencephalography (EEG), magnetoencephalography (MEG), electro-corticography (ECoG), and near-infrared spectroscopy (NIRS) may involve tens or even hundreds of sensors. Although all of these methods have some degree of spatial selectivity, there is also a significant amount of redundancy between different sensors: any given experimental effect, no matter how well localized, will normally appear in more than one sensor. With many sensors and potentially many possible experimental effects and interactions, one is confronted with the question of which sensors to subject to analysis and how to choose them. This is paramount in analyses where one is specifically interested in the within-trial time course of an experimental effect.

Perhaps the most widely-used approach is to select a single sensor or take the average over a contiguous cluster of sensors – a "region of interest" or ROI. Although the ROI approach is simple and readily interpretable, it does not take into account the distribution of activity across the sensor array. Nor does it account for situations where a given experimental effect appears in two or more non-contiguous regions, with potentially opposite signs. Thus many sensors that are sensitive to a given experimental effect may be left out of the ROI, and the sensors that are included in the ROI are all treated equally even though some may carry much more signal than others.

An ROI is a special case of a linear spatial weighting applied to the sensors

In signal processing a common technique for detecting the presence of a known signal,

A given spatial filter can only capture the time course of activity from a single fixed vantage point. Experimental effects, on the other hand, almost never have the same distribution across the sensor array throughout the time course of a trial epoch. The sensor(s) that most strongly exhibit the effect will change over time, reflecting the spatio-temporal evolution of the underlying activity in the brain. In order to examine the time course of an experimental effect, the vantage point (i.e. the weighting applied to the sensors) has to change over time (see Figure

**Illustrative comparison of region of interest (ROI) and EMS filtering analyses applied to a representative subject.** The mean time course of an ROI **(A)**, fixed spatial filter **(B)**, and evolving spatial filter **(C)** for the difference between the **(B)** was derived using a stationary filter computed over the data in a specific time window (EMS**(A)**, the spatial filter is identical at all time points, but that unlike **(A)**, the spatial filter is continuous-valued rather than discrete. The time course in **(C)** was derived from the output of canonical EMS filtering. The dashed line shows the time course of the stationary template used in panel **B** (for visual comparison with panel **B**). Note in panel **C** that the spatial filters are continuous-valued and are also changing across time in the epoch (the spatial filter is computed independently at each time point in the epoch according to the objective function, which in this case is simply the difference between the

Here we propose a simple, powerful, and versatile technique that uses matched spatial filters^{a} in order to reduce epoched multi-sensor data to a single time course that tracks a given experimental effect across the timespan of each trial. Rather than being defined a priori, the filters are estimated directly from the data, independently at each time point – hence the name “

Because the method is driven by both the data and the data analysis question, and is applied separately at each time point, the resulting time series can be thought of as a functional reconstruction: instead of attempting to reconstruct the time course of an anatomical source, we reconstruct the time course of an experimental or behavioral effect (whose anatomical generators may change across time) in the original units (e.g. micro-volts or femto-tesla). We validate the method using MEG data from a previously-published study

Use of EMS filtering to measure serial and parallel brain processes

We tested

We used EMS filtering to examine the precise relation between the duration of the sensory stage, latency of the central stage, and the duration of T1 processing, at the single-trial level, and to reconstruct T2-related main effects in the time-locked average. We demonstrate the efficacy of EMS filtering in improving the signal to noise ratio of individual trials and in revealing the time course of an experimental effect (the difference between two conditions) in the time-locked average. We also use EMS filtering to test specific predictions derived from the router model

Methods

Applying a spatial filter: signal projection

Throughout the manuscript we use the word “projection” in the sense of “orthogonal projection onto a line”, which is equivalent to taking the dot product of two vectors,

Effect-matched spatial filtering

When analyzing multivariate signal data, the goal is most often to address a particular scientific question, for example “what is the time course of the experimental effect (condition A) with respect to that of the control condition (condition B) in my data?” This question could apply to EEG, MEG, ECoG, or other multi-sensor data. One can estimate the effect across time in the trial epoch by taking the mean (across trials) of the data belonging to condition A, the mean (across trials) of the data belonging to condition B, and then computing the difference between them. This has to be done separately for each sensor, resulting in as many time courses as there are sensors, with the experimental result (if there is one) distributed among them. We want to summarize each trial’s data (a

The canonical matched-filtering approach would be to start with a

We also want the resulting time course to “follow” the experimental effect across time. Hence the procedure is repeated separately and independently for each sample in the trial resulting in a

In order to avoid circularity in the procedure ^{b} and Additional file

**Projection onto the difference between two means with LOO cross-validation on Gaussian random data.**

**EMS filtering improves the quality of single-trial time courses.** Panel **(A)** presents data from a representative subject, comparing the ROI method (left) to the EMS filtering method (right). There are two columns of two raster plots, with time on the horizontal and trial # on the vertical. Amplitude is coded in color, going from blue (negative) to green (zero) to red (positive), and the color axis is scaled to the minimum and maximum amplitude in each data matrix. Below each pair of raster plots (top: **(B)** shows the estimated mean signal-to-noise ratio as a function of the number of trials averaged together, for the ROI (blue) and EMS filtering (green) methods. Panel **(C)** shows the mean performance (10 subjects) of a univariate Gaussian naïve-Bayes (GNB) classifier tested on the output of a nested EMS filtering procedure (green; see methods). The performance of a GNB classifier applied to the mean over the ROI (blue) and the performance of a linear support-vector machine (red) are shown for comparison. Notice that the performance of a univariate decision rule (GNB) applied to the output of EMS filtering is comparable to the performance of a multivariate linear SVM applied to the original sensor data.

EMS filtering algorithm

(For a formal mathematical description of EMS filtering, see Appendix A).

Consider the data-analysis question posed above: “what is the time course of the experimental effect (condition A) with respect to that of the control condition (condition B) in my data?” We want to summarize this kind of experimental result in a single time course, using a weighted combination of the sensors (spatial filter) that yields a higher signal-to-noise ratio than any single sensor. We also want the resulting time course to “follow” the experimental effect across time.

The procedure is simple (the two steps below are applied separately and independently at each time point). Let ⟨χ⟩ denote the mean over the elements of the vector χ. Let χ_{A} be a vector of measurements at a single channel and single time point, for all trials belonging to condition _{B}. Let χ^{(k)} be a vector of measurements at a single channel and single time point, for all trials ^{th} trial left out if ^{th} trial). Then

1. For each trial,

a. Compute ⟨χ_{A}^{(k)}⟩ - ⟨χ_{B}^{(k)}⟩ at each channel and treat the resulting vector,

b. Set

c. Use

2. Compute ⟨S_{A}⟩•⟨S_{B}⟩ over the resulting surrogate values.
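The leave-one-out steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' MatLab toolbox: the function names (`ems_filter_loo`, `mean_topography`, `mean_at`), the nested-list data layout, and the simulated data are all hypothetical, and scaling each filter to unit length is an assumption made here so that the output stays in the units of the original data. Each condition needs at least two trials so that a mean can still be computed once a trial is left out.

```python
import math
import random


def mean_topography(trials, t):
    """Mean over trials of the sensor vector at time sample t."""
    n_sensors = len(trials[0])
    return [sum(trial[s][t] for trial in trials) / len(trials)
            for s in range(n_sensors)]


def ems_filter_loo(data, labels):
    """Leave-one-out EMS filtering for the difference between two means.

    data[k][s][t] is the value for trial k, sensor s, time sample t.
    labels[k] is "A" or "B". Returns one surrogate time course per trial.
    """
    n_trials, n_sensors, n_times = len(data), len(data[0]), len(data[0][0])
    surrogates = [[0.0] * n_times for _ in range(n_trials)]
    for k in range(n_trials):
        # Condition means are estimated from all trials except trial k.
        rest_a = [data[i] for i in range(n_trials) if i != k and labels[i] == "A"]
        rest_b = [data[i] for i in range(n_trials) if i != k and labels[i] == "B"]
        for t in range(n_times):
            mean_a = mean_topography(rest_a, t)
            mean_b = mean_topography(rest_b, t)
            diff = [a - b for a, b in zip(mean_a, mean_b)]
            # Assumed normalization: scale the filter to unit length.
            norm = math.sqrt(sum(d * d for d in diff))
            if norm == 0.0:
                continue  # no effect estimate at this sample; keep 0.0
            weights = [d / norm for d in diff]
            # Project the left-out trial onto the filter (dot product).
            surrogates[k][t] = sum(weights[s] * data[k][s][t]
                                   for s in range(n_sensors))
    return surrogates


def mean_at(surrogates, labels, condition, t):
    """Mean surrogate value of one condition at time sample t."""
    values = [s[t] for s, lab in zip(surrogates, labels) if lab == condition]
    return sum(values) / len(values)


# Simulated data: 8 sensors, 3 samples, an effect in condition A at sample 1.
rng = random.Random(0)
pattern = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
labels = ["A"] * 20 + ["B"] * 20
data = []
for lab in labels:
    trial = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(8)]
    if lab == "A":
        for s in range(8):
            trial[s][1] += pattern[s]
    data.append(trial)

surrogates = ems_filter_loo(data, labels)
# Difference between the condition means of the surrogate values, per sample.
effect = [mean_at(surrogates, labels, "A", t) - mean_at(surrogates, labels, "B", t)
          for t in range(3)]
print(effect)
```

In this simulation the difference between the surrogate means should be large only at the sample that carries the effect. Because trial k never contributes to its own filter, the noise-only samples are not inflated by circularity.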

The results (one for each time point in the trial epoch), when strung together in temporal order, yield a single time course that gives an answer to the data analysis question, in this case “what is the time course of the experimental effect (condition A) with respect to that of the control condition (condition B) in my data?” The entire set of surrogate values belonging to a single trial is referred to as a

Although the procedure above is defined in an iterative, leave-one-out (LOO) fashion, note that a leave-one-out-per-condition (LOOPC) procedure can be used if there are an equal number of trials in each condition. Group-level analyses can be applied to the results from each subject using standard methods. The algorithm can also be applied across subjects, where ⟨χ_{A}⟩ and ⟨χ_{B}⟩ are pre-computed separately for each subject, in which case a leave-one-

In order to generalize to data-analysis questions other than the example given above (the difference between the means of two experimental conditions), we refer to the computational operation corresponding to the data analysis question (e.g. ⟨χ_{A}⟩ - ⟨χ_{B}⟩) as the objective function. In the general case, ⟨χ_{A}⟩ - ⟨χ_{B}⟩ in step 1 above is replaced by

For some purposes one might want to compute a single spatial filter and apply the same filter at all time points in the trial epoch, and this is also supported by our toolbox. By doing so it is possible to examine the time course of a specific stationary topography. For instance, if one were interested in the time course of the P300 component of the ERP in EEG data, one could compute a single topography over a time window centered on the peak of the P300. Applying the topography as a spatial filter would result in a single time course highlighting the onset, peak, and cutoff of that specific spatial pattern. Another scenario where one might use EMS filtering in this way would be to estimate the “readiness potential” (RP)

Non-independence of the surrogate time courses

Although each trial is independent of the spatial filter onto which it is projected, the spatial filters themselves (one per trial) are not independent of one another since they are all based on nearly the same set of trials. Therefore, the resulting surrogate time courses are not independent of one another. However, if they are grouped by condition and averaged together for each subject, then the resulting averages for each subject

**Applying EMS filtering to test predictions regarding the psychological refractory period.** Data presented are for Seen and Unseen trials from the Lag 1 condition sorted according to RT1 and time locked to T2. The spatial filter used in the analysis was computed by subtracting Lag 1 (seen) and Control conditions. Each panel represents trials as a function of time sorted according to the speed of RT1, after using an evolving spatial filter **(A)**, or stationary spatial filters computed at specific latencies: the averaged amplitude between 200–300 ms **(B)** or 700–900 ms **(C)** after T2 onset. For subsequent analysis, we refer to these two components as the

**Average across subjects for the (M250) and (M550) components for lag 1 and control conditions, time locked to T2.** Seen trials were split according to the speed of RT1. Trials in which RT1 was below the first quartile were classified as fast (blue line) and trials in which RT1 was above the third quartile were classified as slow (red line). Blinked and Control trials are represented in green and black respectively. Over the group, the M250 was time locked to the onset of the stimulus and larger for slow versus fast RT1, while the M550 was delayed for slow versus fast RT1 trials. Bar plots on the right represent the group averaged peak latency and duration of the M250 (upper part) and the M550 (lower part) for seen trials in lag 1 condition. Each bar represents a quartile of RT1 from fast (blue) to slow (red) reaction times. An ANOVA with RT1 quartiles as a within-subject factor revealed a significant effect on the duration of the M250 but not on its latency. The opposite held for the M550: a significant effect was observed on the latency but not on the duration.

Multiple comparisons

EMS filtering eliminates the need for multiple comparisons corrections across the sensor array. However, it does not eliminate the need to correct for multiple comparisons across time (samples) in the trial epoch. The same practices normally applied in the context of time-locked averaging of a single sensor or ROI are appropriate.

Data set: MEG experiment (Marti et al. 2012)

All details about participants and MEG recordings are reported in Marti et al. (2012). Briefly, ten subjects were included in MEG analyses. All subjects were naïve to the task, had normal or corrected-to-normal vision, and gave written informed consent to participate.

Stimuli and apparatus

All participants performed a dual-task paradigm in which the first target was a monotonic sound presented to both ears and the second was a letter of the alphabet presented visually. The first target (auditory) could be a high pitch (1100 Hz) or a low pitch (1000 Hz) and was presented for 84 ms. The second target (visual) was either the letter "Y" or the letter "Z" presented in black on a white background (0.64° of visual angle). The target letter was embedded in a visual stream of 12 random black letters used as distractors. Each letter was presented at the center of the screen for 34 ms with an inter-stimulus interval of 66 ms. The target sound (T1) was always synchronized to the third distractor and followed by the second target (T2) after a variable inter-target lag of 100, 200, 400 or 900 ms. In a fifth condition, T2 was replaced by a non-target letter of the alphabet (“distractor” or “control” condition). Participants were instructed (1) to respond as fast as possible first to the sound and then to the letter, (2) to respond as soon as the corresponding stimulus appeared, thus avoiding "grouped responses", and (3) that the second stimulus would occasionally be absent, in which case they should simply not perform the second task.

The experiment consisted of two training blocks of 20 trials each, one to practice the auditory task and the other one to practice the visual task, followed by 5 experimental blocks. In four of these experimental blocks, participants performed 100 trials of the dual-task and in one block they performed 50 trials of only the visual task while they had to listen passively to the sound (T1-irrelevant condition). Thus, a maximum of 80 trials per inter-target lag were recorded.

MEG recordings and pre-processing

While subjects performed the cognitive tasks, we continuously recorded brain activity using a 306-channel whole-head magnetoencephalography system (Elekta Neuromag®) with 102 magnetometers and 102 pairs of orthogonally oriented planar gradiometers. The subject's head position was measured at the beginning of each run and this information was used during data pre-processing to compensate for differences in head position between runs. Electro-oculogram (EOG) and electrocardiogram (ECG) were recorded simultaneously for offline rejection of eye movements and cardiac artifacts. Signal Space Separation, head movement compensation, and interpolation of bad channels were applied using the MaxFilter Software (Elekta Neuromag®). Epoching, trial rejection, and baseline correction were then applied using the Fieldtrip software package (

Measure of signal-to-noise ratio

To estimate the signal-to-noise ratio (SNR; Figure

Where

_{S} is the number of samples in the “signal” interval, _{N} is the number of samples in the “noise” interval, and

To estimate the SNR for different numbers of trials, for each

Decoding analysis

In Figure

For each time point in each of the two outer-loop left-out trials, we computed its posterior probability given the mean and variance of the surrogate measures, output from the inner loop, for each of the two categories (

For comparison, we also performed decoding using a linear support vector machine (SVM) with five-fold cross validation (L2 loss function, L2 penalty, penalty parameter (C) = 1). As with the other decoding analyses, we equalized the number of trials in each condition by selecting a random subset of trials from the condition with a greater number of trials. We ran the SVM five times for each subject and averaged these together and pooled the variance (for Figure
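The univariate Gaussian naïve-Bayes decision rule used in this section can be illustrated with a short sketch. It covers only the final step: computing a posterior probability for one left-out surrogate value from the mean and variance of each category's training values. The nested inner/outer EMS loops are not reproduced. The function names, the use of equal priors (reasonable when trial counts are equalized), and the sample-variance estimator are assumptions made for illustration.

```python
import math


def fit_gaussian(values):
    """Mean and sample variance of one category's training surrogate values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var


def log_gaussian(x, mean, var):
    """Log of the univariate normal density at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def posterior_a(x, params_a, params_b):
    """Posterior probability of category A for value x, assuming equal priors."""
    log_a = log_gaussian(x, *params_a)
    log_b = log_gaussian(x, *params_b)
    # Subtract the larger log-likelihood before exponentiating, for stability.
    top = max(log_a, log_b)
    like_a = math.exp(log_a - top)
    like_b = math.exp(log_b - top)
    return like_a / (like_a + like_b)


# Hypothetical training surrogate values at one time point.
params_a = fit_gaussian([1.0, 2.0, 3.0])
params_b = fit_gaussian([-1.0, -2.0, -3.0])

# Classify a left-out value: category A if the posterior exceeds 0.5.
p = posterior_a(2.0, params_a, params_b)
predicted = "A" if p > 0.5 else "B"
print(p, predicted)
```

Repeating this at every time point of every left-out trial, and averaging the proportion of correct predictions, would give a time course of classification accuracy.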

Results and discussion

Results

Demonstrating the efficacy of EMS filtering

Advantages of EMS filtering at the single-trial level

Figure

The above conclusion is of course dependent on the choice of ROI, and in particular on how well the ROI targeted the effect of interest. The ROI that we used (see Figure

To further test the quality of single trial data we attempted to classify each time point in each trial as belonging to either the

Advantages of EMS filtering at the group level

The algorithm is not only able to increase the quality of single-trial data but it also has two important advantages at the group level compared to standard subject averaging. First, because the spatial filter evolves across time (Figure

**Spatial filters for the same experimental effect are different for individual subjects.** The topography of the spatial filters for each of the 10 subjects used to produce Figure

Revealing brain mechanisms of dual-task interference using EMS filtering

Figure

As can be seen in Figure

Thus, EMS filtering afforded a unique view of the interference between the two tasks: we were able to investigate the precise relation between T1 processing and T2 processing both at the single trial level and at the group level. These results extend those obtained by S Marti, M Sigman and S Dehaene

Performance

Since different software was used for the EMS filtering and SVM analyses (MatLab and Python, respectively), performance comparisons can only be considered descriptive and approximate. All analyses were tested on a Dell Precision T3400 PC (64-bit dual Intel Core-2 Duo CPU E8500 3.16 GHz, 6144 KB cache) running Ubuntu Linux 10.04. All processes were single-threaded, so only one processor was used. In all cases, results are reported for the

Discussion

A number of techniques are available for deriving graded linear combinations of sensors so as to capture distinct sources of variance. Data-driven techniques such as principal component analysis (PCA) and independent component analysis (ICA)

Hypothesis-driven methods, such as the covariance analysis

The metric known as “global field power” (GFP)

In the present study we propose a method capable of revealing the time course of experimental effects at the single-trial level and illustrate its potential by investigating serial and parallel processing in the brain. The algorithm presents several advantages compared to standard methods applied in human electrophysiology: EMS filtering (1) reduces high-dimensional data to one dimension, functionally determined by experimental conditions, (2) increases the SNR for single-trial data, (3) avoids the problem of anatomical variability when averaging across subjects, (4) is optimal in maximizing the difference between the means of two experimental conditions (see Appendix B), and (5) yields weight vectors (spatial filters) that are directly interpretable as topographies over the sensor array.

EMS filtering attenuates trial-specific noise by projecting the data from each trial onto a matrix of spatial filters derived from all of the other trials. In this sense EMS filtering is analogous to the method of “sensor noise suppression” (SNS)

**Visualizing the temporal evolution of the spatial filters.**

Applying EMS filtering to the data of Marti et al. (2012) revealed an event at around 200–300 ms after T2, that matched the properties of a parallel perceptual buffer as described by the router model of the PRP

Comparison with other methods

EMS filtering versus the ROI approach

Throughout the present paper, we have directly compared the ROI approach to EMS filtering. It is important to note that the use of one or the other depends on the scientific question being addressed. If the question concerns a specific region of the brain, then an ROI or projection onto a fixed anatomical source might be appropriate. However, if the question is about a specific experimental effect, then EMS filtering is a more appropriate tool because it tries to maximally reveal that effect separately at each time point by pooling over all sensors. We have shown that the method significantly increases the signal to noise ratio, especially when the number of trials is relatively small. We have also shown that a very simple univariate classifier (GNB) can perform well at discriminating between the

EMS filtering versus multiple regression with spatial templates

EMS filtering versus Fisher’s linear discriminant

EMS filtering is a matched spatial filtering technique: if one is testing for a difference between two conditions, then we compute the difference between their means and use that as a filter (or template). Fisher’s linear discriminant (FLD) is a well-known technique that operates in a similar way. In order to compute the weight vector for separating two classes, FLD takes the difference between the means of the two classes, and then multiplies this by the inverse noise-covariance matrix. The latter step can improve the separability of the two classes by taking account of non-uniformities in the noise distribution. However, when the signal-to-noise ratio is low, trying to account for both signal and noise can yield a weight vector that is not representative of the difference between them.
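The relation between the two weight vectors can be written compactly. The notation below is introduced here for illustration only: $\mathbf{d}$ is the vector of per-sensor differences between condition means, $\Sigma$ is the noise covariance across sensors, and the unit-length scaling of the EMS filter is an assumption.

```latex
\begin{aligned}
\mathbf{d} &= \langle \chi_A \rangle - \langle \chi_B \rangle
  && \text{(difference between condition means, one entry per sensor)} \\
\mathbf{w}_{\mathrm{EMS}} &= \frac{\mathbf{d}}{\lVert \mathbf{d} \rVert}
  && \text{(matched filter)} \\
\mathbf{w}_{\mathrm{FLD}} &\propto \Sigma^{-1}\,\mathbf{d}
  && \text{(difference of means weighted by the inverse noise covariance)}
\end{aligned}
```

When the noise is uncorrelated and of equal variance at every sensor, $\Sigma = \sigma^{2} I$ and the two weight vectors point in the same direction. They diverge only to the extent that the estimated $\Sigma$ departs from this form, which is also where a poorly estimated covariance can pull $\mathbf{w}_{\mathrm{FLD}}$ away from the topography of the effect.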

A tradeoff between separability of the classes and recovery of the true underlying features (i.e. interpretability) is common among machine-learning techniques

For the analysis of stimulus-evoked responses, it is convenient for both the topography of the weight vector, and the resulting time course to be directly interpretable in terms of the particular experimental effect under investigation. Thus we have not incorporated the noise covariance into EMS filtering by default (although the option is available in our MatLab toolbox). Given that EMS filtering yields a significant improvement in signal quality over the most commonly-used techniques (single-sensor and ROI), foregoing an additional margin of separability at the single-trial level is a reasonable compromise in favor of readily interpretable weight vectors. Computing the noise covariance matrix also lengthens the computation time, especially in the context of a LOO cross validation.

EMS filtering versus pattern classification

Technically speaking, EMS filtering is a method of dimensionality reduction, and is not, by itself, a pattern classifier. However, it can be used for pattern classification by applying a simple decision rule (e.g. nearest-mean or GNB) to its one-dimensional output. In Figure

The decoding approach requires that the performance of the decoder be estimated at each time point in the epoch, and this can be computationally expensive given that a cross validation must be performed for each estimate. Even with only a five-fold cross validation (far fewer rounds of cross validation than the leave-one-out procedure used by EMS filtering) the SVM still took, on average, more than 30 times longer to compute than EMS filtering (~ 1 – 2 minutes per subject for the SVM versus ~ 2 – 3 seconds per subject for EMS filtering; see Results / Performance).

Also, while the SVM yields slightly better separability of the two classes at the single-trial level, this may come at the expense of the interpretability of the topography of the weight vectors, as discussed above. In addition to the weight vectors, the time courses output from EMS filtering are also more readily interpretable. For a simple objective function such as the difference between the means of two conditions, the axis along which the surrogate time courses vary is simply (and exactly)

Finally, the pattern-classification approach (i.e. examining the time course of classification accuracy) is limited to questions concerning the difference between experimental conditions, which is only one possible objective function that can be applied using EMS filtering. Using EMS filtering one could, for example, project the data onto “the correlation between signal amplitude and reaction time” or “the difference in signal amplitude between t0-50 ms and t0-500 ms” (where t0 is the time of a motor response). The latter objective function would tend to capture activity that is changing prior to a movement, such as the readiness potential

**EMS filtering with linear regression using a temporally defined predictor based on reaction-time data, applied to a single subject.** For this analysis, the objective function used by EMS filtering performed a linear regression on the data from each sensor (i.e. a matrix of trials x samples), and returned the beta weight. The predictor variable (shown in panel **A)** was constructed by coding each sample with a -1 if it was in the range 200 ms before to 50 ms after RT1, a +1 if it was in the range 200 ms before to 50 ms after RT2, and a zero otherwise. Since task 1 responses and task 2 responses were made with opposite hands (left and right, respectively, for this particular subject) then this regressor should reveal response-related activity that is different for right-handed and left-handed responses. The topography of the resulting spatial filter (magnetometers) is shown in panel **B**, and is clearly lateralized, consistent with the coding of the regressor. Panels **C** and **D** show the surrogate time courses sorted by RT1 and RT2, respectively, with the reaction time marked by black dots. The color map goes from blue (negative) to green (zero) to red (positive). Response-related activity is plainly visible in the form of a bluish vertical band at ~ 100 to 600 ms and a reddish vertical band at ~ 1400 to 2000 ms, and shows a clear relationship with the reaction time by which the data were sorted. Panels **E** and **F** show the mean over the surrogate time courses when the trials were aligned to RT1 and RT2, respectively. Data were arbitrarily aligned to the median reaction time in each case, which is marked by a thin vertical line. A confidence boundary equal to one standard error of the mean is shown in a lighter shade of blue.
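A regression-based objective function of the kind described in this caption might look like the sketch below. It is an illustration under stated assumptions rather than the toolbox code: the function names and nested-list layout are hypothetical, the regression pools all trials and samples for a sensor into one ordinary least-squares fit with an intercept, and where the RT1 and RT2 windows overlap the RT1 coding takes precedence.

```python
def build_predictor(times_ms, rt1_ms, rt2_ms, before_ms=200.0, after_ms=50.0):
    """Code one trial's samples: -1 near RT1, +1 near RT2, 0 elsewhere."""
    row = []
    for t in times_ms:
        if rt1_ms - before_ms <= t <= rt1_ms + after_ms:
            row.append(-1.0)
        elif rt2_ms - before_ms <= t <= rt2_ms + after_ms:
            row.append(1.0)
        else:
            row.append(0.0)
    return row


def regression_beta(y, x):
    """Slope of an ordinary least-squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def regression_objective(data, predictor):
    """Return one beta weight per sensor.

    data[k][s][t] is trial k, sensor s, sample t; predictor[k][t] is the
    regressor value for trial k at sample t.
    """
    x = [value for row in predictor for value in row]
    betas = []
    for s in range(len(data[0])):
        y = [value for trial in data for value in trial[s]]
        betas.append(regression_beta(y, x))
    return betas


# Two hypothetical trials sampled every 100 ms, with different reaction times.
times_ms = [100.0 * i for i in range(10)]
predictor = [build_predictor(times_ms, 300.0, 700.0),
             build_predictor(times_ms, 400.0, 800.0)]
# Three synthetic sensors: positively related, negatively related, unrelated.
data = [[[2.0 * p + 5.0 for p in row],
         [-0.5 * p for p in row],
         [1.0 for _ in row]]
        for row in predictor]

betas = regression_objective(data, predictor)
print(betas)
```

The returned vector has one coefficient per sensor, which is the minimum requirement for an objective function, so it can take the place of the difference between means as the spatial filter inside the leave-one-out loop.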

**Same as Figure, but with the reaction times replaced by random values, resulting in a random model.** **A**, predictor variable (same as in Figure); **B**, topography of the resulting spatial filter; **C**, output sorted by RT1; **D**, output sorted by RT2; **E**, mean aligned to RT1; **F**, mean aligned to RT2.

EMS filtering applied to the difference between means gives as a solution the square-root of the total power of the ERP difference, and is thus very similar to the GFP of the ERP difference. However, GFP of the ERP difference is strictly an aggregate measure, and is not defined at the single-trial level, whereas EMS filtering produces one noise-suppressed time course per trial, and thus allows for both aggregate and single-trial analyses.
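The first statement can be checked in one line if the leave-one-out step is set aside and the filter is taken to have unit length (both simplifications are assumptions made here for illustration; $C$ denotes the number of sensors and $d_c$ the entry of $\mathbf{d}$ at sensor $c$).

```latex
\begin{aligned}
\mathbf{d}(t) &= \langle \chi_A \rangle - \langle \chi_B \rangle, \qquad
\mathbf{w}(t) = \frac{\mathbf{d}(t)}{\lVert \mathbf{d}(t) \rVert} \\
\mathbf{w}(t)^{\top}\,\mathbf{d}(t) &= \lVert \mathbf{d}(t) \rVert
  = \sqrt{\sum_{c=1}^{C} d_c(t)^{2}}
\end{aligned}
```

Projecting the mean difference onto its own normalized topography therefore returns the square root of its total power across sensors. GFP, as it is often defined, is the spatial standard deviation $\sqrt{\tfrac{1}{C}\sum_{c}\bigl(d_c(t) - \bar{d}(t)\bigr)^{2}}$, which differs from this quantity by the $1/\sqrt{C}$ scaling and by the subtraction of the across-sensor mean $\bar{d}(t)$. With leave-one-out filters the equality above holds only approximately, because each trial is projected onto a filter estimated without that trial.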

Topography of the spatial filters

One might also want to ask “what sensors in the topography show a significant effect?”, but this is a different question from the one addressed by EMS filtering. In the context of EMS filtering, the significant effect, if there is one, is in the amplitude of the resulting time course at a particular time point – the “topography” at that time point is simply the vector onto which the original data were projected in order to reveal that significant effect. To make this point more clear, consider that it is possible to construct a topography such that all coefficients have the same absolute value (no sensor being weighted any more or less heavily than any other), and yet the vector of coefficients exposes a significant effect when the data are projected onto it. One could, however, statistically compare the spatial filter associated with a given experimental effect to that associated with a different effect (using correlation, for example), and one can also compare the spatial filter at a given time point with the spatial filter at a different time point, as illustrated in Figure

It can, of course, be informative to examine the temporal evolution of the spatial filters. In Additional file

Contribution of EMS filtering to the dual-task literature: testing the sensory buffer hypothesis

The router model of the PRP

Objective functions other than the difference between means

Direct comparison between two experimental conditions is a very simple, common, and powerful analysis, and this is why we have focused on the difference between means as an objective function for the purpose of demonstrating the method. However, as an algorithm EMS filtering is independent of the particular objective function that is used. The minimum requirement is that the function return a vector of coefficients, one for each sensor. As one extreme (and useless) example, the objective function could return a vector of random numbers – highly unlikely to reveal any interesting effects in the data, but technically a valid objective function. Correlation with a behavioral variable such as reaction time is an example of a potentially informative objective function. Although in principle any objective function can be used, note that the interpretability of the results will depend, at least in part, on the choice of objective function. Note also that the magnitude and direction of the skew introduced by the LOO procedure
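The minimum requirement on an objective function (return one coefficient per sensor) can be illustrated with a few interchangeable functions. This is a sketch only: the function names and the shared two-argument signature are hypothetical and are not taken from the toolbox, and each function operates on the data at a single time sample (trials × sensors).

```python
import numpy as np


def diff_of_means(X, labels):
    """Difference between the means of conditions 1 and 2, per sensor."""
    return X[labels == 1].mean(axis=0) - X[labels == 2].mean(axis=0)


def correlation_with(X, v):
    """Pearson correlation of each sensor with a per-trial variable v."""
    Xc = X - X.mean(axis=0)
    vc = v - v.mean()
    return (vc @ Xc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(vc))


def random_weights(X, _unused, seed=0):
    """The 'useless but valid' case: ignores the data entirely."""
    return np.random.default_rng(seed).standard_normal(X.shape[1])


def unit_norm(w):
    """Scale a coefficient vector to unit length."""
    return w / np.linalg.norm(w)


# Synthetic data at a single time sample: trials x sensors
rng = np.random.default_rng(2)
n_trials, n_sensors = 60, 8
X = rng.standard_normal((n_trials, n_sensors))
labels = np.repeat([1, 2], n_trials // 2)
rt = rng.uniform(300.0, 900.0, n_trials)  # hypothetical reaction times (ms)

filters = {f.__name__: unit_norm(f(X, arg))
           for f, arg in [(diff_of_means, labels),
                          (correlation_with, rt),
                          (random_weights, None)]}
```

Each entry of `filters` is a unit-length vector with one coefficient per sensor, so any of them could be passed to the same projection step.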

In order to illustrate the use of EMS filtering with an objective function other than the difference between means, we used the reaction-time data from

Conclusions

We have presented a method for reducing multivariate time series data to a single time course by projecting the data onto a single vector that is chosen so as to reveal a given experimental effect. A leave-k-out procedure ensures that each filter is independent of the trial(s) projected onto it. Although we have presented the method primarily using the example of a difference between two experimental conditions, we reiterate that other objective functions – such as correlation with a dependent or independent variable, or a temporal difference within a single experimental condition – can also be used, and this capability has been implemented in the freely-available computer code. However, the properties of any function other than the difference between two means would have to be worked out independently. We used MEG data to illustrate the method, but the method can be applied to any kind of multivariate time-series data, such as slow event-related functional magnetic resonance imaging (fMRI), or data from domains outside of neuroimaging. We have demonstrated the effectiveness of the method in dramatically improving the quality of single-trial data vis-à-vis a given experimental effect, and specifically in revealing the time course of the psychological refractory period. An implementation of the method is freely available as a MatLab toolbox at

Endnotes

^{a}A clarification on the use of the term "spatial filter": A spatial filter can be two-dimensional, taking into account the relative spatial locations of each of the individual variables (pixels or sensors). In this case a two-dimensional convolution is used for filtering, as in the example of detecting faces in photographs. However, even when we treat the data as a one-dimensional vector (as is the case here), the term "spatial filter" is still commonly used because the data (across sensors, at a single time sample) are in the spatial domain rather than the time domain. When we refer to a "spatial filter" we are always referring to a one-dimensional vector in sensor space (and hence a one-dimensional filtering operation).

^{b}One can show analytically that the expected mean of this LOO procedure is zero. However, for low-dimensional data (D<10) the distribution is far from Normal, and thus one cannot use the conventional t-test for the mean (the variance will be over-estimated and thus one loses statistical power). Non-parametric tests for zero median are also inadequate, as the distribution is skewed and thus has a non-zero median. To establish statistical significance one therefore has to resort to random shuffle statistics. The problem may be less severe for high-dimensional data, in which case the distribution is approximately Normal and a simple t-test may suffice (LC Parra, personal communication). [Note that the above is not a concern when performing statistics across subjects, as long as the means are normally distributed.]

Appendix A

EMS filtering – formal description

In all formulae, **X**.

The EMS filtering algorithm

Let **X** be a data matrix of size

If the objective is to maximize the difference between two experimental conditions, then

Given only the data matrix, ^{t}) derived based on the data at the corresponding time point (

We could simply compute

Each time point (

For the simple case where

Let

be the mean for condition **y** = 1, with the

Let

be the mean for condition **y** = 2, with the

Then

is the matrix of spatial filters, one for each sample (

is the matrix of spatial filters after normalizing each filter to unit length, and

is the resulting matrix of surrogate time courses.
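For the two-condition case, the steps above (condition means computed without the left-out trial, their difference as the filter, normalization to unit length, projection of the left-out trial) can be sketched as follows. This is a minimal leave-one-out illustration on synthetic data, not the toolbox implementation: the function name `ems_filter_diff_means`, the array layout (trials × sensors × samples), and the averaging of filters across folds for display are all assumptions.

```python
import numpy as np


def ems_filter_diff_means(X, y):
    """Leave-one-out EMS filtering for a difference between two means.

    X : (n_trials, n_sensors, n_samples) data array
    y : (n_trials,) condition labels, 1 or 2
    Returns (surrogates, filters):
      surrogates : (n_trials, n_samples), one time course per trial
      filters    : (n_sensors, n_samples), unit-norm filters averaged over folds
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_trials, n_sensors, n_samples = X.shape
    surrogates = np.empty((n_trials, n_samples))
    filters = np.zeros((n_sensors, n_samples))

    for i in range(n_trials):
        keep = np.arange(n_trials) != i  # every trial except trial i
        Xr, yr = X[keep], y[keep]
        # Difference between condition means, per sensor and per sample
        d = Xr[yr == 1].mean(axis=0) - Xr[yr == 2].mean(axis=0)
        # Normalize the filter at each sample to unit length
        norms = np.linalg.norm(d, axis=0, keepdims=True)
        w = d / np.where(norms > 0, norms, 1.0)
        # Project the left-out trial onto the filter, sample by sample
        surrogates[i] = np.sum(w * X[i], axis=0)
        filters += w / n_trials

    return surrogates, filters


# Synthetic demonstration: an effect present only in the second half
rng = np.random.default_rng(0)
n_trials, n_sensors, n_samples = 40, 16, 50
y = np.repeat([1, 2], n_trials // 2)
pattern = rng.standard_normal(n_sensors)
X = rng.standard_normal((n_trials, n_sensors, n_samples))
X[y == 1, :, 25:] += pattern[:, None]

S, W = ems_filter_diff_means(X, y)
early = S[y == 1][:, :25].mean() - S[y == 2][:, :25].mean()
late = S[y == 1][:, 25:].mean() - S[y == 2][:, 25:].mean()
```

In this toy example the difference between conditions in the surrogate time courses (`late`) is large only where the effect was simulated, while `early` stays near zero because each filter is estimated without the trial it is applied to.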

For the general case where **f** is some function other than the above (see Discussion),

Where

and

Recall the formal definition of a matched filter as a “known template” that we correlate with an unknown (and noisy) signal. In the context of EMS filtering we do not know what the template is (i.e. the real underlying difference between two experimental conditions), and so we estimate it from the data itself. This means that performance of the filter will depend in part on the accuracy of the estimate. If the estimate is optimal, then this is the best that can be done

Filtering with a stationary template

Although we present EMS filtering as a method for revealing the time course of an experimental effect, it is often useful to examine the time course of a fixed spatial filter e.g. the mean around the peak of an evoked potential of interest. While conceptually different from the procedure described above, its implementation is very similar. Consider an experimental effect that is expected to appear at a specific latency (e.g. ~200 ms) with respect to a certain event – e.g. the onset of a stimulus or the issuance of a motor response – to which the data epochs are aligned (**X**, the set of trial labels,

So **X**

Where

is the spatial filter computed on the ^{th} iteration. In the example given above,

Since the average signal at time(s) t ∈

Appendix B

Derivation of the weight vector for the difference between two means

In the Methods section we introduced **X**, the data matrix, as having dimensions

_{A} if the corresponding trial belongs to condition _{B} if the corresponding trial belongs to condition _{A} and _{B} are the number of trials belonging to conditions

Where ⟨

The straightforward solution to this problem is:

Note that the numerator in (10) above is equivalent to the vector labeled

Cross validation

The accuracy of the model is simply the dot product of the discriminative vector

To discount over-fitting effects in the accuracy estimate, we replace it by a cross-validated estimate,

where ^{(k)} = ^{(k)} and ^{(k)}, where the superscript ^{th} trial left out”.
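The contrast between the in-sample accuracy and its cross-validated counterpart can be sketched numerically. The code below is an illustration under stated assumptions rather than a transcription of the formulae: it assumes the label vector is coded +1/n_A for one condition and -1/n_B for the other, that the discriminative vector is the unit-norm difference of means, and that the cross-validated estimate sums, over trials, the coded label times the projection of each trial onto the vector computed with that trial left out. All function names are hypothetical.

```python
import numpy as np


def weights(X, labels):
    """Unit-norm difference of means; X is (n_trials, n_sensors)."""
    d = X[labels == 1].mean(axis=0) - X[labels == 2].mean(axis=0)
    return d / np.linalg.norm(d)


def coded_labels(labels):
    """+1/n_A for condition 1 and -1/n_B for condition 2 (assumed coding)."""
    n_a, n_b = np.sum(labels == 1), np.sum(labels == 2)
    return np.where(labels == 1, 1.0 / n_a, -1.0 / n_b)


def in_sample_accuracy(X, labels):
    """Dot product of the weight vector with X^T y, using all trials."""
    y = coded_labels(labels)
    return float(weights(X, labels) @ (X.T @ y))


def loo_accuracy(X, labels):
    """Same quantity, with each trial scored by a filter that excludes it."""
    y = coded_labels(labels)
    total = 0.0
    for k in range(len(labels)):
        keep = np.arange(len(labels)) != k
        w_k = weights(X[keep], labels[keep])  # filter without trial k
        total += y[k] * float(X[k] @ w_k)
    return total


# Pure noise: there is no real effect to find
rng = np.random.default_rng(3)
n_trials, n_sensors = 40, 50
X = rng.standard_normal((n_trials, n_sensors))
labels = np.repeat([1, 2], n_trials // 2)

y = coded_labels(labels)
mean_diff = X[labels == 1].mean(axis=0) - X[labels == 2].mean(axis=0)
acc_in = in_sample_accuracy(X, labels)
acc_cv = loo_accuracy(X, labels)
```

With this coding, `X.T @ y` is exactly the difference between the condition means, so the in-sample accuracy equals the norm of that difference and is positive even for pure noise. The leave-one-out estimate is not inflated in this way.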

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AS devised the method and wrote the MatLab computing toolbox. SM contributed computer code to the toolbox. AS and SM tested and refined the method. AS and SM performed the data analyses. AS, SM, and SD wrote the manuscript, and all authors read and approved the final manuscript.

Acknowledgements

AS was supported by a Marie Curie post-doctoral fellowship from the European Commission (FP7-PEOPLE-2009-IIF, project number 252665). SM was supported by a grant from the Human Frontiers Science Program. Additional support came from a European Research Council senior grant "NeuroConsc" to SD. The Neurospin MEG facility was sponsored by grants from INSERM, CEA, FRM, the Bettencourt-Schueller Foundation, and the Région île-de-France. Special thanks to Bertrand Thirion for assistance with the mathematical derivation in Appendix B, and for technical guidance. Special thanks to Lucas Parra for technical guidance and comments on an earlier draft of the manuscript. Special thanks also to Gaël Varoquaux for technical guidance.