The Story of Signal Processing

EEG data is noisy. Most researchers love to remind you of this when you tell them you're trying to productize real-time control systems with non-invasive methods.

It's been well documented that there's high variance day-to-day and person-to-person. Then come claims of a biological ceiling on the information you can extract from EEG, paired with exclamations about noise, artifact corruption, and nonstationarity (a term that gets thrown around more than the problem gets solved).

But problems exist to be solved, and challenges exist to be taken on.

Hans Berger recorded the first human EEG signals in 1924. It wasn't until roughly 60 years later that ICA (Independent Component Analysis), the most common artifact removal technique today, was invented, and about a decade after that before it was first applied to EEG to remove contamination from eye blinks and muscle noise.

There have been other advances along similar lines, such as the application of the surface Laplacian to reduce volume conduction, or the use of ASR (Artifact Subspace Reconstruction), which uses a clean baseline to reconstruct artifact-free data.

But each technique still has its flaws: ASR requires a clean baseline to begin with, and some spatial smearing remains even after the surface Laplacian is applied. So the same approach is taken again: layer on yet another statistical or algorithmic method to remove artifacts and recreate what "ideal" lab data would look like, with the goal of mimicking what's found in invasive recordings.

It inevitably falls short, and the idea of an effective non-invasive approach gets pushed further out of reach.

I'm someone who's deeply invested in building non-invasive systems that scale. Why would I be doing this if the reality is so grim?

What's the path forward to build EEG-based control systems that drive real impact?


I've written previously about the need to work on both algorithms and sensors in order to make meaningful progress. But here I want to focus on the situation where you have reliable electrodes delivering usable data.

People spend their time trying to recreate invasive data, mimicking its quality, resolution, and behavior by squeezing out as much signal as possible to improve the SNR.

In the long run, this will always fall short. It's analogous to strapping jet engines to a car to make it fly, instead of just driving it on the road.

Non-invasive modalities, EEG included, are a fundamentally different form of data. It's still time-series data, but we shouldn't treat it as if all the characteristics of invasive recordings carry over.

We need to think differently. Build with noise and variance included.

The first place change is needed is in data preprocessing and feature extraction. We should get rid of most of it. Barring a line-noise or bandpass filter, these techniques restrict the realm of what's learnable. There are some known biological relationships you can exploit, but much remains unknown because of the limitations mentioned above. Limiting the range of features a model has access to puts a ceiling on performance unless your manual feature selection was 100% correct.

We should build models that train and learn from raw data as much as possible. Yes, they will take in artifacts and noise, but there may be context hidden underneath. You're allowing an algorithm to figure out which features matter based on the data, instead of letting a human decide in advance.
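To make "barring a line / bandpass filter" concrete, here is a minimal sketch of that stripped-down preprocessing stage using SciPy. The sampling rate, line frequency, band edges, and synthetic data are all illustrative assumptions, not a prescription:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def minimal_preprocess(eeg, fs=250.0, line_freq=60.0, band=(1.0, 45.0)):
    """Apply only a line-noise notch and a broad bandpass filter.

    Everything else -- artifacts, noise, nonstationarity -- is left in
    the signal for the model to learn from.
    """
    # Notch out power-line interference (60 Hz here; 50 Hz in many regions).
    b_notch, a_notch = iirnotch(line_freq, Q=30.0, fs=fs)
    eeg = filtfilt(b_notch, a_notch, eeg, axis=-1)
    # Broad zero-phase bandpass; no artifact rejection, no hand-crafted features.
    b_bp, a_bp = butter(4, band, btype="bandpass", fs=fs)
    return filtfilt(b_bp, a_bp, eeg, axis=-1)

# Synthetic stand-in for a recording: 8 channels, 10 s at 250 Hz.
rng = np.random.default_rng(0)
raw = rng.standard_normal((8, 2500))
clean = minimal_preprocess(raw)
print(clean.shape)  # same shape as the input: (8, 2500)
```

The output of a stage like this would feed the model directly, rather than passing through ICA, ASR, or a hand-picked feature extractor first.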

The next avenue for change is in evaluation methods. Novel methods and algorithmic advances in the EEG space are driven by results in carefully curated test setups: within-session or within-subject evaluation, and offline analysis instead of real-time, closed-loop performance. The numbers look nice on paper, but they don't tell you whether the model you've built will work for the BCI use case you're working towards. What needs to be tested instead: real-time inference pipelines that replicate offline results, and models that reduce calibration and transfer from subject to subject.
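One way to make subject-to-subject evaluation concrete is a leave-one-subject-out split: train on every subject but one, test on the held-out subject, and rotate. A sketch with scikit-learn, using synthetic features and labels as stand-ins for a real dataset (the subject counts, feature dimensions, and classifier are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 5 subjects, 40 trials each, 16 features per trial.
rng = np.random.default_rng(0)
n_subjects, trials_per_subject, n_features = 5, 40, 16
X = rng.standard_normal((n_subjects * trials_per_subject, n_features))
y = rng.integers(0, 2, size=len(X))
subjects = np.repeat(np.arange(n_subjects), trials_per_subject)

# Every fold holds out ALL trials from one subject -- the model never
# sees that person during training, unlike within-subject evaluation.
logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean held-out-subject accuracy: {np.mean(scores):.2f}")
```

A model that only looks good under within-subject splits but collapses under this split is exactly the failure mode the paragraph above describes; closed-loop latency and calibration time would need their own tests on top of this.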


The field keeps trying to sanitize EEG data into something it's not, and then evaluates success in conditions that don't reflect reality. The paradigm shift is to stop fighting the nature of the data and start building systems that are designed for the messiness of the real world from day one.

This perspective has been the foundation of the work we're doing at Morph Labs, and the backbone of the models we've built.

Non-invasive EEG will make for successful control systems, but they will come from systems built with the chaos of the everyday world in mind.