Over the past few months, I’ve been collaborating with researchers from the Turk-Browne Lab at Yale University. Their ongoing work studies the origins of cognition in the human brain. Using fMRI scanners, they scan children to analyze their cognitive skills at different ages. The proposal is simple but quite challenging. The challenges start with recruiting families and making sure they are safe and comfortable during the experiments, continue with developing tasks that are suitable for very young children, and end with overcoming the data challenges. The last of these, in particular, requires rethinking the machine learning methods that neuroscientists typically use to analyze data from experiments with adults. The brain develops fast at these ages, and changes are to be expected over time.
A few days ago, our paper “A Semi-supervised Method for Multi-Subject fMRI Functional Alignment” was accepted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), which will be held next March in New Orleans, Louisiana, USA. This work presents an extension to the original Shared Response Model (SRM), an unsupervised method for multi-subject functional alignment of fMRI data. Using a semi-supervised approach, we show how to train SRM while taking into consideration data from a supervised task (multi-label classification). In this way, we need almost half the number of unlabeled samples to achieve the same accuracy level, or we achieve higher accuracy with the same number of unlabeled samples.
The method extends the deterministic SRM formulation with a Multinomial Logistic Regression (MLR) penalty. The semi-supervised SRM (SS-SRM) inherits the characteristics of the SRM problem and defines a non-convex optimization problem. We solve it using a block-coordinate descent approach, where each block is an unknown matrix. We show similarities to the SRM and MLR, and note that finding the mappings requires solving an optimization problem on the Stiefel manifold. While this problem has a closed-form solution in the SRM case, in the SS-SRM it requires general-purpose optimization techniques. We use the excellent pymanopt package, which allowed us to implement a solution in Python. The source code of SS-SRM has also been published as part of the Brain Imaging Analysis Kit (BrainIAK).
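To make the closed-form case concrete, here is a minimal sketch of the mapping update in the plain deterministic SRM, assuming the usual least-squares objective: minimize ||X - W S||_F subject to W having orthonormal columns. This is the orthogonal Procrustes problem, solved with a single SVD. The function name `update_mapping` and the synthetic data are illustrative assumptions, not code from the paper or from BrainIAK, and the sketch does not include the MLR penalty that makes the SS-SRM update harder.

```python
import numpy as np


def update_mapping(X, S):
    """Closed-form mapping update for one subject (hypothetical helper).

    X: (voxels, timepoints) data of one subject.
    S: (features, timepoints) shared response.
    Returns W: (voxels, features) with orthonormal columns that
    minimizes ||X - W @ S||_F (orthogonal Procrustes solution).
    """
    # Thin SVD of the cross-product between the data and the shared response.
    U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
    # Dropping the singular values leaves the closest orthonormal matrix.
    return U @ Vt


# Synthetic, noise-free example (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
voxels, features, timepoints = 50, 5, 40
W_true, _ = np.linalg.qr(rng.standard_normal((voxels, features)))
S = rng.standard_normal((features, timepoints))
X = W_true @ S

W = update_mapping(X, S)
```

In this noise-free setting the update recovers the mapping that generated the data. Once a classification penalty is added to the objective, the SVD shortcut no longer applies, which is where a manifold optimization package comes in.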
Neuroscience is the science of learning how the brain works and understanding, among other things, how the brain stores and processes all the information that it receives from the world around it. Several imaging techniques have been developed in recent years that allow neuroscientists to peek inside the human brain. The most important step in this direction is functional Magnetic Resonance Imaging, or fMRI, which captures brain activation indirectly from blood oxygenation levels. With fMRI we can capture a full brain scan every few seconds. Such scans are volumes of the brain composed of thousands to millions of voxels. These scans are usually processed with machine learning algorithms and statistical tools.
Storing one subject’s data in memory is possible with today’s servers. However, doing so for tens of subjects quickly becomes limiting, so holding a whole dataset in memory at once requires multiple machines. Moreover, using multi-subject datasets helps to improve the statistical capacity of the machine learning methods that are incorporated in neuroscience experiments. In recent work, our research group published a manuscript describing how we scale out two factor analysis methods (for dimensionality reduction). We show that it is possible to use hundreds to thousands of subjects in neuroscience studies.
The first method is called the Shared Response Model (SRM). The SRM computes a series of mappings from the subjects’ volumes to a shared subspace. These mappings improve the predictability of the model and help increase the accuracy of subsequent machine learning algorithms used in a study. The second method, dubbed Hierarchical Topographic Factor Analysis (HTFA), is a model that abstracts brain activity as hubs (spheres) of activity with dynamic links between them. HTFA helps with the interpretation of brain dynamics, outputting networks like the one in the figure below. For both methods, we present algorithms that run in a distributed fashion and can process a 1000-subject dataset. Our work “Enabling Factor Analysis on Thousand-Subject Neuroimaging Datasets” aims to push the limits of what neuroscientists can do with multi-subject data and enable them to propose experiments that were unthinkable before.
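As a rough illustration of what "mappings from the subjects’ volumes to a shared subspace" means, the sketch below fits a simple deterministic SRM by alternating two steps: averaging the projected data to get the shared response, and re-solving each subject's orthonormal mapping with an SVD. It is a toy, single-machine version under assumed names (`fit_deterministic_srm`) and synthetic data. It is not the distributed algorithm from the paper, nor BrainIAK's implementation.

```python
import numpy as np


def fit_deterministic_srm(data, features, n_iter=10, seed=0):
    """Toy deterministic SRM (hypothetical helper, not the paper's code).

    data: list of (voxels_i, timepoints) arrays, one per subject.
    Returns per-subject mappings, the shared response, and the
    reconstruction error recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    # Start from random orthonormal mappings, one per subject.
    mappings = [
        np.linalg.qr(rng.standard_normal((X.shape[0], features)))[0]
        for X in data
    ]
    history = []
    for _ in range(n_iter):
        # Shared response: average of each subject's projected data.
        # Only a sum over subjects is needed for this step.
        shared = sum(W.T @ X for W, X in zip(mappings, data)) / len(data)
        # Per-subject mapping update; each subject is independent.
        mappings = []
        for X in data:
            U, _, Vt = np.linalg.svd(X @ shared.T, full_matrices=False)
            mappings.append(U @ Vt)
        error = sum(
            np.linalg.norm(X - W @ shared) ** 2
            for W, X in zip(mappings, data)
        )
        history.append(error)
    return mappings, shared, history


# Synthetic subjects with different voxel counts (arbitrary sizes).
rng = np.random.default_rng(1)
features, timepoints = 4, 60
true_shared = rng.standard_normal((features, timepoints))
data = []
for voxels in (30, 40, 50):
    W_i, _ = np.linalg.qr(rng.standard_normal((voxels, features)))
    noise = 0.1 * rng.standard_normal((voxels, timepoints))
    data.append(W_i @ true_shared + noise)

mappings, shared, history = fit_deterministic_srm(data, features)
```

Note that each subject's update touches only that subject's data plus the small shared matrix, and the shared response is a sum across subjects. That structure is what makes this family of methods a natural candidate for spreading subjects across machines.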