Unraveling Biomarkers through Multi-Modal Data Fusion
Evrim Acar, Simula Research Lab
Fusing complementary signals from different modalities promises to enable the discovery of more accurate diagnostic biomarkers for various diseases. In neuroscience, for instance, jointly analyzing signals from different neuroimaging techniques has the potential to reveal biomarkers for neurological disorders. However, biomarker discovery through data fusion is challenging: it requires extracting interpretable and reproducible patterns from data sets that contain both shared and unshared patterns and are often of different orders. For example, multi-channel electroencephalography (EEG) signals from multiple subjects can be represented as a third-order tensor with modes subjects, time, and channels, while functional magnetic resonance imaging (fMRI) data may take the form of a subjects-by-voxels matrix. Traditional fusion methods rearrange higher-order tensors as matrices and use matrix factorization-based fusion approaches with additional constraints, such as statistical independence or orthogonality, to extract patterns (i.e., biomarkers) uniquely. Rather than imposing such constraints, we preserve the multiway structure of higher-order tensors, formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, and discuss its extension to structure-revealing data fusion, i.e., fusion models that can identify shared and unshared patterns in coupled data sets. Numerical experiments on prototypical and real coupled data sets demonstrate that, by exploiting the low-rank structure of higher-order tensors, the structure-revealing CMTF model can capture the underlying patterns more accurately than matrix factorization-based fusion methods. We will discuss applications of CMTF-based fusion models in metabolomics and neuroscience.
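To make the coupled formulation concrete, here is a minimal NumPy sketch of plain CMTF (not the structure-revealing extension), assuming a third-order tensor X (e.g., subjects x time x channels) coupled with a matrix Y (e.g., subjects x voxels) through a shared subjects-mode factor A. It assumes the objective ||X - [[A, B, C]]||^2 + ||Y - A V^T||^2, where [[A, B, C]] is a CP (CANDECOMP/PARAFAC) model, and fits it by alternating least squares. The solver choice, the function names `cp_full` and `cmtf_als`, and the synthetic data are illustrative assumptions, not the author's implementation; the structure-revealing model adds further terms to separate shared from unshared components and is not shown here.

```python
import numpy as np


def cp_full(A, B, C):
    """Build the third-order tensor [[A, B, C]] from its CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cmtf_als(X, Y, rank, n_iter=500, seed=0):
    """Hypothetical sketch: plain CMTF fitted by alternating least squares.

    Minimizes ||X - [[A, B, C]]||^2 + ||Y - A V^T||^2, where A (subjects mode)
    is shared by the tensor X (I x J x K) and the matrix Y (I x M).
    Returns the factors and the objective value after each sweep.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A, B, C, V = (rng.standard_normal((n, rank)) for n in (I, J, K, M))
    losses = []
    for _ in range(n_iter):
        # Shared factor A: its normal equations combine both data sets.
        G = (B.T @ B) * (C.T @ C) + V.T @ V
        rhs = np.einsum("ijk,jr,kr->ir", X, B, C) + Y @ V
        A = np.linalg.solve(G, rhs.T).T
        # B and C appear only in the tensor term (standard CP-ALS updates).
        G = (A.T @ A) * (C.T @ C)
        B = np.linalg.solve(G, np.einsum("ijk,ir,kr->jr", X, A, C).T).T
        G = (A.T @ A) * (B.T @ B)
        C = np.linalg.solve(G, np.einsum("ijk,ir,jr->kr", X, A, B).T).T
        # V appears only in the matrix term: Y^T ~ V A^T.
        V = np.linalg.solve(A.T @ A, (Y.T @ A).T).T
        loss = np.sum((X - cp_full(A, B, C)) ** 2) + np.sum((Y - A @ V.T) ** 2)
        losses.append(loss)
    return (A, B, C, V), losses


# Synthetic, noiseless coupled data sharing the first-mode factor A0.
rng = np.random.default_rng(1)
I, J, K, M, R = 20, 15, 10, 30, 2
A0, B0, C0, V0 = (rng.standard_normal((n, R)) for n in (I, J, K, M))
X = cp_full(A0, B0, C0)
Y = A0 @ V0.T

(A, B, C, V), losses = cmtf_als(X, Y, rank=R)
rel_err = np.sqrt(losses[-1] / (np.sum(X**2) + np.sum(Y**2)))
print(f"relative fit error: {rel_err:.2e}")
```

Because each block update solves its least-squares subproblem exactly, the objective should not increase from one sweep to the next. Unlike matricizing X and applying a matrix factorization, the tensor term keeps the CP structure, so B and C stay separate time and channel factors.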