Maximum likelihood estimation

Throughout the remainder of this paper, we are going to keep the mean parameter
$\mu$ fixed at its ML estimate, the sample mean of the observations:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Without loss of generality, we will also assume that the sources have unit
variance, since we can freely exchange arbitrary scaling factors between the
source signals and the associated columns of the mixing matrix $A$.

If the structure $\Sigma$ of the noise covariance $\sigma^2\Sigma$ were known,
we could use its Cholesky decomposition $\Sigma = K K^{\top}$ to rewrite
equation 2:

$$K^{-1} x_i = K^{-1} A s_i + K^{-1} \eta_i, \qquad \text{i.e.} \qquad \tilde{x}_i = \tilde{A} s_i + \tilde{\eta}_i,$$

where the transformed noise $\tilde{\eta}_i$ now has isotropic covariance
$\sigma^2 I$. Noise and signal are assumed to be uncorrelated and therefore

$$\widetilde{R}_x = \langle \tilde{x} \tilde{x}^{\top} \rangle = \tilde{A} \tilde{A}^{\top} + \sigma^2 I,$$

where $\tilde{x}_i = K^{-1} x_i$ and $\tilde{A} = K^{-1} A$.
Estimating the mixing matrix $A$ thus reduces to identifying a $q \times q$
square matrix $Q$ after whitening the data with respect to the noise
covariance $\Sigma$ and projecting the temporally whitened observations onto
the space spanned by the $q$ eigenvectors of the whitened data covariance with
the largest eigenvalues.
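The whitening-and-projection step can be sketched numerically. The following
toy illustration in numpy uses made-up dimensions, variable names, and an
assumed-known diagonal noise covariance structure; it is a sketch of the idea,
not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, q = 10, 500, 3            # observed dimensions, samples, latent sources

# Toy data from the generative model x = A s + eta, eta ~ N(0, sigma2 * Sigma)
A = rng.standard_normal((p, q))
s = rng.standard_normal((q, n))
Sigma = np.diag(rng.uniform(0.5, 2.0, p))   # assumed-known noise covariance structure
sigma2 = 0.1
eta = np.sqrt(sigma2) * np.linalg.cholesky(Sigma) @ rng.standard_normal((p, n))
x = A @ s + eta

# Whiten with respect to the noise covariance via its Cholesky factor K
K = np.linalg.cholesky(Sigma)
x_t = np.linalg.solve(K, x)                 # x_tilde = K^{-1} x: noise is now isotropic

# Eigendecomposition of the whitened data covariance
R = x_t @ x_t.T / n
lam, U = np.linalg.eigh(R)                  # eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]              # re-order: largest first

# Project the whitened observations onto the q leading eigenvectors
proj = U[:, :q].T @ x_t
```

After this projection, only a square rotation in the $q$-dimensional subspace
remains to be identified.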
From the estimate of the mixing matrix, the maximum likelihood source
estimates are obtained using generalised least squares:

$$\hat{s}_i = \left(\hat{A}^{\top} \Sigma^{-1} \hat{A}\right)^{-1} \hat{A}^{\top} \Sigma^{-1} x_i,$$

and the ML estimate of the noise variance $\sigma^2$ becomes

$$\hat{\sigma}^2 = \frac{1}{p-q} \sum_{l=q+1}^{p} \lambda_l,$$

i.e. $\hat{\sigma}^2$ is the average of the $p-q$ smallest eigenvalues, those
of the minor subspace spanned by the eigenvectors with the smallest
eigenvalues. Solving the model in the case of a full noise covariance, where
the noise covariance is unknown, can be achieved by iterating the estimates of
the mixing matrix and the sources and re-estimating the noise covariance from
the residuals $\hat{\eta}_i = x_i - \hat{A}\hat{s}_i$. The estimation process
in the presence of non-isotropic noise is computationally much more involved
than estimation in the standard noise-free setting.
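These closed-form estimates can be sketched as follows. The dimensions and
variable names below are my own; the number of sources `q` is assumed known
and the noise is taken to be already isotropic:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, q = 8, 2000, 2
A = rng.standard_normal((p, q))
s = rng.standard_normal((q, n))
sigma2_true = 0.05
x = A @ s + np.sqrt(sigma2_true) * rng.standard_normal((p, n))

# Eigendecomposition of the data covariance (noise assumed isotropic here)
R = x @ x.T / n
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]              # largest eigenvalues first

# ML noise variance: average of the p - q smallest (minor-subspace) eigenvalues
sigma2_hat = lam[q:].mean()

# ML mixing matrix, up to an unidentified q x q rotation (taken as identity here)
A_hat = U[:, :q] @ np.diag(np.sqrt(lam[:q] - sigma2_hat))

# ML source estimates via least squares (GLS reduces to OLS for isotropic noise)
s_hat = np.linalg.lstsq(A_hat, x, rcond=None)[0]
```

With enough samples, `sigma2_hat` recovers the true noise level and the
reconstruction `A_hat @ s_hat` captures most of the variance of `x`.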

The maximum likelihood solutions given in equations 5-7 give important insight into the methodology.
Firstly, in the case where the noise covariance and the number of sources $q$
are known, the maximum likelihood solution for the mixing matrix is contained
in the principal eigenspace of dimension $q$, i.e. the span of the first $q$
eigenvectors equals the span of the unknown mixing matrix $A$. Projecting the
data onto the principal eigenvectors is not just a convenient technique to deal
with the high dimensionality in FMRI data but is part of the maximum likelihood
solution under the sum of square loss. Even if estimation techniques are
employed that do not use an initial PCA step as part of the ICA estimation, the
final solution under this model is necessarily contained in the principal
subspace. Secondly, combining these results with the uniqueness results stated
earlier, we see that *only* in the case where the analysis is performed in
the appropriate lower-dimensional subspace of dimension $q$ are the source
processes uniquely identifiable.
Finally, equations 5-7 imply that the standard
noise-free ICA approach with dimensionality reduction using PCA implicitly
operates under an isotropic noise model.

The remainder of this paper
illustrates that by making this specific noise model explicit in the
*modelling* and *estimation* stages, we can address important questions of model order selection, estimation and inference in a consistent way.

An immediate consequence of the fact that we are operating under an isotropic
noise model is that, as an initial pre-processing step, we will normalise the
original data time courses to zero mean and unit variance. This appears to be
a sensible step: on the one hand, the voxel-wise standard deviation of
resting-state data is known to vary significantly over the brain; on the other
hand, all voxels' time courses are assumed to be generated by the same noise
process.
This variance-normalisation pre-conditions the data under the 'null
hypothesis' of purely Gaussian noise, i.e. in the absence of any signal: the
data matrix $X$ is then, up to second-order statistics, identical to a simple
set of realisations from a Gaussian white noise process. Any signal component
contained in $X$ will have to reveal itself via its deviation from
Gaussianity. This will turn out to be of prime importance both for the
estimation of the number of sources and the final inferential steps.
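In code, this pre-processing step is just a per-voxel standardisation. A
minimal sketch, where the time-by-voxels array layout is an assumption:

```python
import numpy as np

def variance_normalise(X):
    """Normalise each voxel's time course (a column of X) to zero mean, unit variance."""
    Xc = X - X.mean(axis=0, keepdims=True)
    sd = Xc.std(axis=0, ddof=1, keepdims=True)
    return Xc / sd

# Toy data: 100 time points x 50 voxels with arbitrary offsets and scales
rng = np.random.default_rng(2)
X = 3.0 + 5.0 * rng.standard_normal((100, 50))
Xn = variance_normalise(X)
```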

After a voxel-wise normalisation of variance, two voxels with comparable
noise level that are modulated by the same signal time course, $a_r$ say, but
by different amounts will have the same regression coefficient upon regression
against $a_r$. The difference in the original amount of modulation is
therefore contained in the standard deviation of the residual noise. Forming
voxel-wise statistics, i.e. dividing the PICA maps by the estimated standard
deviation of the residual noise, is thus invariant under the initial
variance-normalisation.
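This invariance is easy to check numerically: a pseudo-$z$ value formed by
dividing the regression coefficient by the residual standard deviation is
unchanged by rescaling a voxel's time course, so the initial
variance-normalisation does not affect it. A hypothetical toy example, not
the paper's estimation code:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
a = rng.standard_normal(n)              # shared signal time course

# Two voxels: same signal, comparable noise, different modulation strength
y1 = 1.0 * a + rng.standard_normal(n)
y2 = 4.0 * a + rng.standard_normal(n)

def standardise(y):
    y = y - y.mean()
    return y / y.std(ddof=1)

def pseudo_z(y, a):
    """Regression coefficient of y on a, divided by the residual standard deviation."""
    y0, a0 = y - y.mean(), a - a.mean()
    beta = (a0 @ y0) / (a0 @ a0)
    resid = y0 - beta * a0
    return beta / resid.std(ddof=1)

# Scaling y rescales both beta and the residual std, so the ratio is unchanged:
# the statistic is the same before and after variance-normalisation.
z1_raw, z1_norm = pseudo_z(y1, a), pseudo_z(standardise(y1), a)
```

The more strongly modulated voxel still scores higher after normalisation,
since its modulation is preserved relative to its residual noise.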