Throughout the remainder of this paper, we will keep this parameter fixed at its maximum likelihood (ML) estimate.
Without loss of generality, we will also assume that the sources have unit variance, since arbitrary scaling factors can be exchanged freely between the source signals and the associated columns of the mixing matrix.
If the noise covariance were known, we could use its Cholesky decomposition to rewrite equation 2: premultiplying the data by the inverse of the Cholesky factor whitens the noise, turning the model into one with isotropic noise while leaving its structure otherwise unchanged.
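This whitening step can be sketched numerically; the following is a minimal numpy illustration (the names `Sigma` and `K` and the toy dimensions are ours, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 5, 20000                        # channels, samples
# an arbitrary symmetric positive-definite noise covariance
Sigma = np.cov(rng.standard_normal((p, 3 * p)))
K = np.linalg.cholesky(Sigma)          # Sigma = K K^T

eta = K @ rng.standard_normal((p, n))  # correlated noise with covariance Sigma

# whitening: premultiply by K^{-1} (solve rather than invert explicitly)
eta_white = np.linalg.solve(K, eta)

# the whitened noise has (approximately) identity covariance
print(np.round(np.cov(eta_white), 2))
```

Solving against the triangular factor avoids forming an explicit matrix inverse, which is both cheaper and numerically better behaved.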
Noise and signal are assumed to be uncorrelated; the covariance of the whitened data therefore decomposes into the sum of the signal covariance and an isotropic noise term.
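In symbols (our notation, not necessarily the paper's): for a generic model $x = As + \eta$ with mutually uncorrelated, unit-variance sources and noise whitened to covariance $\sigma^{2}I$, the uncorrelatedness of signal and noise gives

```latex
\operatorname{Cov}(x)
  = A\,\operatorname{Cov}(s)\,A^{\top} + \operatorname{Cov}(\eta)
  = AA^{\top} + \sigma^{2} I ,
```

so the signal contributes a low-rank term sitting on top of an isotropic noise floor, which is precisely the structure an eigendecomposition of the data covariance can exploit.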
Estimating the mixing matrix thus reduces to identifying a square matrix after whitening the data with respect to the noise covariance and projecting the temporally whitened observations onto the space spanned by the eigenvectors of the data covariance matrix with the largest eigenvalues. From these, the maximum likelihood source estimates are obtained using generalised least squares.
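As a rough numerical sketch of the generalised least squares source estimate under an assumed generative model x = As + η with isotropic noise (all names and dimensions here are illustrative, and with white noise GLS reduces to ordinary least squares):

```python
import numpy as np

rng = np.random.default_rng(1)

p, q, n = 8, 3, 50000                 # observations, sources, samples
A = rng.standard_normal((p, q))       # mixing matrix
S = rng.laplace(size=(q, n))          # non-Gaussian sources
S /= S.std(axis=1, keepdims=True)     # enforce unit variance
X = A @ S + 0.1 * rng.standard_normal((p, n))  # isotropic noise

# least squares source estimate: s_hat = (A^T A)^{-1} A^T x
S_hat = np.linalg.solve(A.T @ A, A.T @ X)

print(np.corrcoef(S[0], S_hat[0])[0, 1])  # should be close to 1
```

With correlated noise of covariance Σ, the same estimate would use the Σ-weighted normal equations, i.e. premultiplying both factors by the inverse Cholesky factor of Σ first.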
The maximum likelihood solutions given in equations 5-7 give important insight into the methodology. Firstly, in the case where the number of sources and the noise covariance are known, the maximum likelihood solution for the mixing matrix is contained in the principal eigenspace of the data covariance matrix of the corresponding dimension, i.e. the span of the leading eigenvectors equals the span of the unknown mixing matrix. Projecting the data onto the principal eigenvectors is not just a convenient technique for dealing with the high dimensionality of FMRI data but is part of the maximum likelihood solution under the sum-of-squares loss. Even if estimation techniques are employed that do not use an initial PCA step as part of the ICA estimation, the final solution under this model is necessarily contained in the principal subspace. Secondly, combining these results with the uniqueness results stated earlier, we see that the source processes are uniquely identifiable only when the analysis is performed in the appropriate lower-dimensional subspace, i.e. one whose dimension equals the number of sources. Finally, equations 5-7 imply that the standard noise-free ICA approach with PCA-based dimensionality reduction implicitly operates under an isotropic noise model.
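The first observation, that under isotropic noise the span of the leading eigenvectors recovers the span of the mixing matrix, can be checked numerically; a toy sketch (our notation and dimensions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

p, q, n = 10, 3, 200000               # observed dimension, sources, samples
A = rng.standard_normal((p, q))       # unknown mixing matrix
S = rng.laplace(size=(q, n))          # non-Gaussian sources
S /= S.std(axis=1, keepdims=True)     # unit variance
X = A @ S + 0.2 * rng.standard_normal((p, n))  # isotropic noise

# eigendecomposition of the sample data covariance
evals, evecs = np.linalg.eigh(np.cov(X))
U = evecs[:, -q:]                     # q principal eigenvectors (eigh sorts ascending)

# if span(U) contains span(A), projecting A onto span(U) leaves little residual
resid = np.linalg.norm(A - U @ (U.T @ A)) / np.linalg.norm(A)
print(resid)                          # small relative residual
```

The population covariance here is AAᵀ + σ²I, so its top-q eigenspace coincides exactly with the column space of A; the small residual reflects only finite-sample error.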
The remainder of this paper illustrates that by making this specific noise model explicit in the modelling and estimation stages, we can address important questions of model order selection, estimation and inference in a consistent way.
An immediate consequence of operating under an isotropic noise model is that, as an initial pre-processing step, we normalise the original data time courses to zero mean and unit variance. This is a sensible step: on the one hand, we know that the voxel-wise standard deviation of resting-state data varies significantly over the brain; on the other hand, all voxels' time courses are assumed to be generated by the same noise process. This variance-normalisation pre-conditions the data under the 'null hypothesis' of purely Gaussian noise, i.e. in the absence of any signal: the data matrix is then identical, up to second-order statistics, to a simple set of realisations from a noise process. Any signal component contained in the data will have to reveal itself via its deviation from Gaussianity. This will turn out to be of prime importance both for estimating the number of sources and for the final inferential steps.
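A minimal sketch of this pre-processing step (the voxels × time array layout and the toy scale range are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# toy data, voxels x time, with strongly varying voxel-wise scales
X = rng.standard_normal((1000, 200)) * rng.uniform(0.5, 50.0, size=(1000, 1))

# normalise every voxel's time course to zero mean and unit variance
Xn = X - X.mean(axis=1, keepdims=True)
Xn /= Xn.std(axis=1, keepdims=True)
```

After this step every row of `Xn` has zero mean and unit variance, so under the pure-noise null hypothesis all voxels are exchangeable realisations of the same process.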
After a voxel-wise normalisation of variance, two voxels with comparable noise levels that are modulated by the same signal time course, but by different amounts, will have the same regression coefficient upon regression against that time course. The difference in the original amount of modulation is therefore contained in the standard deviation of the residual noise. Forming voxel-wise statistics, i.e. dividing the PICA maps by the estimated standard deviation of the residual noise, is thus invariant under the initial variance-normalisation.
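The invariance claim can be illustrated numerically: variance-normalising a voxel's time course changes the raw regression coefficient, but leaves the ratio of coefficient to residual standard deviation essentially unchanged (a toy sketch with made-up signal amplitudes):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100000
s = rng.standard_normal(n)                     # shared signal time course
y1 = 1.0 * s + 0.5 * rng.standard_normal(n)    # weakly modulated voxel
y2 = 5.0 * s + 0.5 * rng.standard_normal(n)    # strongly modulated voxel

def stat(y, s):
    """Regression coefficient of y on s and its ratio to the residual std."""
    beta = (y @ s) / (s @ s)
    resid = y - beta * s
    return beta, beta / resid.std()

def normalise(y):
    y = y - y.mean()
    return y / y.std()

b1, t1 = stat(y1, s)
b2, t2 = stat(y2, s)
b1n, t1n = stat(normalise(y1), s)
b2n, t2n = stat(normalise(y2), s)

# raw betas change under normalisation, the ratios do not
print(f"{b1:.3f} -> {b1n:.3f}  (ratio {t1:.2f} -> {t1n:.2f})")
print(f"{b2:.3f} -> {b2n:.3f}  (ratio {t2:.2f} -> {t2n:.2f})")
```

The ratio is scale-invariant by construction, since rescaling a time course scales the regression coefficient and the residual standard deviation by the same factor.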