Recall from equation 5 that in order to estimate the mixing matrix and the sources, we need to optimise an orthogonal rotation matrix $\boldsymbol{W}$ in the space of whitened observations $\tilde{\boldsymbol{x}}$.
In order to choose a technique for the unmixing step, note that all
previous results have highlighted the importance of non-Gaussianity of the
source distributions: the split into a non-Gaussian part plus additive Gaussian
noise is at the heart of the uniqueness results. Also, the estimation of the
intrinsic dimensionality is based on the identification of eigenvectors of the
data covariance matrix that violate the sphericity assumption of the isotropic
Gaussian noise model. Consistent with this, we will estimate the unmixing matrix
based on the principle of non-Gaussianity. [Hyvärinen et al., 2001]
have
presented an elegant fixed point algorithm that uses approximations to
neg-entropy in order to optimise for non-Gaussian source distributions
and give a clear account of the relation between this approach and
statistical independence. In brief, the individual sources are
obtained by projecting the whitened data $\tilde{\boldsymbol{x}}$
onto the individual rows of $\boldsymbol{W}$, i.e. the
$r$th source is estimated as $\hat{s}_r = \boldsymbol{w}_r^{\top}\tilde{\boldsymbol{x}}$.
Each vector $\boldsymbol{w}_r$ is updated via the fixed point iteration
$$\boldsymbol{w}_r^{+} = \mathrm{E}\{\tilde{\boldsymbol{x}}\, g(\boldsymbol{w}_r^{\top}\tilde{\boldsymbol{x}})\} - \mathrm{E}\{g'(\boldsymbol{w}_r^{\top}\tilde{\boldsymbol{x}})\}\,\boldsymbol{w}_r,$$
where $g'$ denotes the derivative of the non-linear contrast function
$g$. This is followed by a re-normalisation step
$\boldsymbol{w}_r \leftarrow \boldsymbol{w}_r^{+}/\|\boldsymbol{w}_r^{+}\|$ such that
$\boldsymbol{w}_r$ is of unit length.
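The single-unit update just sketched above can be written compactly in code. The following is a minimal NumPy illustration, not the authors' implementation; the tanh contrast and the toy Laplace-distributed data are assumptions made for the example:

```python
import numpy as np

def fastica_unit_step(w, X, g, g_prime):
    """One fixed-point update for a single unmixing vector.

    w : (n,) current unit-norm weight vector
    X : (n, T) whitened observations (zero mean, identity covariance)
    g, g_prime : contrast non-linearity and its derivative
    """
    u = w @ X                                    # projections w^T x for all samples
    # w+ = E{x g(w^T x)} - E{g'(w^T x)} w
    w_new = (X * g(u)).mean(axis=1) - g_prime(u).mean() * w
    return w_new / np.linalg.norm(w_new)         # re-normalise to unit length

# Usage sketch with the tanh contrast on toy non-Gaussian data
# (Laplace samples stand in for whitened observations here).
rng = np.random.default_rng(0)
X = rng.laplace(size=(2, 5000))
X -= X.mean(axis=1, keepdims=True)
w = np.array([1.0, 0.0])
for _ in range(20):
    w = fastica_unit_step(w, X, np.tanh, lambda u: 1.0 - np.tanh(u) ** 2)
```

After each step the weight vector is projected back onto the unit sphere, mirroring the re-normalisation described above.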
A proof of convergence and a discussion of the choice of the
non-linear function $g$ can be found in [Hyvärinen et al., 2001]. In order to
estimate $q$ sources, this estimation is simply performed $q$ times
under the constraint that the vectors $\boldsymbol{w}_r$
are mutually orthogonal.
The constraints on the norm and the mutual orthogonality ensure that
these vectors actually form an orthogonal rotation matrix
$\boldsymbol{W}$.
Thus, estimation of the sources is carried out under the assumption
that all marginal distributions of $\boldsymbol{W}\tilde{\boldsymbol{x}}$
are maximally non-Gaussian.
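The full deflation scheme, estimating the $q$ vectors one after another under the orthogonality constraint, can be sketched as follows. This is an illustrative NumPy sketch under assumed choices (tanh contrast, Gram–Schmidt orthogonalisation, eigendecomposition-based whitening), not the specific implementation used here:

```python
import numpy as np

def fastica_deflation(X, q, n_iter=100, seed=0):
    """Estimate q unmixing vectors one by one (deflation), keeping each
    new vector orthogonal to those already found, so that the rows of W
    form an orthogonal rotation matrix."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.zeros((q, n))
    g = np.tanh
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2
    for r in range(q):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            u = w @ X
            w = (X * g(u)).mean(axis=1) - g_prime(u).mean() * w
            w -= W[:r].T @ (W[:r] @ w)   # Gram-Schmidt: stay orthogonal to earlier rows
            w /= np.linalg.norm(w)       # re-normalise to unit length
        W[r] = w
    return W                             # sources: S = W @ X

# Toy demonstration: mix two Laplace sources, whiten, then unmix.
rng = np.random.default_rng(1)
S_true = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S_true
X -= X.mean(axis=1, keepdims=True)
# whiten via eigendecomposition of the sample covariance
d, E = np.linalg.eigh(np.cov(X))
X_white = E @ np.diag(d ** -0.5) @ E.T @ X
W = fastica_deflation(X_white, q=2)
```

Because each new vector is orthogonalised against the previous ones and re-normalised at every step, the returned matrix satisfies $\boldsymbol{W}\boldsymbol{W}^{\top} = \boldsymbol{I}$ up to numerical precision.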
The choice of the nonlinear function is domain specific and in our case will be strongly linked to the inferential steps that are being performed after IC estimation (see section 4 below).
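For illustration, the three contrast functions commonly discussed in [Hyvärinen et al., 2001] and their derivatives are listed below; which of them (if any) is appropriate for the present setting is exactly the domain-specific question deferred to section 4:

```python
import numpy as np

# Standard FastICA contrast non-linearities g and their derivatives g'
# (Hyvarinen et al., 2001); the choice trades robustness against
# sensitivity to sub-/super-Gaussian source distributions.
CONTRASTS = {
    # general-purpose, robust default
    "tanh": (lambda u: np.tanh(u),
             lambda u: 1.0 - np.tanh(u) ** 2),
    # suited to highly super-Gaussian (sparse) sources
    "exp":  (lambda u: u * np.exp(-u ** 2 / 2),
             lambda u: (1.0 - u ** 2) * np.exp(-u ** 2 / 2)),
    # kurtosis-based; fast but sensitive to outliers
    "cube": (lambda u: u ** 3,
             lambda u: 3.0 * u ** 2),
}

g, g_prime = CONTRASTS["tanh"]   # pick a contrast for the fixed point step
```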