
2 Parallel Factor Analysis (PARAFAC)

The three-way PARAFAC technique is characterised by the following generative model:
\begin{displaymath}
x_{ijk}=\sum_{r=1}^{R} a_{ir}b_{jr}c_{kr} + \epsilon_{ijk}
\end{displaymath} (1)

($i=1,\dots,I$; $j=1,\dots,J$; $k=1,\dots,K$), with the associated sum-of-squares loss:
\begin{displaymath}
\min_{\boldsymbol{A},\boldsymbol{B},\boldsymbol{C}}\sum_{i,j,k}\left( x_{ijk}-\sum_{r=1}^{R} a_{ir}b_{jr}c_{kr}\right)^2.
\end{displaymath} (2)

Here, $\boldsymbol{A}=(\boldsymbol{a}_{1},\dots,\boldsymbol{a}_{R})$, $\boldsymbol{B}=(\boldsymbol{b}_{1},\dots,\boldsymbol{b}_{R})$ and $\boldsymbol{C}=(\boldsymbol{c}_{1},\dots,\boldsymbol{c}_{R})$ denote the $I\times R$, $J\times R$ and $K\times R$ matrices containing the $R$ different factor loadings in the temporal, spatial and subject domains as column vectors. Within this model, any solution to the minimisation problem of equation 2 is a maximum likelihood solution under the assumption of i.i.d. Gaussian noise.
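As a minimal concrete sketch (in numpy; the dimensions and variable names are illustrative, not taken from the report), data obeying the generative model of equation 1 can be built directly from the three factor matrices:

\begin{verbatim}
import numpy as np

# illustrative dimensions: I time points, J voxels, K subjects, R factors
I, J, K, R = 100, 50, 12, 3
rng = np.random.default_rng(0)

A = rng.standard_normal((I, R))   # temporal loadings (I x R)
B = rng.standard_normal((J, R))   # spatial loadings  (J x R)
C = rng.standard_normal((K, R))   # subject loadings  (K x R)

# x_ijk = sum_r a_ir * b_jr * c_kr + eps_ijk   (equation 1)
X = np.einsum('ir,jr,kr->ijk', A, B, C)
X += 0.1 * rng.standard_normal((I, J, K))     # additive Gaussian noise
\end{verbatim}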

The trilinear model can alternatively be written in matrix notation, giving an expression for the individual 2-D slices of $\boldsymbol{X}$ [Bro, 1998]:

\begin{eqnarray*}
\boldsymbol{X}_{i..} &=& \boldsymbol{B}\,{\rm diag}(\boldsymbol{a}_i)\,\boldsymbol{C}^{\rm t}+\boldsymbol{E}_{i..}\quad i=1,\dots,I\qquad\mbox{(3)}\\
\boldsymbol{X}_{.j.} &=& \boldsymbol{C}\,{\rm diag}(\boldsymbol{b}_j)\,\boldsymbol{A}^{\rm t}+\boldsymbol{E}_{.j.}\quad j=1,\dots,J\qquad\mbox{(4)}\\
\boldsymbol{X}_{..k} &=& \boldsymbol{A}\,{\rm diag}(\boldsymbol{c}_k)\,\boldsymbol{B}^{\rm t}+\boldsymbol{E}_{..k}\quad k=1,\dots,K,\qquad\mbox{(5)}
\end{eqnarray*}

where ${\rm diag}(\boldsymbol{a}_i)$ denotes an $R\times R$ diagonal matrix whose diagonal elements are taken from row $i$ of $\boldsymbol{A}$ (and similarly for ${\rm diag}(\boldsymbol{b}_j)$ and ${\rm diag}(\boldsymbol{c}_k)$). This gives rise to a set of coupled sum-of-squares loss functions. Based on these, a simple way of estimating the factor matrices is an iterative Alternating Least Squares (ALS) approach, which cycles through the least-squares estimates for each of $\boldsymbol{A}$, $\boldsymbol{B}$ and $\boldsymbol{C}$ in turn while keeping the other two matrices fixed at their most recent estimates:
\begin{eqnarray*}
\widehat{\boldsymbol{A}} &=& \left(\sum_k\boldsymbol{X}_{..k}\boldsymbol{B}\,{\rm diag}(\boldsymbol{c}_k)\right)\left((\boldsymbol{B}^{\rm t}\boldsymbol{B})\circ(\boldsymbol{C}^{\rm t}\boldsymbol{C})\right)^{-1}\\
\widehat{\boldsymbol{B}} &=& \left(\sum_i\boldsymbol{X}_{i..}\boldsymbol{C}\,{\rm diag}(\widehat{\boldsymbol{a}}_i)\right)\left((\boldsymbol{C}^{\rm t}\boldsymbol{C})\circ(\widehat{\boldsymbol{A}}^{\rm t}\widehat{\boldsymbol{A}})\right)^{-1}\qquad\mbox{(6)}\\
\widehat{\boldsymbol{C}} &=& \left(\sum_j\boldsymbol{X}_{.j.}\widehat{\boldsymbol{A}}\,{\rm diag}(\widehat{\boldsymbol{b}}_j)\right)\left((\widehat{\boldsymbol{A}}^{\rm t}\widehat{\boldsymbol{A}})\circ(\widehat{\boldsymbol{B}}^{\rm t}\widehat{\boldsymbol{B}})\right)^{-1},
\end{eqnarray*}

where $\circ$ denotes the element-wise (Hadamard) product. The ALS algorithm thus iteratively calculates ordinary least-squares estimates for the three factor matrices. Directly fitting these so as to minimise the sum-of-squares error provides a simple way of jointly estimating the factor loadings that describe processes in the temporal, spatial and subject domains, without requiring orthogonality between factor loadings in any one of the domains: unlike PCA, the multi-way PARAFAC model does not suffer from rotational indeterminacy, i.e. a rotation of the estimated factors changes the overall fit [Harshman and Lundy, 1984,Harshman and Lundy, 1994]. The ALS algorithm, however, can suffer from slow convergence, in particular when a set of column vectors in one of the factor matrices is (close to being) collinear. It is also sensitive to the specification of the correct number of factors $R$ (i.e. the number of columns in $\boldsymbol{A}$, $\boldsymbol{B}$ and $\boldsymbol{C}$).
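To make the iteration concrete, here is a minimal ALS sketch in numpy, following the updates of equation 6 (the function name \verb|parafac_als| and the fixed iteration count are ours; a practical implementation would add a convergence criterion, factor normalisation and multiple restarts):

\begin{verbatim}
import numpy as np

def parafac_als(X, R, n_iter=100, seed=0):
    """Basic ALS for the trilinear PARAFAC model (cf. equation 6)."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        # A <- (sum_k X_..k B diag(c_k)) ((B'B) o (C'C))^-1 ; '*' is Hadamard
        A = np.einsum('ijk,jr,kr->ir', X, B, C) \
            @ np.linalg.inv((B.T @ B) * (C.T @ C))
        # B <- (sum_i X_i.. C diag(a_i)) ((C'C) o (A'A))^-1
        B = np.einsum('ijk,kr,ir->jr', X, C, A) \
            @ np.linalg.inv((C.T @ C) * (A.T @ A))
        # C <- (sum_j X_.j. A diag(b_j)) ((A'A) o (B'B))^-1
        C = np.einsum('ijk,ir,jr->kr', X, A, B) \
            @ np.linalg.inv((A.T @ A) * (B.T @ B))
    return A, B, C
\end{verbatim}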
In order to address these issues, [Cao et al., 2000] proposed extending the standard PARAFAC loss function to include a diagonalisation error, such that
\begin{eqnarray*}
{\cal{L}}(\boldsymbol{A}) &=& \sum_k\left\Vert\boldsymbol{X}_{..k}-\boldsymbol{A}\,{\rm diag}(\boldsymbol{c}_k)\boldsymbol{B}^{\rm t}\right\Vert^2_F\\
& & {}+\sum_i\left\Vert\boldsymbol{B}^\dagger\boldsymbol{X}_{i..}(\boldsymbol{C}^\dagger)^{\rm t}-{\rm diag}(\boldsymbol{a}_i)\right\Vert^2_F,\qquad\mbox{(7)}
\end{eqnarray*}

(and similarly for ${\cal{L}}(\boldsymbol{B})$ and ${\cal{L}}(\boldsymbol{C})$). Here, $\Vert\boldsymbol{V}\Vert^2_F={\mathrm{tr}}(\boldsymbol{V}^{\rm t}\boldsymbol{V})$ denotes the squared Frobenius norm and $\boldsymbol{B}^\dagger$ denotes the pseudo-inverse of $\boldsymbol{B}$. The first term corresponds to the standard sum-of-squares loss function, while the second term penalises deviations from diagonality of the $I$ different $R\times R$ projection matrices $\boldsymbol{B}^\dagger\boldsymbol{X}_{i..}(\boldsymbol{C}^\dagger)^{\rm t}$. A modified ALS algorithm can be derived by iterating solutions of

\begin{displaymath}
\frac{\partial{\cal{L}}(\boldsymbol{V})}{\partial\boldsymbol{V}}=0
\end{displaymath}

with $\boldsymbol{V}=\boldsymbol{A},\boldsymbol{B},\boldsymbol{C}$ in turn. The least-squares solution for $\boldsymbol{A}$ then becomes [Cao et al., 2000]:

\begin{displaymath}
\widehat{\boldsymbol{A}}=\left(\sum_k\boldsymbol{X}_{..k}\boldsymbol{B}\,{\rm diag}(\boldsymbol{c}_k)+\boldsymbol{P}\right)\left((\boldsymbol{B}^{\rm t}\boldsymbol{B})\circ(\boldsymbol{C}^{\rm t}\boldsymbol{C})+\boldsymbol{I}\right)^{-1},
\end{displaymath}

where $\boldsymbol{P}=[\boldsymbol{p}_1,\dots,\boldsymbol{p}_I]^{\rm t}$ and the $\boldsymbol{p}_i$ are column vectors formed by the elements on the main diagonal of the $R\times R$ matrix $\boldsymbol{B}^\dagger\boldsymbol{X}_{i..}(\boldsymbol{C}^\dagger)^{\rm t}$ (and similarly for $\widehat{\boldsymbol{B}}$ and $\widehat{\boldsymbol{C}}$). This modified ALS algorithm has been used for all PARAFAC calculations reported later.
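A sketch of the corresponding penalised update step for $\boldsymbol{A}$ in numpy (assuming the reconstruction above: setting $\partial{\cal{L}}(\boldsymbol{A})/\partial\boldsymbol{A}=0$ adds $\boldsymbol{P}$ to the cross-product term and the identity to the matrix being inverted; the function name is ours, and the full algorithm cycles analogously over $\boldsymbol{B}$ and $\boldsymbol{C}$):

\begin{verbatim}
import numpy as np

def update_A_penalised(X, B, C):
    """One penalised ALS step for A, as reconstructed above:
    A = (sum_k X_..k B diag(c_k) + P) ((B'B) o (C'C) + I)^-1,
    where row i of P holds the main diagonal of B^+ X_i.. (C^+)'."""
    R = B.shape[1]
    Bp, Cp = np.linalg.pinv(B), np.linalg.pinv(C)
    P = np.einsum('rj,ijk,rk->ir', Bp, X, Cp)  # P[i] = diag(B^+ X_i.. (C^+)')
    N = np.einsum('ijk,jr,kr->ir', X, B, C)    # sum_k X_..k B diag(c_k)
    M = (B.T @ B) * (C.T @ C) + np.eye(R)      # Hadamard product plus identity
    return (N + P) @ np.linalg.inv(M)
\end{verbatim}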

It is interesting to note that the ALS approach to three-way PARAFAC provides a unique decomposition, provided the data exhibit appropriate `system variation' [Harshman and Lundy, 1984,Harshman and Lundy, 1994], i.e. when $\boldsymbol{A}$, $\boldsymbol{B}$ and $\boldsymbol{C}$ are of full rank and there are proportional changes in the relative contribution from one factor to another in all three domains, so that no two factors in any domain are collinear. In FMRI, however, we might expect some of the individual vectors in subject space to exhibit a significant amount of collinearity: in the case of two spatially different physiological signals, for example, we might expect the relative contributions of the individual subjects to be very similar, so that two columns of $\boldsymbol{C}$ are (close to being) collinear. The effect of collinearity between some of the factors on the ability to extract the latent structure of the data will be evaluated in section 4.

