Next: Initialisation Up: tr04mw2 Previous: Determining Basis Set Constraints


Inference

The distribution we wish to infer is the posterior distribution $ p(\theta\vert y)$ (equation 3), where $ \theta$ is the set of parameters $ \{\beta,a,\phi_{\epsilon}\}$. This distribution cannot be obtained analytically, so we use the Variational Bayes framework introduced to FMRI by Penny et al. (2003). For a general introduction to variational methods see Jordan (1999). In this approach we approximate the posterior $ p(\theta\vert y)$ with a distribution $ q(\theta\vert y)$ by minimising the KL-divergence between them or, equivalently, by maximising the variational free energy, $ F$:
$\displaystyle F=\int{q(\theta\vert y)\log{\frac{p(y,\theta)}{q(\theta\vert y)}}d\theta}$     (18)
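
The equivalence between maximising $ F$ and minimising the KL-divergence follows from the identity $ \log p(y) = F + KL(q\Vert p(\theta\vert y))$. The sketch below checks this numerically on a toy conjugate model that is not the model of this report: a single observation $ y \sim N(\theta,1)$ with prior $ \theta \sim N(0,1)$, for which the posterior is $ N(y/2,1/2)$. The function names are hypothetical and chosen only for illustration.

```python
import math

# Toy model (an assumption for illustration, not the HRF model):
#   y | theta ~ N(theta, 1),  theta ~ N(0, 1),  q(theta) = N(m, s2)

def free_energy(y, m, s2):
    """Variational free energy F = E_q[log p(y, theta)] + H[q]."""
    exp_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2)
    exp_logprior = -0.5 * math.log(2 * math.pi) - 0.5 * (m ** 2 + s2)
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)
    return exp_loglik + exp_logprior + entropy

def kl_to_posterior(y, m, s2):
    """KL(q || p(theta | y)) for the exact posterior N(y/2, 1/2)."""
    post_mean, post_var = y / 2.0, 0.5
    return 0.5 * (math.log(post_var / s2)
                  + (s2 + (m - post_mean) ** 2) / post_var
                  - 1.0)

def log_evidence(y):
    """log p(y), where marginally y ~ N(0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - y ** 2 / 4.0
```

For any choice of `m` and `s2`, `free_energy` plus `kl_to_posterior` equals `log_evidence`, so $ F$ is a lower bound on the log evidence that becomes tight when $ q$ equals the true posterior.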

To maximise this function, we need to ensure that the resulting integrals are tractable. A standard way to help achieve this is to use conjugate priors and to factorise the approximate posterior. In the modelling section, we parameterised the model in terms of parameters $ a$, $ \phi_{\epsilon}$, $ D$ and $ \bar{D}$, and wherever possible specified conjugate priors on them. However, using these parameters and factorising the posterior into
$\displaystyle q(D,\bar{D},a,\phi_{a},\phi_{\epsilon}\vert y)=\prod_p\{q(a_p\vert y)q(\phi_{a_p}\vert y)\}\prod_i\{q(D_{i}\vert y)q(\bar{D}_{i}\vert y)q(\phi_{\epsilon_i}\vert y)\}$     (19)

is not tractable to Variational Bayes, as we cannot derive the update equations for $ q(D_{ie}\vert y)$ and $ q(\bar{D}_{ie}\vert y)$. To overcome this problem, we replace the two parameters $ D_{ie}$ and $ \bar{D}_{ie}$ with $ \beta_{ie}$ and $ \bar{\beta}_{ie}$ by rewriting equation 11 as:
$\displaystyle \beta_{ie}=\bar{\beta}_{ie}D_{ie}$     (20)

where:
$\displaystyle \bar{\beta}_{ie}=\frac{\bar{D}_{ie}}{\sqrt{\sum_b D_{ieb}^2/N_b}}.$     (21)
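
As a minimal sketch of the reparameterisation in equations 20 and 21, the snippet below maps a shape vector and a size scalar to $ \beta_{ie}$ and $ \bar{\beta}_{ie}$ for a single voxel and event type. The function name `reparameterise` and the use of NumPy are assumptions made for illustration; the shape vector is assumed to be non-zero.

```python
import numpy as np

def reparameterise(D, D_bar):
    """Map (D_ie, D_bar_ie) to (beta_ie, beta_bar_ie).

    D     : length-N_b HRF shape vector (assumed non-zero)
    D_bar : scalar HRF size
    """
    D = np.asarray(D, dtype=float)
    rms = np.sqrt(np.sum(D ** 2) / D.size)  # sqrt(sum_b D_ieb^2 / N_b)
    beta_bar = D_bar / rms                  # equation 21
    beta = beta_bar * D                     # equation 20
    return beta, beta_bar
```

Under this mapping the root-mean-square of $ \beta_{ie}$ equals $ \vert\bar{D}_{ie}\vert$, and rescaling $ D_{ie}$ by a positive constant leaves $ \beta_{ie}$ unchanged, which is the sense in which the shape and the size of the HRF are separated.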

Recall from equation 11 that $ \sqrt{\sum_b D_{ieb}^2/N_b}$ is the normalisation of the HRF shape vector, $ D_{ie}$, and that the scalar, $ \bar{D}_{ie}$, represents the size of the HRF. We now have the following prior on $ \beta_{ie}$:
$\displaystyle \beta_{ie}\vert\bar{\beta}_{ie}\sim MVN(\bar{\beta}_{ie}m,\bar{\beta}_{ie}^{2}C^{-1})$     (22)

and a noninformative prior on $ \bar{\beta}_{ie}$:
$\displaystyle \bar{\beta}_{ie}\sim N(0,\phi_{\bar{\beta}_{0}}^{-1})$     (23)

where the precision, $ \phi_{\bar{\beta}_{0}}$, is fixed to be very small (1e-6) for all voxels. We now assume the following factorised form for the approximate posterior:
$\displaystyle q(\beta,\bar{\beta},a,\phi_{\epsilon}\vert y)=\prod_p\{q(a_p\vert y)q(\phi_{a_p}\vert y)\}\prod_i\{q(\beta_{i},\bar{\beta}_{i}\vert y)q(\phi_{\epsilon_i}\vert y)\}$     (24)

where:
$\displaystyle q(B_{i}\vert y)=MVN(\mu_{B_i},\Lambda_{B_i})$
$\displaystyle q(a_p\vert y)=N(\mu_{a_p},\Lambda_{a_p})$
$\displaystyle q(\phi_{a_p}\vert y)=Ga(b_{a_p},c_{a_p})$
$\displaystyle q(\phi_{\epsilon_i}\vert y)=Ga(b_{\epsilon_i},c_{\epsilon_i})$     (25)

where $ B_i=(\beta_{i},\bar{\beta}_{i})$, $ \beta_{i}$ is the $ (N_e N_b) \times 1$ vector $ [ \beta_{i1} \hdots \beta_{ie} \hdots \beta_{iN_e}]^T$ and $ \bar{\beta}_{i}$ is the $ N_e \times 1$ vector $ [ \bar{\beta}_{i1} \hdots \bar{\beta}_{ie} \hdots \bar{\beta}_{iN_e}]^T$. Note that we do not fully factorise: we would not expect $ \beta_i$ and $ \bar{\beta}_i$ to be independent a posteriori, so we maintain a joint, unfactorised posterior over them. However, we have factorised the noise parameter posteriors from the regression parameter posteriors. This assumption helps to make inference tractable using Variational Bayes. Penny et al. (2003) discuss its implications and show that the error it induces is negligible when inferring on FMRI data. We also show later (in section 4) that the error induced when inferring on artificial data is negligible. At this point, the inference is still not fully tractable to Variational Bayes, as we cannot derive the update equations for $ q(B_i\vert y)$. To overcome this we rewrite the prior in equation 22 as:
$\displaystyle \beta_{ie} \sim MVN(\bar{\beta}_{ie}m,\phi^{-1}_{\bar{\beta}_{ie}}C^{-1})$     (26)

where the utility parameter, $ \phi_{\bar{\beta}_{ie}}$, is updated as a point estimate equal to one over the current expected value of $ {\bar{\beta}_{ie}}^2$:
$\displaystyle \phi_{\bar{\beta}_{ie}}=E[\bar{\beta}_{ie}^{2}]^{-1}=\left[(R_{ie}\mu_{B_i})^{2}+(\Lambda_{B_i}^{-1})_{\bar{\beta}_{ie}}\right]^{-1}$     (27)

where $ R_{ie}$ is defined by the relationship $ \bar{\beta}_{ie}=R_{ie}B_i$, and $ (\Lambda_{B_i}^{-1})_{\bar{\beta}_{ie}}$ is the current marginal variance of $ \bar{\beta}_{ie}$. The approximate posterior distributions are now tractable to Variational Bayes. The update rules for the approximate posterior distributions, which iteratively maximise the free energy in equation 18, are given in appendix B.

We can ask standard inference questions of the marginal posterior over $ \beta_e$, in the same way as for the standard use of basis functions in the GLM (i.e. using f-contrasts, see section 3.2). We test the accuracy of the posterior approximations presented in this section using null artificial data in section 4. The Variational Bayes inference requires approximately 10 iterations and takes approximately 15 minutes on a 2GHz Intel PC for a whole-brain dataset (in-plane resolution 4mm, slice thickness 7mm and $ 180$ volumes).
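
The point-estimate update of the utility parameter $ \phi_{\bar{\beta}_{ie}}$ described above (one over the current expected value of $ \bar{\beta}_{ie}^2$) can be sketched as follows. This is an illustration only, not the update code used for the results in this report: the function name `update_phi_beta_bar` and its argument layout are hypothetical, $ \Lambda_{B_i}$ is treated as a precision matrix so that its inverse is the posterior covariance, and $ R_{ie}$ is taken to be a selector row vector that picks $ \bar{\beta}_{ie}$ out of $ B_i$.

```python
import numpy as np

def update_phi_beta_bar(mu_B, Lambda_B, idx):
    """Point estimate phi = 1 / E[beta_bar_ie^2] under q(B_i | y).

    mu_B     : posterior mean of B_i
    Lambda_B : posterior precision matrix of B_i
    idx      : position of beta_bar_ie within B_i (hypothetical layout)
    """
    mu_B = np.asarray(mu_B, dtype=float)
    cov_B = np.linalg.inv(np.asarray(Lambda_B, dtype=float))

    # R_ie selects beta_bar_ie from B_i, i.e. beta_bar_ie = R_ie B_i
    R = np.zeros(mu_B.size)
    R[idx] = 1.0

    mean = R @ mu_B       # posterior mean of beta_bar_ie
    var = R @ cov_B @ R   # marginal variance of beta_bar_ie

    # E[x^2] = mean^2 + variance for any distribution with finite variance
    return 1.0 / (mean ** 2 + var)
```

In an iterative scheme this value would be recomputed after each update of $ q(B_i\vert y)$ and substituted into the prior covariance of equation 26.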
