
Inference

To explore the differences that certain aspects of the models make, we want to perform inference on three different mixture models in this paper:

To achieve this we need to obtain the marginal posterior distribution for the weights, $ w_{ik}$, given the observed data, $ \vec{y}$. Whilst it is possible to approximate this distribution, it is difficult to assess the effect of such approximations on the resulting inference. Therefore, we instead use Markov chain Monte Carlo (MCMC) sampling from the full joint posterior distribution (see Gilks et al. (1996) and Gamerman (1997) for texts on MCMC). This automatically provides us with samples from the marginal posterior distribution for the weights.

We use Gibbs sampling for the adaptive MRF smoothness parameter, $ \phi_{\tilde{w}}$; the required full conditional distribution of $ \phi_{\tilde{w}}$ is given in the appendix.
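To make the Gibbs step concrete, the following is a minimal sketch, assuming (purely for illustration) a conjugate Gamma($ a$, $ b$) prior on $ \phi_{\tilde{w}}$ and a Gaussian MRF prior on the weights with structure matrix $ Q$, in which case the full conditional of the precision is again a Gamma distribution; the exact full conditional used here is the one given in the appendix.

    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_phi(w_tilde, Q, a=1e-3, b=1e-3):
        # Illustrative sketch: with a Gamma(a, b) prior and a Gaussian MRF
        # prior w ~ N(0, (phi * Q)^-1), the full conditional of phi is
        # Gamma(a + rank(Q)/2, b + w' Q w / 2).
        quad = float(w_tilde @ Q @ w_tilde)
        shape = a + np.linalg.matrix_rank(Q) / 2.0
        rate = b + 0.5 * quad
        return rng.gamma(shape, 1.0 / rate)  # numpy's gamma takes scale = 1/rate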

For all other parameters (i.e. the continuous weights and the class distribution parameters) we use single-component Metropolis-Hastings jumps (i.e. we propose separate jumps for each of the parameters in turn); the updates are detailed in the appendix. We use a separate Normal proposal distribution for each parameter, with its mean fixed at the current value and with a scale parameter $ \sigma_p$ for the $ p^{th}$ parameter that is updated every 30 jumps. At the $ j^{th}$ update, $ \sigma_p$ is set according to:

$\displaystyle \sigma_p^{j+1}=\sigma_p^{j}S\frac{(1+A+R)}{(1+R)}$     (24)

where $ A$ and $ R$ are respectively the numbers of accepted and rejected jumps since the last $ \sigma_p$ update, and $ S$ is the desired rejection rate, which we fix at $ 0.5$.
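As an illustrative sketch of this scheme (not the paper's implementation), the following single-component Metropolis-Hastings sampler uses a generic log-posterior function in place of the model's log joint posterior, and re-tunes each $ \sigma_p$ every 30 jumps via equation (24):

    import numpy as np

    rng = np.random.default_rng(0)

    def update_scale(sigma, A, R, S=0.5):
        # Equation (24): the scale is roughly stable when half the jumps
        # are rejected, grows when too many are accepted, shrinks otherwise.
        return sigma * S * (1.0 + A + R) / (1.0 + R)

    def mh_sampler(log_post, theta0, n_jumps=2000, adapt_every=30):
        theta = np.array(theta0, dtype=float)
        P = theta.size
        sigma = np.ones(P)                  # proposal scales sigma_p
        A = np.zeros(P)                     # accepted jumps since last update
        R = np.zeros(P)                     # rejected jumps since last update
        chain = np.empty((n_jumps, P))
        for j in range(n_jumps):
            for p in range(P):              # separate jump for each parameter
                prop = theta.copy()
                prop[p] += sigma[p] * rng.standard_normal()  # Normal proposal
                if np.log(rng.random()) < log_post(prop) - log_post(theta):
                    theta = prop
                    A[p] += 1
                else:
                    R[p] += 1
            if (j + 1) % adapt_every == 0:  # re-tune every 30 jumps
                sigma = update_scale(sigma, A, R)
                A[:] = 0
                R[:] = 0
            chain[j] = theta
        return chain

With $ S=0.5$ the multiplier $ S(1+A+R)/(1+R)$ is approximately one when half of the jumps are rejected, so the scales settle at the desired rejection rate. (In general, adapting proposal scales while sampling perturbs the stationary distribution of the chain, so such tuning is commonly restricted to the burn-in period.)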

We require a good initialisation of the parameters in the model purely to reduce the required burn-in of the MCMC chains (the burn-in is the initial part of the MCMC chain which is discarded to ensure that the chain has converged and is sampling from the true distribution). To initialise we use the non-spatial class labels with class proportions model (equation 5), along with the class distributions specified in section 3. The joint maximum a posteriori estimate over the class labels, $ x_i$, and the distribution parameters, $ \vec{\theta}$, can be obtained for this model using the Expectation-Maximisation (EM) algorithm (Beckmann et al., 2003).
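As a rough illustration of this initialisation step, the sketch below runs EM for a one-dimensional $ K$-class Gaussian mixture; it is a stand-in only, with Gaussian densities in place of the actual class distributions of section 3, and with flat priors so that the MAP estimate coincides with the maximum likelihood estimate computed here.

    import numpy as np

    def em_init(y, K=3, n_iter=50):
        # Stand-in EM for a K-class 1-D Gaussian mixture: alternates
        # responsibilities (E-step) and parameter updates (M-step).
        y = np.asarray(y, dtype=float)
        mu = np.quantile(y, np.linspace(0.2, 0.8, K))  # crude starting means
        var = np.full(K, y.var())
        pi = np.full(K, 1.0 / K)                       # class proportions
        for _ in range(n_iter):
            # E-step: gamma[i, k] = p(x_i = k | y_i, theta)
            dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) \
                   / np.sqrt(2.0 * np.pi * var)
            gamma = pi * dens
            gamma /= gamma.sum(axis=1, keepdims=True)
            # M-step: update proportions, means and variances
            Nk = gamma.sum(axis=0)
            pi = Nk / y.size
            mu = (gamma * y[:, None]).sum(axis=0) / Nk
            var = (gamma * (y[:, None] - mu) ** 2).sum(axis=0) / Nk
        labels = gamma.argmax(axis=1)   # initial class labels x_i
        return labels, pi, mu, var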

As a result of this good initialisation, we use a burn-in of 1000 jumps, followed by 1000 further jumps of which every 2nd is retained as a sample. Observation of chains with different initial conditions confirmed that a burn-in of 1000 jumps was sufficient.
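In terms of the sampler sketch above, the retained samples would be extracted as follows; note that the retained draws of any single weight are, by themselves, samples from its marginal posterior, as noted earlier:

    # Discard the first 1000 jumps as burn-in, then keep every 2nd jump,
    # leaving 500 retained samples from a 2000-jump chain.
    retained = chain[1000:][::2]
    w_marginal = retained[:, 0]   # marginal posterior samples of one parameter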

