It is worth considering the different sources of individual class
broadening (variance) in the image intensity histogram. One source
is intrinsic within-class variation in the underlying signal, for
example in FMRI, where activation strength varies within an
``activated'' cluster. A second source is image noise, both
high-frequency noise and, for example in the case of structural
brain images, low-frequency noise or ``bias field''. A third cause
is limitations of the model, for example, in the case of the
models described in this paper, the partial volume effect (PVE),
where a voxel in reality contains a mixture of different classes.
In Santago and Gage (1995) and Ruan et al. (2000), spatial
neighbourhood information is used to aid classification into pure
voxels and PVE voxels. A very similar approach to PVE models is
that of fuzzy mixture segmentation (Caillol et al., 1997). The approach
in our paper, using the language of Caillol et al. (1997), is a ``hard
segmentation'', in that the intention is to find posterior pdfs on
discrete classification labels. The fuzzy approach
of Caillol et al. (1997), like PVE modelling, considers voxels as being
mixtures of classes rather than discretely belonging to one class
or another. In PVE models (or fuzzy mixture models), the resulting
models have weight vectors at each voxel to specify the mixture
of the different classes. However, these should *not* be
confused with the weight vectors used in this paper, which by
virtue of the logistic transform do not produce mixtures of
classes at a voxel but instead an *approximation* to discrete
labels (to perform a hard segmentation).
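To make the distinction concrete, the following minimal sketch (hypothetical function and parameter names; the softmax form of the multi-class logistic transform is assumed) shows how the same unconstrained weight vector yields genuine mixture proportions at a moderate sharpness setting but a near-discrete label at a small one:

```python
import numpy as np

def softmax_weights(w_tilde, gamma):
    """Softmax (multi-class logistic) transform of unconstrained weights.

    'gamma' is a hypothetical name for the sharpness parameter: a small
    gamma pushes the weights towards a one-hot vector (an approximation
    to a discrete label, i.e. a hard segmentation), while a larger gamma
    yields genuine mixture proportions of the kind used in PVE or fuzzy
    models."""
    z = np.asarray(w_tilde, dtype=float) / gamma
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

w_tilde = np.array([1.0, 0.2, -0.5])          # unconstrained weights at one voxel
soft = softmax_weights(w_tilde, gamma=1.0)    # PVE-like mixture proportions
hard = softmax_weights(w_tilde, gamma=0.05)   # near-discrete label
```

With the small sharpness value virtually all of the weight concentrates on a single class, mimicking a discrete label; with the moderate value the weights remain a genuine mixture.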

Nevertheless, it might be of interest to extend the models described in this paper to include PVE estimation. Note, however, that whilst this makes sense conceptually for situations where the original idealised class distribution is a delta-function (e.g., structural image segmentation), it is more complex for situations where real class intensities vary within-class (e.g., FMRI activation map classification). This is because there is now an ambiguity between varying PVE fractions and real within-class signal variations; in fact, when considering the underlying causes of variation in activation intensity in FMRI, the two issues (varying activation strength and PVE) may well be physically/physiologically the same thing, in which case it may make no sense to attempt separate PVE (or fuzzy) modelling.

Commonly in FMRI, null hypothesis testing is used to label voxels, or clusters of voxels, as ``active'' if they reject the null hypothesis at a given false positive rate (FPR) (Friston et al., 1995). This depends on knowing the null (non-activation) distribution of the relevant statistics.
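As a minimal illustration of this (assuming z-statistics whose null distribution is exactly standard normal; function and variable names are ours, not the paper's), thresholding at a fixed FPR looks like:

```python
import math
import numpy as np

def null_hypothesis_labels(z_stats, fpr=0.001):
    """Label voxels 'active' when their one-sided Gaussian p-value falls
    below the chosen false positive rate.  This assumes the null
    distribution of the statistic is exactly standard normal -- the
    assumption that mixture modelling lets us relax."""
    z = np.asarray(z_stats, dtype=float)
    # one-sided p-value under N(0, 1): P(Z > z) = 0.5 * erfc(z / sqrt(2))
    p = np.array([0.5 * math.erfc(zi / math.sqrt(2.0)) for zi in z])
    return p < fpr

labels = null_hypothesis_labels(np.array([0.5, 2.0, 4.0]), fpr=0.001)
```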

In contrast, spatial mixture modelling provides a way of estimating the ``activating'' and ``non-activating'' distributions from the data itself. One advantage of estimating the ``non-activating'' distribution (in particular) from the data is that it allows some adjustment for violations of the modelling assumptions. For example, common problems in time-series modelling (such as imperfect autocorrelation modelling or the presence of structured noise) can cause the ``null'' distribution to be both shifted and scaled away from its assumed form, invalidating null hypothesis testing; explicit separate models for the background and activation distributions can absorb such departures. However, it is worth noting that a limitation of spatial mixture modelling for FMRI (or any other application) is its reliance on the validity of the distributional assumptions. Whilst the adaptivity of the parameters of the Gaussian distribution we use to model the ``non-activating'' class can protect against small violations of the modelling assumptions, it will not protect us against all of them.
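A minimal, non-spatial sketch of this idea (synthetic data and standard EM, rather than the spatially regularised MCMC scheme of this paper) fits the two class distributions directly to the observed statistic values:

```python
import numpy as np

def fit_two_gaussian_mixture(x, n_iter=200):
    """EM for a two-component Gaussian mixture: a minimal, non-spatial
    sketch of estimating the 'non-activating' and 'activating' class
    distributions directly from the data."""
    x = np.asarray(x, dtype=float)
    # crude initialisation: split the data about its quartiles
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    sd = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each class for each datum
        dens = (pi / (np.sqrt(2 * np.pi) * sd)
                * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update proportions, means and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sd, resp

rng = np.random.default_rng(0)
# synthetic statistic image: mostly background, some activation
x = np.concatenate([rng.normal(0.0, 1.0, 9000),   # non-activating
                    rng.normal(4.0, 1.0, 1000)])  # activating
pi, mu, sd, resp = fit_two_gaussian_mixture(x)
```

Here the ``non-activating'' distribution is recovered from the data rather than assumed, which is what allows the adjustment for modelling assumption violations described above.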

There is also a
well-known problem in null hypothesis testing of FMRI:
if enough observations are made, then every
voxel in the brain will reject the null hypothesis (Friston et al., 2002).
This is
because in practice *no* voxel will show absolutely no
response to the stimulus, if only due to modelling inadequacies such
as unmodelled stimulus-correlated motion or the point spread
function of the scanner. Mixture modelling overcomes
this: instead of asking the question ``Is the activation
zero or not?'', we ask the question ``Is the activation bigger
than the overall background level of `activation'?''.

There is also the question of inference flexibility.
Because we have both the ``activating'' and ``non-activating''
distributions, we can
calculate the probability of a voxel being
``activating'' *and* the probability of a voxel being
``non-activating''. This provides far more
inference flexibility than null hypothesis testing.
We can still look to control the FPR by thresholding on
the probability of a voxel being
``non-activating''. But now we could also look to
control the true positive rate (TPR) by thresholding on
the probability of a voxel being
``activating''. Controlling the TPR may be of real
importance when using FMRI for pre-surgical planning.
Usually, however, we would use a third option of comparing
a voxel's probabilities of being in each of the classes.
In this case there is a natural choice of threshold:
assign a voxel to the class with the highest probability
of membership.
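The three inference options can be sketched as follows (the class parameters below are illustrative values of our own choosing; in the paper they would come from the fitted spatial mixture model):

```python
import numpy as np

def class_posteriors(x, pi, mu, sd):
    """Posterior probability of each class at each voxel, given fitted
    'non-activating' (index 0) and 'activating' (index 1) Gaussian
    class distributions with proportions pi, means mu and std devs sd."""
    x = np.asarray(x, dtype=float)[:, None]
    dens = pi / (np.sqrt(2 * np.pi) * sd) * np.exp(-0.5 * ((x - mu) / sd) ** 2)
    return dens / dens.sum(axis=1, keepdims=True)

# illustrative fitted parameters: background N(0,1), activation N(4,1)
pi = np.array([0.9, 0.1])
mu = np.array([0.0, 4.0])
sd = np.array([1.0, 1.0])

p = class_posteriors(np.array([0.0, 2.5, 5.0]), pi, mu, sd)
p_nonact, p_act = p[:, 0], p[:, 1]

fpr_controlled = p_nonact < 0.05    # threshold P(non-activating)
tpr_controlled = p_act > 0.95       # threshold P(activating)
max_posterior = p.argmax(axis=1)    # assign to most probable class
```

Note how the intermediate voxel (statistic 2.5) survives none of the strict thresholds but is still assigned to a class under the maximum-posterior rule.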

There are other limitations of current FMRI null hypothesis
techniques. A major issue is the incorporation of spatial
information. The most popular method for this is via Gaussian
Random Field Theory (Friston et al., 1994; Worsley et al., 1992). Gaussian Random
Field Theory provides null distributions for clusters of voxels in
SPMs. The problem is that to form clusters an arbitrary threshold
has to be chosen; there is no objective way of setting this
threshold, and its choice has a huge impact on the final
thresholded results. Also, to meet the assumptions of Gaussian
Random Field Theory, spatial smoothing needs to be performed as a
preprocessing step, the amount of which is another arbitrary choice. This
is in stark contrast to the approach described in this paper for
which there are *no* parameter choices to be made, apart from
the final inference threshold.

In Fernandez and Green (2002), mixture models are used for disease mapping, in which the data is a count (the number of observed cases of disease) in each region of a geographical partition of the area of interest. Compellingly, they allow the number of mixture components to be determined adaptively from the data via reversible jump MCMC sampling (Green, 1995). However, in tissue-type segmentation and FMRI activation classification the number of classes is normally known. Indeed, even when a class assumed by the model is absent, our method was shown on artificial data to force the classifications into that class to zero when no activation or deactivation is supported by the data.

Fernandez and Green (2002) use a model related to our discrete labels
non-spatial mixture model with class proportions
(equation 5). However, they actually integrate
out the discrete labels, as they are not interested in
classification. Instead, they are effectively looking to fit a
different mixture at each voxel. This would mean allowing both the
class proportion parameters *and* the
distribution parameters to vary locally. However,
they restrict themselves to allowing only the class proportion
parameters to vary locally.
Sanjay-Gopal and Hebert (1998) and Marroquin et al. (2003) use similar models,
but they have to hand-tune the parameter controlling the amount of
spatial regularisation of these class proportion parameters. In
contrast, Fernandez and Green (2002) adaptively determine the amount of
spatial regularisation. To spatially regularise using a Gaussian
MRF, they need to map the local class proportion parameters, which
are constrained to lie between 0 and 1, to unconstrained random
variables. Hence, as we do in this
paper, they also use the logistic transform. However, to maintain
the interpretation of the class proportion parameters they need to use
a large value of the sharpness parameter in equation 12 (they
require approximate uniformity between 0 and 1 for the class
proportions, whereas we required delta functions at 0 and 1 for the
weights; hence we used a small value).

An interesting extension of the work in this paper would be to use Fernandez and Green (2002)'s non-stationary mixtures, whilst maintaining the spatially regularised classification of this paper. Non-stationary mixtures would allow local adaptation to spatial variations in the class distributions of intensity values, such as can be observed particularly in structural brain MR images. The approach could combine a non-stationary, locally varying mixture (with spatially regularised class proportion and distribution parameters) with spatially regularised classification, with the strengths of spatial regularisation determined adaptively and separately.

On a 2GHz Intel PC the technique takes approximately 30 minutes on a full volume. Future work needs to be done on alternative inference techniques to MCMC, which could speed up inference on the model specified in this paper considerably. Broadly, there are two possibilities: (1) point estimation (maximum a posteriori) approaches such as ICM or simulated annealing; (2) full posterior approximation approaches such as Laplace or Variational Bayes.