Automatic segmentation in medical imaging is an important tool. It allows for objective classification of regions in images based upon statistical models. Typical tasks include the segmentation of structural images into different matter types and of statistical parametric maps in functional imaging to perform inference.

Statistical segmentation can be achieved by modelling the histogram of the observations as a mixture of the distributions of the different classes into which we want to segment the image. For example, in brain segmentation tasks the observations are typically intensity levels in a structural brain image. The mixture is then made up of the distributions of the different matter types in the brain (e.g. white matter, grey matter, cerebrospinal fluid) (Guillemaud and Brady, 1997; Zhang et al., 2001; Held et al., 1997; Wells et al., 1996). The task is then to assign each area of the brain to a particular matter type.
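To make the non-spatial histogram mixture idea concrete, the following is a minimal sketch, assuming a one-dimensional Gaussian distribution for each class and fitting by expectation-maximisation (EM). The function names, the choice of Gaussian class distributions and the synthetic intensities are illustrative assumptions, not the model of any of the cited papers.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gaussian_mixture(values, initial_means, n_iter=50):
    """EM for a 1-D Gaussian mixture (hypothetical helper, one class per initial mean).

    Returns class proportions, means, variances and the per-observation
    class probabilities (responsibilities).
    """
    k = len(initial_means)
    means = list(initial_means)
    overall_mean = sum(values) / len(values)
    overall_var = sum((x - overall_mean) ** 2 for x in values) / len(values)
    variances = [overall_var] * k      # start every class at the pooled variance
    props = [1.0 / k] * k              # start with equal class proportions
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each class for each observation.
        resp = []
        for x in values:
            w = [props[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(w)
            resp.append([wc / total for wc in w])
        # M-step: re-estimate the class distribution parameters.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            props[c] = nc / len(values)
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / nc
            var_c = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / nc
            variances[c] = max(var_c, 1e-6)  # guard against a collapsed class
    return props, means, variances, resp

# Synthetic "intensities" drawn from two classes (an assumption for illustration).
random.seed(0)
intensities = ([random.gauss(0.0, 1.0) for _ in range(300)]
               + [random.gauss(4.0, 1.0) for _ in range(300)])
props, means, variances, resp = fit_gaussian_mixture(intensities, [-1.0, 5.0])
# Classify each observation to its most probable class.
labels = [max(range(len(r)), key=lambda c: r[c]) for r in resp]
```

Each observation is classified independently of its neighbours here, which is exactly the limitation that spatial mixture models address.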

Another example is in functional brain imaging, where the observations might be statistical parametric maps (SPMs). The SPMs are typically the result of a temporal linear model of the 4-dimensional functional brain imaging data (e.g. FMRI), and the SPM statistics represent the height (and its uncertainty) of the measured response to a neural stimulation. The mixture is then made up of distributions representing non-activation, activation, and possibly deactivation. The task is then to classify areas in the brain as either activating, deactivating, or not activating. Everitt and Bullmore (1999) considered a non-spatial approach to mixture modelling for SPMs in FMRI.

Spatial mixture models have also been developed to augment this histogram information with spatial regularisation. This encodes the prior belief that neighbouring voxels in our images are likely to come from the same class. The spatial mixture models of Marroquin et al. (2003), Zhang et al. (2001) and Salli et al. (1999) regularise the classification labels spatially by placing a discrete Markov Random Field (MRF) prior on the spatial map of labels. Salli et al. (1999) apply spatial mixture modelling to FMRI, and Zhang et al. (2001) apply it to structural brain segmentation.

Svensén et al. (2000) use mixture models in a different and interesting way on FMRI data. Instead of having a mixture model on SPMs of the activation height, they effectively have a mixture model on the haemodynamic response function (HRF). This allows segmentation into regions with different characteristics of the HRF. They also use a discrete MRF prior on the spatial map of classification labels.

A discrete MRF prior simply penalises neighbouring voxels that belong to different classes. Crucially, the amount of penalisation depends on an MRF control parameter, which controls how strong this spatial regularisation is. Marroquin et al. (2003) refer to this as the parameter which controls the granularity of the field, and they discuss how different values of this parameter can affect the resulting segmented field. However, a problem with the use of discrete classification label MRFs in Marroquin et al. (2003), Zhang et al. (2001), Salli et al. (1999) and Svensén et al. (2000) is that the normalising constant (or partition function) of the MRF is not known analytically. This makes it very difficult to infer the MRF control parameter. Consequently, these authors effectively have to tune the MRF control parameter heuristically. This is a strong limitation. If we applied the model to a new dataset then the optimal MRF control parameter would need to be deduced heuristically, and would then be open to subjective judgement. It would be preferable to allow the data itself to determine the amount of spatial regularisation automatically.
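The difficulty with the partition function can be illustrated with a small sketch. It assumes a Potts-style prior in which the unnormalised log-probability of a label map is minus `beta` times the number of 4-connected neighbour pairs with different labels. The name `beta` for the control parameter and this exact functional form are assumptions for illustration, and the cited papers may parameterise their MRFs differently. The normalising constant is computed by brute-force enumeration, which is only feasible for tiny grids because the number of label maps grows as the number of classes raised to the number of voxels.

```python
import itertools
import math

def unlike_neighbour_pairs(labels, n_rows, n_cols):
    """Count 4-connected neighbour pairs with different labels (row-major grid)."""
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if c + 1 < n_cols and labels[i] != labels[i + 1]:
                count += 1  # right-hand neighbour differs
            if r + 1 < n_rows and labels[i] != labels[i + n_cols]:
                count += 1  # neighbour below differs
    return count

def log_partition_brute_force(beta, n_rows, n_cols, n_classes):
    """log Z(beta), summing over all n_classes ** (n_rows * n_cols) label maps."""
    n_voxels = n_rows * n_cols
    terms = [-beta * unlike_neighbour_pairs(labels, n_rows, n_cols)
             for labels in itertools.product(range(n_classes), repeat=n_voxels)]
    largest = max(terms)  # log-sum-exp for numerical stability
    return largest + math.log(sum(math.exp(t - largest) for t in terms))

# A 3 x 3 grid with two classes already needs 2 ** 9 = 512 terms.
log_z = log_partition_brute_force(beta=1.0, n_rows=3, n_cols=3, n_classes=2)
```

Because `log_z` changes with `beta` and has no closed form for images of realistic size, the likelihood of `beta` cannot be evaluated directly, which is why the parameter ends up being tuned by hand.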

Hartvig and Jensen (2000) also use a spatial mixture model on FMRI data. Their solution to the problem of not knowing the partition function for discrete MRF priors is to use a different spatial prior altogether. The joint spatial prior over the map of labels is specified indirectly by specifying the marginal prior for the labels in a voxel neighbourhood. The advantage is that, by choosing marginal priors which depend on summary statistics (such as the number of voxels in the voxel neighbourhood of the same class), the posterior probability that a voxel is activated can be calculated analytically. This provides inference which is much quicker than iterative techniques such as iterated conditional modes (ICM), simulated annealing or Markov chain Monte Carlo (MCMC).

However, Hartvig and Jensen (2000) themselves make the point that these marginal spatial priors are not as flexible as MRFs. There is also a problem when we come to determine the global parameters adaptively, such as the class distribution parameters and the parameters which control the strength of the spatial regularisation. This is because it is not obvious how to go from the modelling of marginals in voxel neighbourhoods to the joint posterior over the entire spatial map, and it is the latter that is required to do inference on the global parameters. As a result, Hartvig and Jensen (2000) propose a contrast function to represent the joint posterior so that the global parameters can be determined. However, whilst a sensible choice of contrast function can be made, it is still somewhat arbitrary. Hence the difference between the resulting global parameter estimates and those that could be obtained if the joint posterior were available is unclear. In addition, this means that there is no formal way to assess or include uncertainty in these global parameters.

In this paper we propose an alternative to the spatial mixture model of Hartvig and Jensen (2000) for determining the amount of spatial regularisation. Unlike Hartvig and Jensen (2000), we use a Markov Random Field on discrete labels. We describe a novel way to do spatial mixture modelling with an MRF in which the amount of spatial regularisation is determined unambiguously and adaptively from the data.

This is achieved by approximating the discrete labels with a vector of continuous weights. The imposed properties of our weight vector ensure that the new posterior distribution is the same as the posterior distribution when we use discrete labels. Crucially, instead of a discrete MRF prior on the discrete labels, we can now use a continuous Gaussian MRF (or conditionally specified auto-regressive process) prior on parameters related to the continuous weights, for which we *do* know what the normalising constant is.
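The contrast with the discrete case can be seen in a small sketch of a Gaussian MRF on a 4-connected grid. It assumes a precision matrix of the form `phi * (D - A + ridge * I)`, where `D - A` is the graph Laplacian of the grid, `phi` is the control parameter, and `ridge` is a small term added here only so that the prior is proper. All of these names and the exact form of the precision matrix are assumptions for illustration, not the prior defined in this paper. The point is that the log normalising constant is available from a log-determinant, and that its dependence on `phi` is simply `0.5 * n * log(phi)` for `n` voxels.

```python
import math

def grid_precision(n_rows, n_cols, phi, ridge=1e-3):
    """Precision matrix phi * (D - A + ridge * I) for a 4-connected grid."""
    n = n_rows * n_cols
    q = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            q[i][i] += ridge
            for dr, dc in ((0, 1), (1, 0)):  # right and lower neighbours
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    j = rr * n_cols + cc
                    q[i][i] += 1.0
                    q[j][j] += 1.0
                    q[i][j] -= 1.0
                    q[j][i] -= 1.0
    return [[phi * v for v in row] for row in q]

def log_det_spd(q):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(q)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = q[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def gmrf_log_normaliser(n_rows, n_cols, phi):
    """Log of the constant multiplying exp(-0.5 * w' Q w) in the Gaussian density."""
    n = n_rows * n_cols
    log_det = log_det_spd(grid_precision(n_rows, n_cols, phi))
    return 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

# Doubling phi on a 3 x 3 grid shifts the log normaliser by 0.5 * 9 * log(2).
shift = gmrf_log_normaliser(3, 3, 2.0) - gmrf_log_normaliser(3, 3, 1.0)
```

Since the normaliser is an explicit function of `phi`, the control parameter can be treated like any other model parameter and inferred from the data rather than tuned.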

Subsequently, we are able to automatically determine the continuous Gaussian MRF control parameter, allowing us to adaptively determine the amount of spatial regularisation. Heuristic tuning of control parameters is no longer required. All parameters in the model are adaptively determined from the data.