Estimation of Variance Components

In the previous sections it was assumed that all variance terms are known a priori. In practice, these quantities are unknown and need to be estimated as part of the model fitting. Variance component estimation is a challenging task in itself and has generated a variety of approaches. Any approach to variance estimation (or combination of approaches) can easily be combined with the multi-level GLM to provide a practical multi-level method; this section discusses some of the more popular approaches.

There are no differences between the first level and any other level from the modelling perspective; in practice, however, estimating the variances is substantially different. At the first level there typically exists considerable serial auto-correlation (in FMRI time series data), but also a large number of observations. A considerable amount of literature is devoted to specifying the form of the first-level covariance matrix and estimating its parameters in the single-session case [17,14,19]. In contrast, higher-level variance component estimation is typically hampered by having very few observations, while serial auto-correlation between these observations is normally ignored, and often can be.
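The first-level serial auto-correlation described above can be illustrated with a small simulation. The following is a minimal sketch, assuming a first-order autoregressive (AR(1)) noise process, which is a simplification chosen here for illustration and not the covariance model of any of the cited works. The function names `lag1_autocorr` and `prewhiten` are hypothetical.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)                        # lag-0 autocovariance (unnormalised)
    c1 = sum(d[t] * d[t - 1] for t in range(1, n))    # lag-1 autocovariance (unnormalised)
    return c1 / c0


def prewhiten(x, rho):
    """Apply the AR(1) whitening filter y[t] = x[t] - rho * x[t-1].

    The first sample is dropped because it has no predecessor.
    """
    return [x[t] - rho * x[t - 1] for t in range(1, len(x))]


# Simulate AR(1) noise with a known coefficient (illustrative value only).
rng = random.Random(0)
rho_true = 0.6
noise = [0.0]
for _ in range(2000):
    noise.append(rho_true * noise[-1] + rng.gauss(0.0, 1.0))

rho_hat = lag1_autocorr(noise)        # estimate of the AR(1) coefficient
white = prewhiten(noise, rho_hat)     # filtered series
rho_after = lag1_autocorr(white)      # residual autocorrelation, close to zero
```

In a real first-level analysis the same filter would be applied to both the data and the design matrix before estimation; the sketch only shows the effect on the noise.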

When the number of observations is very low, this imposes restrictions on the types of model that are practically estimable. For instance, while it is possible to formulate a model where the variance about the group mean is different for each session/subject, such a model is not estimable because there is only a single measurement per session/subject.

Several approaches to estimation within the multi-level GLM currently exist, of which the parametric techniques can most easily be split into Bayesian and non-Bayesian approaches.

Classically, variance components tend to be estimated separately using iterative estimation schemes employing Ordinary Least Squares (OLS), Expectation Maximisation (EM) or Restricted Maximum Likelihood (ReML); see [16] for details.

The specific choice of variance component estimator also determines the 'effective' degrees of freedom of the variance estimate. For ordinary least squares estimates, the degrees of freedom are simply the number of observations minus the number of estimated parameters. When more sophisticated estimation techniques and/or variance structures are used, this changes significantly [19,10]; e.g. enforcing the mixed-effects variance to be greater than the fixed-effects variance can increase the effective degrees of freedom, a quantity typically estimated using a Satterthwaite approximation [15].
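As an illustration of the Satterthwaite approximation mentioned above, the following minimal sketch computes the effective degrees of freedom of a weighted sum of independent variance estimates using the standard Welch-Satterthwaite formula. The function name `satterthwaite_dof` and the numerical values are hypothetical and are not taken from any of the cited works.

```python
def satterthwaite_dof(components):
    """Effective degrees of freedom of sum_i a_i * s2_i.

    components: iterable of (weight a_i, variance estimate s2_i, dof_i),
    where the variance estimates are assumed to be independent.
    Welch-Satterthwaite: nu = (sum a_i s2_i)^2 / sum((a_i s2_i)^2 / dof_i).
    """
    components = list(components)
    total = sum(a * s2 for a, s2, _ in components)
    denom = sum((a * s2) ** 2 / dof for a, s2, dof in components)
    return total ** 2 / denom


# Two equally sized components with 10 degrees of freedom each
# combine to 20 effective degrees of freedom.
dof_equal = satterthwaite_dof([(1.0, 1.0, 10), (1.0, 1.0, 10)])

# A large, poorly determined component dominates: the effective
# degrees of freedom stay close to those of that component.
dof_unequal = satterthwaite_dof([(1.0, 4.0, 9), (1.0, 1.0, 99)])
```

The second case shows why the effective degrees of freedom depend on the relative size of the variance components and not only on the number of observations.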

As an example of a non-Bayesian approach, [19] estimates variance components at each level of the split-level model separately. First-level estimation incorporates autoregressive (AR) noise, estimated from the lagged autocovariance matrices, and utilises the pre-whitening approach to generate BLUEs for the first-level parameters. At higher levels, the authors propose EM for estimation of the random-effects variance contribution in order to reduce bias in the variance estimation, a potential problem in higher-level analyses if simple OLS were used (note that [12] shows that this is not necessary at the first level). Positivity of the random-effects variance, i.e. avoiding what is known as the 'negative variance problem' [11] (where mixed-effects variance estimates are lower than fixed-effects variances, implying a negative random-effects variance), is partially addressed but not strictly enforced. As a separate stage, in order to boost the effective degrees of freedom for later mixed-effects inference, the authors propose to spatially regularise the random-effects variance post hoc by smoothing a random variance ratio image. As a consequence, the result is not a mixed-effects analysis. It should be noted that this specific form of spatial regularisation (or indeed any) is not a necessary ingredient of an EM approach to variance estimation.
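The 'negative variance problem' can be made concrete with a small numerical example. The sketch below assumes a simple moment-based estimate, in which the random-effects variance is taken as the sample variance of the first-level parameter estimates (mixed-effects) minus the average first-level variance (fixed-effects). This is an illustrative simplification and not the EM scheme of [19]; the function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def random_effects_variance(estimates, first_level_vars, enforce_positive=True):
    """Moment-based estimate of the random-effects variance.

    estimates:        first-level parameter estimates, one per session/subject
    first_level_vars: variances of those estimates from the first level
    """
    mixed = variance(estimates)       # sample variance about the group mean
    fixed = mean(first_level_vars)    # average within-session/subject variance
    raw = mixed - fixed
    # Truncating at zero is one simple way to enforce positivity.
    return max(raw, 0.0) if enforce_positive else raw


# With few, tightly clustered subjects the raw estimate can be negative.
estimates = [1.0, 1.2, 0.9, 1.1]
first_level_vars = [0.05, 0.04, 0.06, 0.05]

raw = random_effects_variance(estimates, first_level_vars, enforce_positive=False)
clamped = random_effects_variance(estimates, first_level_vars)
```

Here the mixed-effects variance (about 0.017) is smaller than the fixed-effects variance (0.05), so the raw difference is negative and the truncated estimate is zero.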

In contrast, [6] have proposed an empirical-Bayesian approach for estimation of the full single-level model. Unlike the previous case and the techniques advocated in this paper, this relates the parameters of interest to the full set of original data, i.e. it does not utilise the 'summary statistics' approach. Parameter and variance component estimation are no longer separated. Conditional posterior point estimates are generated using EM, which gives rise to posterior probability maps.

More recently, [1] have placed the higher-level GLM estimation in a *fully* Bayesian framework. Using appropriate reference priors, the method is based on Markov Chain Monte Carlo (MCMC) sampling from the full posterior distribution. Under the parametric model assumptions, the posterior has a non-central t-distribution, which the method fits to the MCMC samples. This is done both to estimate the appropriate degrees of freedom for the mixed-effects inference and to avoid having to sample densely far into the tail of the posterior. As in the empirical-Bayesian case, parameters and their relevant variance components are estimated together. Also, as the number of degrees of freedom is estimated as part of the t-distribution fit, there is no need to approximate this quantity separately, e.g. via the Satterthwaite approximation [16]. This technique provides unbiased and efficient estimation of multi-level GLMs within the 'summary statistics' approach, while also strictly enforcing positivity of variance components at all levels. This technique, combined with a pre-whitening approach to first-level estimation [17], has been implemented as part of FSL [1,7].