
   
Consistency Test

The results presented above are purely qualitative, based on the subjective assessment of many individuals. However, as is generally the case for registration problems in practice, no ground truth was available to test the registration against, which makes quantitative assessment of methods difficult.

In order to test the method more quantitatively, a comparative consistency test was performed. This test aims to measure the robustness, rather than the accuracy [West et al., 1997], of the registration method. Robustness is defined here as the ability to get close to the global minimum on all trials, whereas accuracy is the ability to precisely locate a (possibly local) minimum of the cost function. For example, one method might nearly always (say in over 99.99% of cases) be between 0.2mm and 0.6mm from the best possible solution, while another might often be less than 0.1mm from the best solution but occasionally (say in 5% of cases) fail to find the global minimum, becoming trapped in a local minimum that could be in excess of 10mm from the best solution. In this case the former method would be considered more robust than the latter, while the latter would be more accurate but less robust. Ideally a registration method should be both robust and accurate.

The consistency test is designed to assess one necessary, but not sufficient, aspect of robustness: the ability to find the same solution regardless of the initial position. Any robust method, which always finds the global minimum, will give the same solution each time, whereas a non-robust method, which can become trapped in a local minimum, is likely to give different solutions depending on the initial position. However, this condition is not sufficient for establishing robustness, as the same, consistent, solution may just be a large local minimum rather than the global minimum. Therefore it is also necessary to check that the registration solution is acceptable to someone trained in neuroanatomy; this aspect was addressed in the trials described above.
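To make the dependence on initial position concrete, the following toy sketch (not any of the registration methods discussed here; the 1-D cost function and step size are illustrative assumptions) shows a purely local optimiser converging to different minima depending on where it starts:

```python
import numpy as np

def cost(x):
    """Toy 1-D cost: global minimum near x=0, plus a local minimum near x=7.8
    created by a quadratic bowl with a Gaussian dip superimposed at x=8."""
    return x**2 / 20.0 - 2.0 * np.exp(-(x - 8.0)**2)

def gradient_descent(x0, lr=0.05, steps=2000):
    """Plain gradient descent on cost(), using the analytic derivative."""
    x = x0
    for _ in range(steps):
        g = x / 10.0 + 4.0 * (x - 8.0) * np.exp(-(x - 8.0)**2)  # d(cost)/dx
        x -= lr * g
    return x

print(gradient_descent(-5.0))  # ≈ 0 (finds the global minimum)
print(gradient_descent(9.0))   # ≈ 7.8 (trapped in the local minimum)
```

Started from different positions, the same optimiser returns different answers; a consistency test detects exactly this behaviour without needing to know where the true minimum lies.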

More specifically, the consistency test for an individual image $I$ involved taking the image and applying several pre-determined affine transformations, $A_j$, to it. All these images (both transformed and un-transformed) were registered to a given reference image, $I_r$, giving transformations $T_j$. If the method was consistent, the composite transformations $T_j \circ A_j$ should all be the same, which is illustrated in figure 6. Moreover, an RMS deviation between each composite registration and the registration from the un-transformed case allows quantification of the consistency.
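The composition and RMS-deviation steps can be sketched as follows. The registration step itself is not shown (it would call the actual registration package), so the example constructs the result a perfectly consistent method would return; the specific matrices and the cloud of sample points are illustrative assumptions, not values from the study:

```python
import numpy as np

def rms_deviation(F, T0, points):
    """RMS distance (in the same units as `points`) between the mappings of
    a set of points under two 4x4 homogeneous affine transforms F and T0."""
    pts_h = np.c_[points, np.ones(len(points))]          # homogeneous coords
    diff = (pts_h @ F.T)[:, :3] - (pts_h @ T0.T)[:, :3]  # per-point displacement
    return np.sqrt((diff ** 2).sum(axis=1).mean())

def compose(T, A):
    """Composite transform T o A (apply A first, then the registration T)."""
    return T @ A

# Illustrative direct registration T0 of the un-transformed image.
T0 = np.eye(4)
T0[:3, 3] = [2.0, -1.0, 0.5]

# One pre-determined initial transformation A_j: 10 degrees about the y-axis.
theta = np.deg2rad(10.0)
A = np.eye(4)
A[:3, :3] = [[np.cos(theta), 0, np.sin(theta)],
             [0,             1, 0            ],
             [-np.sin(theta), 0, np.cos(theta)]]

# A perfectly consistent method exactly undoes A_j, so T_j = T0 o A_j^{-1}
# and the composite T_j o A_j equals T0, giving (near-)zero RMS deviation.
Tj = T0 @ np.linalg.inv(A)
pts = np.random.default_rng(0).uniform(-80, 80, (1000, 3))  # sample coords (mm)
print(rms_deviation(compose(Tj, A), T0, pts))  # ≈ 0 for a consistent method
```

An inconsistent method would return a $T_j$ whose composite $T_j \circ A_j$ differs from $T_0$, and the RMS deviation over the sampled points then quantifies how far apart the two registrations are in millimetres.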


  
Figure 6: Illustration of the consistency test for a single image. An image (top) has a number of initial affine transformations Aj (rotations are used in this study) applied to it. The resulting images (middle) are then registered to the reference image (bottom), giving transformations Tj. Therefore, the overall transformation from the initial image to the reference image is $F_j = T_j \circ A_j$, and these are compared with T0 which is the registration of the initial image directly to the reference image. For a consistent method, all the transformations, Fj, should be the same as T0.
[Figure 6 graphic: consistencyeg.ps]

The particular test (which is also described in [Jenkinson and Smith, 1999]) used 18 different images as the floating images (like the one shown in figure 3a), all with the MNI 305 brain [Collins et al., 1994] as the reference image. The 18 images were all $256 \times 256 \times 30$, T2-weighted MRI images with voxel dimensions of 0.93mm by 0.93mm by 5mm, while the MNI 305 template is a $172 \times 220 \times 156$, T1-weighted MRI image with voxel dimensions of 1mm by 1mm by 1mm.

In addition to FLIRT, several other registration packages were tested. These were AIR [Woods et al., 1993], SPM [Friston et al., 1995], UMDS [Studholme et al., 1996] and MRITOTAL [Collins et al., 1994]. These methods were chosen because the authors' implementations were available, and so this constituted a fair test, as opposed to re-implementing a method described in a paper, where the lack of precise implementation details often makes it difficult to produce a good working method.

The results of this test, using six different rotations about the anterior-posterior axis, are shown in figure 7. It can be seen that only FLIRT and MRITOTAL were consistent with this set of images. This indicates that the other methods, AIR, SPM and UMDS, are more easily trapped in local minima, and so are less robust. In particular, initial rotations of only $0.5^\circ$ sometimes resulted in large differences in the final registrations, showing how sensitive these methods are to the initial position.


  
Figure 7: Results of the consistency study, plotting RMS deviation (in mm) versus image number for (a) AIR, (b) SPM, (c) UMDS, (d) MRITOTAL and (e) FLIRT. For each of the 18 source images (T2-weighted MRI images with voxel dimensions of 0.93mm by 0.93mm by 5mm) there are 6 results corresponding to initial starting rotations of -10, -2, -0.5, 0.5, 2, and 10 degrees about the y-axis (anterior-posterior axis). All of the methods, except FLIRT and MRITOTAL, show large deviations and are therefore inconsistent and non-robust.
[Figure 7 graphics: panels (a)-(e), one per method]

A further consistency test was then performed comparing only MRITOTAL and FLIRT. This test used initial scalings rather than rotations. The reason that this is important is that MRITOTAL uses a purely local optimisation method (gradient descent) but relies on initial pre-processing to provide a good starting position. This pre-processing finds the principal axes of both volumes and initially aligns them. However, this initial alignment gives no information about scaling and is dependent on the FOV, since edges of the volume that truncate the image can have a significant impact on the principal axes that are computed.
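As an illustration of this FOV sensitivity (a minimal sketch of moment-based principal axes, not MRITOTAL's actual pre-processing; the block-shaped test volume is an assumption), the principal axes can be computed as eigenvectors of the second-moment matrix of the intensity distribution, and truncating the volume changes the moments and hence the axes:

```python
import numpy as np

def principal_axes(volume, thresh=0.0):
    """Centre of mass and principal axes of an intensity volume.

    Voxels with intensity above `thresh` are treated as a mass distribution;
    the axes are the eigenvectors of its weighted covariance matrix, ordered
    by decreasing spread."""
    w = np.where(volume > thresh, volume, 0.0).astype(float)
    coords = np.argwhere(w > 0).astype(float)   # voxel indices of the "mass"
    weights = w[w > 0]                          # same (C-order) traversal
    com = np.average(coords, axis=0, weights=weights)
    centred = coords - com
    cov = (centred * weights[:, None]).T @ centred / weights.sum()
    evals, evecs = np.linalg.eigh(cov)          # ascending eigenvalues
    return com, evecs[:, ::-1]                  # largest-spread axis first

# Truncating the FOV changes the moments, and hence the computed alignment:
vol = np.zeros((40, 40, 40))
vol[5:35, 10:30, 8:32] = 1.0                    # a simple uniform "head" block
com_full, axes_full = principal_axes(vol)
com_cut, axes_cut = principal_axes(vol[:20])    # half the FOV cut away
print(com_full, com_cut)                        # centres of mass differ
```

Because the truncated volume yields a different centre of mass and different axes, the starting position handed to a purely local optimiser shifts with the FOV, which is exactly the failure mode the scaling consistency test probes.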

The results of the scaling consistency test are shown in figure 8. It can be seen that, although generally consistent, in three cases MRITOTAL produces registrations that deviate by more than 20mm (RMS) from each other. In contrast, FLIRT was consistent (less than 2mm RMS) for all images.


  
Figure 8: Results of the scale consistency study, plotting RMS deviation (in mm) versus image number for (a) MRITOTAL and (b) FLIRT. For each of the 18 source images (T2-weighted MRI images with voxel dimensions of 0.93mm by 0.93mm by 5mm) there are 6 results corresponding to initial scalings of 0.7, 0.8, 0.9, 1.1, 1.2 and 1.3 about the centre of mass. In three cases MRITOTAL shows large deviations and is therefore inconsistent and non-robust.
[Figure 8 graphics: panels (a) MRITOTAL and (b) FLIRT]


Mark Jenkinson
2000-05-10