
Shortcomings of current statistical methods

EEG mapping data raise statistical problems when inferential tests are used to look for treatment effects, mainly because the structure of the data leads to multiple comparisons: across time points, across variables (EEG bands) and across locations (electrodes). First, a multiple testing correction such as the Bonferroni adjustment is not suitable here, because a large number of highly correlated variables has to be considered: 1008 measures for each time point (about 10 of them) and for each dose (say 3 doses). Multivariate parametric methods such as MANOVA are not recommended either, mainly because the small sample size (here 12 subjects) makes the assumptions difficult to assess and the models impractical to apply (e.g. covariance structure estimation).

For these reasons, the statistical analysis of wake-EEG (pharmaco-EEG) data is often a two-level analysis, as in [3]. That method, or at least its approach, is widely used, so the comments below are based on it alone. The first level, called the Statistical Decision Tree (SDT), is exploratory and uses non-parametric tests; significant results (at the leaves of the tree) are submitted to the second level, a confirmatory one, which combines qualitative criteria, called Descriptive Data Analysis (DDA), with a quantitative confirmation using Principal Component Analysis (PCA). After the first level, two kinds of series of maps, one map per time point, are produced: coloured maps of p-values for the dose versus placebo comparisons (each map has 28 points, which can be interpolated), and SDT-maps, which plot at each electrode the value of the "decision" on the direction of the drug effect [3]. At the second level, DDA is meant to confirm those SDT results that meet a few qualitative criteria, e.g. spatial and temporal coherence of the significant results (i.e. clusters of electrodes that last over time). The SDT/DDA results are then confirmed or not by a t-test on the principal component (PCA of the electrodes) "overlapping" the spatial SDT/DDA result (e.g. a frontal effect at FP1, FP2, Fz, if overlapped by the back/front principal component).

First of all, the term "Statistical Decision" in SDT is misleading, as the method does not attach any level of confidence to the final conclusion. The SDT procedure is applied electrode by electrode and time point by time point (in comparison to baseline), so the multiple testing problem is still present. Comparing probability maps from one time point to another, or from one EEG band to another, is also problematic. In particular, a probability map does not quantify the observed effect, especially when it is produced from non-parametric tests. The only possible comparison is in fact the spatial spread of the effect, provided it is a true spread (multiple testing again).

The DDA procedure certainly provides some control of false positives by introducing qualitative criteria on the coherence of the results. Although these criteria seem to be common sense, they might not suit every design or drug analysed, and they bring an interpretative, conclusive step into the statistical analysis itself, from which interpretations and conclusions are then drawn. The PCA confirmation plays a similar role with respect to multiple testing, but this time in an acceptable way, as it is a statistical control. Note that it would seem more logical to start with PCA as an exploratory tool, then select the electrodes contributing most to the principal components and/or meeting the DDA criteria, and then confirm the hypotheses with SDT.

The main problem with the current method seems to be the multiple testing issue. This has also been an issue in the medical imaging literature when looking for activation with imaging techniques such as PET. Solutions that derive the distribution of local maxima can be obtained from random field theory as in [22] or from a permutation testing procedure as in [7].
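
To make the permutation idea concrete, here is a minimal sketch of a max-statistic permutation test on a single 28-electrode map. It is one common form of permutation procedure and is not claimed to reproduce [7] in detail. The assumptions are ours: a paired design (drug minus placebo differences per subject), sign flipping as the permutation scheme (valid if the differences are symmetric about zero under the null hypothesis), and simulated data with a hypothetical effect on the first three electrodes. The function names `t_stats` and `max_t_permutation` are illustrative only.

```python
import math
import random


def t_stats(diff):
    """One-sample t statistic per electrode.

    diff: list of rows, one row of drug-minus-placebo differences per subject.
    """
    n = len(diff)
    stats = []
    for col in zip(*diff):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        stats.append(mean / math.sqrt(var / n))
    return stats


def max_t_permutation(diff, n_perm=1000, seed=0):
    """Adjusted p-values from the permutation distribution of max |t| over the map."""
    rng = random.Random(seed)
    t_obs = [abs(t) for t in t_stats(diff)]
    max_null = []
    for _ in range(n_perm):
        # Flip the sign of each subject's whole map (assumed null symmetry),
        # which keeps the spatial correlation between electrodes intact.
        flipped = []
        for row in diff:
            s = rng.choice((-1.0, 1.0))
            flipped.append([s * x for x in row])
        max_null.append(max(abs(t) for t in t_stats(flipped)))
    # Each electrode is compared with the distribution of the map-wide maximum.
    p_adj = [(1 + sum(m >= t for m in max_null)) / (n_perm + 1) for t in t_obs]
    return t_obs, p_adj


if __name__ == "__main__":
    # Simulated data only: 12 subjects, 28 electrodes, hypothetical effect
    # added to electrodes 0, 1 and 2.
    rng = random.Random(1)
    n_subjects, n_electrodes = 12, 28
    diff = [[rng.gauss(0.0, 1.0) for _ in range(n_electrodes)]
            for _ in range(n_subjects)]
    for row in diff:
        for e in range(3):
            row[e] += 2.0
    t_obs, p_adj = max_t_permutation(diff, n_perm=500)
    for e in range(n_electrodes):
        print(e, round(t_obs[e], 2), round(p_adj[e], 3))
```

Because every electrode is referred to the distribution of the maximum over the whole map, the adjusted p-values account for the 28 simultaneous tests without assuming independence between electrodes. Taking the maximum over time points and EEG bands as well would be the analogous step for the full data set.
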
Permutation testing would be preferable in this context, as only 28 "pixels" form the "image" here, making the smooth random field approximation difficult to justify.

The approach is also univariate in a highly multivariate context with three particular features: temporal correlation, spatial correlation and frequency-band correlation. Using univariate (or separate) models does not account for the spatial, temporal and frequency structures of the data, which are discussed only a posteriori and qualitatively when drawing conclusions from the results. The idea of using data reduction techniques is promising because it can, first, reduce the multiple testing problem and, second, provide some modelling of the structure. PCA is not fully appropriate for a multi-entry array, as it allows only bilinear modelling; a data reduction method allowing multilinear modelling is needed. Multiway data reduction techniques have already been used on multichannel evoked potential data, as in [6] with the PARAFAC method. In that paper the input was the signal itself rather than its Fourier transform (summarised on bands) as in our case (i.e. EEG analysis instead of qEEG analysis). The PARAFAC method seems more appropriate for EEG analysis, where the focus is more on modelling the (long) time course than on decomposing the variability. Choosing a trade-off between modelling and decomposition that leans towards the latter, this paper describes another method for handling multi-entry data, one which retains most of the properties of PCA and is therefore easier to understand.
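
The difference between bilinear and multilinear modelling can be illustrated with a small sketch. PCA approximates a matrix by terms of the form sigma * u[i] * v[j], so a three-entry array has to be unfolded into a matrix first and one of its entries loses its identity. A trilinear model instead uses terms of the form sigma * a[i] * b[j] * c[k], with one component per entry. The code below fits a single such term by alternating updates. It is a generic rank-one trilinear fit given for illustration only, not the method developed in the next section nor the PARAFAC algorithm of [6]; the function names, the all-ones starting vectors and the reading of the three entries as electrodes, bands and time points are our assumptions.

```python
import math


def normalise(v):
    """Scale v to unit Euclidean norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_one_trilinear(X, n_iter=50):
    """Fit X[i][j][k] ~ sigma * a[i] * b[j] * c[k] by alternating updates.

    X is a nested list, e.g. electrodes x bands x time points (hypothetical).
    Each update contracts X with the two other vectors, then renormalises.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    a = normalise([1.0] * I)
    b = normalise([1.0] * J)
    c = normalise([1.0] * K)
    for _ in range(n_iter):
        a = normalise([sum(X[i][j][k] * b[j] * c[k]
                           for j in range(J) for k in range(K))
                       for i in range(I)])
        b = normalise([sum(X[i][j][k] * a[i] * c[k]
                           for i in range(I) for k in range(K))
                       for j in range(J)])
        c = normalise([sum(X[i][j][k] * a[i] * b[j]
                           for i in range(I) for j in range(J))
                       for k in range(K)])
    # sigma plays the role of a singular value for the fitted term.
    sigma = sum(X[i][j][k] * a[i] * b[j] * c[k]
                for i in range(I) for j in range(J) for k in range(K))
    return sigma, a, b, c


if __name__ == "__main__":
    # Toy array built from one trilinear term, so the fit should be exact.
    a0, b0, c0 = [1 / 3, 2 / 3, 2 / 3], [0.6, 0.8], [0.8, 0.6]
    X = [[[5.0 * ai * bj * ck for ck in c0] for bj in b0] for ai in a0]
    sigma, a, b, c = rank_one_trilinear(X)
    print(sigma, a, b, c)
```

Each fitted vector can be read on its own entry (a spatial pattern, a band profile, a time course), which is the kind of interpretation a bilinear PCA on an unfolded array does not give directly.
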


Didier Leibovici 2001-09-04