Multivariate Methods


Theme Co-ordinator: Luigi Palla

With the recent advent of Big Data problems in epidemiological studies, multivariate statistical analysis is increasingly of interest in medical applications. It has long been established in other fields, including psychology, genetics, chemistry, image analysis, nutrition, economics and social science. Statistical practice in medicine has traditionally centred on multiple regression, leaving out the plethora of methods that fall under the general title of ‘multivariate methods’. What these methods have in common is the attempt to model, mathematically or statistically, a set of variables measured on the same observations, using matrix algebra together with statistical and computational models and algorithms.

Examples of multivariate statistical methods include:

Descriptive (mathematical/geometric) methods like

1) Principal component analysis;

2) Correspondence analysis;

3) Multidimensional scaling;

4) Cluster analysis;

Methods based on a statistical model (i.e. a model that specifies a probability distribution) like

5) Factor analysis;

6) Discriminant analysis (canonical DA, logistic DA);

7) Partial least squares;

8) Reduced rank regression;

9) Simultaneous equations and instrumental variable models with multiple instruments;

10) Mediation analysis with high-dimensional mediators;

Machine learning methods like

11) Classification and regression trees (recursive partitioning);

12) Neural networks;

13) Support vector machines.
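
As an illustration of how these methods rest on matrix algebra, the first method in the list, principal component analysis, can be sketched through the singular value decomposition of the column-centred data matrix. This is a minimal sketch and not a recommended analysis pipeline: the function name `pca`, the simulated data and all variable names are assumptions made for illustration only.

```python
import numpy as np


def pca(X, n_components):
    """Principal component analysis via SVD of the column-centred data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each variable (column)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in decreasing order
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # rows are the principal axes (loadings)
    scores = Xc @ components.T      # coordinates of observations on those axes
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


# Hypothetical simulated data: 200 observations of 3 correlated variables
# driven by one latent factor plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.1 * rng.normal(size=(200, 3))

scores, components, explained_variance = pca(X, n_components=2)
```

In this simulated setting the first component should capture most of the variance, because the three variables share a single latent source. With real epidemiological data, whether to standardise the variables before the decomposition is a substantive choice that this sketch does not address.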

Within this theme we wish to explore and reflect on the challenges that the use of multivariate methods poses for their application and interpretation in epidemiology and medicine, drawing on technical expertise from statistical methodology and from the aforementioned disciplines in which the methods were developed.