Array Informatics – Massively Multivariate Data Mining
The advent of microarray technology is changing the landscape for bioinformatics analysis. Most existing data analysis techniques applied in this area ignore the multivariate nature of the data. CSIRO Bioinformatics has developed new statistical methods (GeneRaVE) specifically designed for microarray (and similar) data. The methods can be used to study diagnostic, prognostic and other outcomes (e.g. LC50); they use gene expression measurements in combination and do not require pre-filtering down to a smaller number of genes. The algorithms are demonstrated with applications to existing data sets. Applied to pediatric ALL diagnosis (Yeoh et al., Cancer Cell, vol. 1, March 2002), they identify 9 genes that achieve the same classification accuracy as the more than 200 used originally. In breast cancer prognosis (van’t Veer et al., Nature 415, 2002), the algorithms identify a 6-gene prognostic signature that performs as well as the original 70 genes.
Leader, Bioinformatics for Human Health, CSIRO Mathematical and Information Sciences, North Ryde, NSW, Australia
Glenn Stone is currently Leader of Bioinformatics for Human Health at CSIRO’s Mathematical and Information Sciences Division. He has spent many years in applied data mining roles, including leading a group of ten statisticians and machine learners at CSIRO, and in research at the Insurance Australia Group, Australia’s largest general insurer.