Matrix Visualization and Information Mining for Genomics/Proteomics Data Structure
Many statistical techniques, particularly multivariate methodologies, focus on extracting information from data and proximity matrices. Rather than relying solely on numerical characteristics, matrix visualization allows one to reveal structure in a matrix graphically. The visualization of gene expression profiles has made matrix visualization the most popular data visualization tool today. This talk will discuss some issues in applying matrix visualization to explore patterns embedded in gene expression profiles using Generalized Association Plots (GAP), a package we have been developing for several years for general-purpose matrix visualization. Most matrix visualization packages work only for continuous data profiles, not for categorical ones. Categorical Generalized Association Plots (CateGAP) is an extension of GAP adapted for exploring high-dimensional categorical data structure. Optimal scaling (Multiple Correspondence Analysis) is applied to compute the proximity matrices for objects as well as for variables, and to obtain colours for coding all categories in the raw data matrix. Visualization of various types of biomedical categorical data profiles, such as SNPs, COGs, and gene expression patterns, will be demonstrated using CateGAP.
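As a rough illustration of the optimal scaling step mentioned above, the following is a minimal sketch of Multiple Correspondence Analysis computed as a correspondence analysis of the indicator matrix, using NumPy only. It is not the CateGAP implementation: the function names (mca_scores, euclidean_proximity, category_colour_values), the toy data, the choice of Euclidean distance on the first two MCA dimensions, and the use of the first-dimension category score as a colour value are all assumptions made for illustration. The sketch also produces proximities among categories rather than among variables, since the abstract does not say how variable proximities are defined.

```python
import numpy as np

def mca_scores(data, n_dims=2):
    """Toy MCA: correspondence analysis of the one-hot indicator matrix.

    Returns object (row) coordinates, category (column) coordinates,
    and a list of (variable_index, category) labels for the columns.
    """
    data = np.asarray(data)
    n, p = data.shape
    blocks, labels = [], []
    for j in range(p):
        cats = np.unique(data[:, j])
        # One indicator column per category of variable j.
        blocks.append((data[:, j][:, None] == cats[None, :]).astype(float))
        labels.extend((j, c) for c in cats)
    Z = np.hstack(blocks)

    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals from the independence model.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)

    k = n_dims
    row_coords = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :k] * s[:k]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, labels

def euclidean_proximity(coords):
    """Pairwise Euclidean distances between the rows of coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def category_colour_values(col_coords):
    """Rescale first-dimension category scores to [0, 1] (assumed colour scale)."""
    x = col_coords[:, 0]
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

# Hypothetical toy data: 5 objects, 2 categorical variables.
toy = [["AA", "x"], ["AA", "x"], ["AG", "y"], ["GG", "y"], ["AG", "x"]]
rows, cols, labels = mca_scores(toy, n_dims=2)
D_objects = euclidean_proximity(rows)      # proximity matrix for objects
D_categories = euclidean_proximity(cols)   # proximity matrix for categories
colours = category_colour_values(cols)     # one value per category
```

In a matrix visualization workflow, the two proximity matrices would then be used to reorder rows and columns before drawing the colour-coded raw data matrix; that ordering step is not shown here.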
Associate Research Fellow, Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, R.O.C.
After receiving his Ph.D. in Mathematics (program in statistics) from the University of California, Los Angeles (USA) in 1992, Chun-houh Chen started his professional career as an assistant professor at The George Washington University (Department of Statistics/Computer and Information Systems), USA. In 1993, Dr. Chen went back to Taiwan to continue his research career at the Institute of Statistical Science, Academia Sinica. The development of data visualization methodologies with dimension reduction techniques such as SIR (sliced inverse regression) and pHd (principal Hessian direction) was the main focus of Dr. Chen’s early research. Through years of collaborative and applied work with psychiatrists at National Taiwan University, Dr. Chen shifted his research interests to dimension-free matrix visualization and information mining. Dr. Chen’s group is now working on methodologies and applications of matrix visualization techniques for visualizing different types of genomics and proteomics data structures.