Structural Bioinformatics and Literature Mining
Our group is interested in developing computational algorithms for solving biological problems, as well as natural language tools for mining the biological literature. Because experimental data are error-prone, we are especially interested in designing error-tolerant algorithms that can handle real data, rather than algorithms that are theoretically sound but impractical. The overwhelming volume of biological literature makes text mining an indispensable tool for extracting and verifying biological knowledge. We have developed ontology-based information extraction tools for text processing that emphasize structural matching at many levels, including biological concepts (protein names, gene names, etc.), phrases, templates, sentences, frames, and scripts. Our group is highly interdisciplinary: we have borrowed knowledge-based techniques from natural language processing and applied them to the prediction of protein secondary and tertiary structures and, in the other direction, applied error-tolerant algorithms to the design of robust natural language matching tools for text processing.
Professor and Research Fellow, Institute of Information Science, Academia Sinica, Taipei, Taiwan
I was trained in mathematics at National Taiwan University and then obtained a PhD in Operations Research from Cornell University. From 1980 to 1989, I taught at Northwestern University, working mostly on theoretical graph algorithms. After returning to Taiwan in 1989, I became interested in Chinese natural language processing. In 1992 we produced a very popular Chinese character input method, which has since helped over a million people in Taiwan type Chinese on computers. The software is based on an algorithm that automatically translates Chinese phonetic symbols into characters using contextual information. Since 1997, my interest in algorithms has gradually shifted toward biological computing. We produced the first error-tolerant algorithm for the DNA physical mapping problem. In the past few years, we have combined combinatorial optimization, machine learning, and knowledge-based approaches to tackle protein structure prediction as well as biological literature mining problems.