Title: Some Applications of NLP Techniques for Modeling Biological Sequences

Speaker:
Professor Aravind K. Joshi
Department of Computer and Information Science
Room 555, Moore School
University of Pennsylvania
200 S. 33rd Street, Philadelphia, PA 19104
Phone: (215) 898-8540
email: joshi@linc.cis.upenn.edu

Summary:
In this tutorial we will describe some of the NLP techniques that have been used in modeling biological sequences such as DNA, RNA, and proteins. First, we will briefly review techniques such as HMM's and stochastic CFG's, which have already been used for sequence analysis and structural analysis. Recently, for more complex structures, NLP techniques for grammars more powerful than CFG's have been used. We will discuss this last topic in some detail and also describe some open problems for which NLP techniques might be useful.

Tutorial Outline: (Time: 90 min; estimated time in minutes indicated in parentheses)

[1] General Introduction (5)
[2] A brief review of applications of HMM's in sequence modeling (15)
[3] Structural analysis, secondary and higher structures
    -- Review of stochastic context-free grammars for structural analysis (15)
    -- More complex secondary and higher structures (10)
    -- Applications of grammars more powerful than CFG's (25)
[4] Relationship of energy calculations and parsing (15)
[5] Summary (5)

Some Selected References:

[1] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
[2] Elena Rivas and Sean R. Eddy. The language of RNA: a formal grammar that includes pseudoknots. Bioinformatics, 16(4):334-340, 2000.
[3] Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, and Takashi Yokomori. Tree-adjoining grammars for RNA structure prediction. Theoretical Computer Science, 210:277-303, 1999.
[4] David Chiang and Aravind K. Joshi. Formal grammars for estimating partition functions of double-stranded chain molecules. Proceedings of Human Language Technology Conference, San Diego, March 2002.

Bio:
My early research was in information theory and communication theory. Since 1958, I have been working almost continuously on problems that overlap computer science and linguistics. Much of this research is now classified under formal linguistics, natural language processing, artificial intelligence, or cognitive science, depending on the topic. These categories are not necessarily exclusive.

More specifically, I have been working on syntactic and semantic representations for language structure, relationship of language structure to logic, mathematical linguistics, theory of computation as it relates to natural language processing, parsing algorithms, design and implementation of various systems for natural language processing, especially question-answer systems as interfaces to databases, theories of representation and inferencing in natural language, computational aspects of discourse, psychological implications of processing models, problems in processing certain kinds of bilingual utterances, some aspects of language learning, and other related problems.