Sunday 3 March 2013

Natural Language Processing

Description:
Natural Language Processing addresses fundamental questions at the intersection of human languages and computer science. How can computers acquire, comprehend and produce English? How can computational methods give us insight into observed human language phenomena? How can you get a job at Google? In this interdisciplinary introductory course, you will learn how computers can do useful things with human languages, such as translate from French into English, filter junk email, extract social networks from the web, and find the main topics in the day's news. You will also learn how computational methods can help linguists explain language phenomena, including the automatic discovery of different word senses.

Textbook & Resources

Required book (reading the online version is fine):
Jurafsky and Martin
"Speech and Language Processing"
Prentice Hall (January 26, 2000)
ISBN: 0130950696
Second edition is available online.

Recommended book:
Lutz and Ascher
"Learning Python"
O'Reilly
ISBN: 0596002815

Additional sources of interest:
Manning and Schütze
"Foundations of Statistical Natural Language Processing"
MIT Press; 1st edition (June 18, 1999)
ISBN: 0262133601
James Allen. Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc. 1994. ISBN 0-8053-0334-0.
Tom Mitchell. Machine Learning. McGraw Hill, 1997. ISBN 0070428077
Cover, T. M. and J. A. Thomas: Elements of Information Theory. Wiley. 1991. ISBN 0-471-06259-6.
Charniak, E.: Statistical Language Learning. The MIT Press. 1996. ISBN 0-262-53141-0.
Jelinek, F.: Statistical Methods for Speech Recognition. The MIT Press. 1998. ISBN 0-262-10066-5.
Download slides here
Topics & Readings
Introduction and Overview
Welcome, motivations, what is Natural Language Processing, hands-on demonstrations. Ambiguity and uncertainty in language. The Turing test. Course outline and logistics. Questionnaire. Handout. Slides.
JM Ch 1
Optional:
MS Ch 1, historical overview.

Regular Expressions
Chomsky hierarchy, regular languages, and their limitations. Finite-state automata. Practical regular expressions for finding and counting language phenomena. A little morphology. In-class demonstrations of exploring a large corpus with regex tools. Slides.
JM Ch 2
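To make the regex demonstrations concrete, here is a minimal Python sketch; the in-class demos used a much larger corpus, so the toy string below is only a stand-in:

    import re
    from collections import Counter

    # Toy stand-in for the large corpus explored in class.
    text = """The students were reading and writing while the
    professor was explaining regular expressions. Counting is fun."""

    # Find word tokens ending in -ing: a crude morphological probe.
    ing_words = re.findall(r"\b\w+ing\b", text.lower())
    print(Counter(ing_words).most_common())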
Programming in Python
An introduction to programming in Python. Why Python? Variables, numbers, strings, arrays, dictionaries, conditionals, iteration. The NLTK (Natural Language Toolkit), with demonstrations. Slides.
Refer to online programming resources, and Learning Python, at your own pace.
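For a taste of the basics covered here, a minimal sketch (pure Python, no NLTK required):

    # Variables, strings, lists, dictionaries, and iteration in one example:
    sentence = "the cat sat on the mat"
    tokens = sentence.split()            # a list of word strings

    counts = {}                          # dictionary mapping word -> count
    for word in tokens:
        counts[word] = counts.get(word, 0) + 1

    print(counts)  # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}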
String Edit Distance and Alignment
Key algorithmic tool: dynamic programming, first a simple example, then its use in optimal alignment of sequences. String edit operations, edit distance, and examples of use in spelling correction and machine translation. Slides.
JM Ch 3.11
Optional extras: web
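A minimal sketch of the dynamic program, with unit costs for insertion, deletion, and substitution (cost choices vary by application):

    def edit_distance(s, t):
        """Levenshtein distance by dynamic programming:
        d[i][j] = cheapest way to turn s[:i] into t[:j]."""
        m, n = len(s), len(t)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = i                          # i deletions
        for j in range(1, n + 1):
            d[0][j] = j                          # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0 if s[i - 1] == t[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # delete
                              d[i][j - 1] + 1,         # insert
                              d[i - 1][j - 1] + sub)   # substitute / copy
        return d[m][n]

    print(edit_distance("intention", "execution"))  # 5 with unit costs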
Context Free Grammars
Constituency, CFG definition, use and limitations. Chomsky Normal Form. Top-down parsing, bottom-up parsing, and the problems with each. The desirability of combining evidence from both directions. Slides.
JM Ch 13.1-13.3
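A toy grammar and a top-down parse, sketched here with NLTK (any CFG toolkit would do). A recursive-descent parser illustrates the top-down pitfalls from lecture, e.g. it cannot handle left-recursive rules:

    import nltk

    # A toy grammar in the spirit of the in-class exercise:
    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> Det N
        VP -> V NP
        Det -> 'the' | 'a'
        N  -> 'dog' | 'cat'
        V  -> 'chased' | 'saw'
    """)

    parser = nltk.RecursiveDescentParser(grammar)   # top-down parsing
    for tree in parser.parse("the dog chased a cat".split()):
        print(tree)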
Non-probabilistic Parsing
Efficient CFG parsing with CYK, another dynamic programming algorithm. Also, perhaps, the Earley parser. Designing a little grammar, and parsing with it on some test data. Slides.
JM Ch 13.4
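A minimal CYK recognizer for a grammar already in Chomsky Normal Form (the toy rules below are illustrative):

    from itertools import product

    # CNF grammar: binary rules A -> B C, plus lexical rules A -> 'w'.
    binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
    lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

    def cyk_recognize(words):
        n = len(words)
        # chart[i][j] holds the nonterminals that derive words[i:j]
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):
            chart[i][i + 1] = set(lexical.get(w, ()))
        for span in range(2, n + 1):             # width of the span
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):        # split point
                    for B, C in product(chart[i][k], chart[k][j]):
                        chart[i][j] |= binary.get((B, C), set())
        return "S" in chart[0][n]

    print(cyk_recognize("the dog chased the cat".split()))  # True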
Probability
Introduction to probability theory--the backbone of modern natural language processing. Events, and counting. Joint and conditional probability, marginals, independence, Bayes rule, combining evidence. Examples of applications in natural language. (Plus: use a little calculus!?) Slides.
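A worked example of Bayes' rule with made-up numbers, in the spam-filtering spirit of later lectures:

    # Bayes' rule:
    #   P(spam | "free") = P("free" | spam) * P(spam) / P("free")
    p_spam = 0.3                    # prior probability a message is spam
    p_free_given_spam = 0.4         # likelihoods (hypothetical)
    p_free_given_ham = 0.05

    # Marginal P("free"), summing over the two hypotheses:
    p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)
    posterior = p_free_given_spam * p_spam / p_free
    print(round(posterior, 3))      # 0.774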
Information Theory
What is information? Measuring it in bits. The "noisy channel model." The "Shannon game"--motivated by language! Entropy, cross-entropy, information gain. Its application to some language phenomena. Slides.
JM Ch 4.10-4.11
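A small sketch of computing entropy in bits:

    import math

    def entropy(p):
        """H(p) = -sum_x p(x) * log2 p(x), measured in bits."""
        return -sum(px * math.log2(px) for px in p if px > 0)

    print(entropy([0.25] * 4))                       # fair 4-way choice: 2.0 bits
    print(round(entropy([0.7, 0.1, 0.1, 0.1]), 3))   # skewed: about 1.357 bits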
Information Theory, continued
Including a helpful quiz.
Language modeling and Naive Bayes
Probabilistic language modeling and its applications. Markov models. N-grams. Estimating the probability of a word, and smoothing. Generative models of language. Their application to building an automatically-trained email spam filter, and automatically determining the language (English, French, German, Dutch, Finnish, Klingon?). Slides.
JM Ch 4.1-4.9
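A minimal bigram language model with add-one (Laplace) smoothing, on a toy corpus:

    from collections import Counter

    corpus = ["the cat sat", "the dog sat", "the dog barked"]  # toy data

    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]     # boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    V = len(unigrams)                                  # vocabulary size

    def p_laplace(prev, w):
        """Add-one (Laplace) smoothed bigram probability P(w | prev)."""
        return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

    print(p_laplace("the", "dog"))   # seen bigram: 0.3
    print(p_laplace("the", "sat"))   # unseen bigram: small but nonzero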
NO CLASS (This Tuesday follows UMass Monday schedule.) Optional midterm review session 5:00pm.  
Midterm
Part of Speech Tagging and Hidden Markov Models
The concept of parts-of-speech, examples, usage. The Penn Treebank and Brown Corpus. Probabilistic (weighted) finite state automata. Hidden Markov models (HMMs), definition and use. Slides.
JM Ch 5
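A quick way to explore the Brown Corpus tags with NLTK (assumes the corpus data has been downloaded):

    import nltk
    from collections import Counter

    nltk.download("brown", quiet=True)       # fetch the Brown Corpus once
    from nltk.corpus import brown

    tagged = brown.tagged_words(categories="news")
    print(tagged[:5])                        # (word, tag) pairs

    # An ambiguous word, and the tags it actually receives in news text:
    print(Counter(tag for word, tag in tagged if word.lower() == "race"))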
Viterbi Algorithm for Finding Most Likely HMM Path
Dynamic programming with Hidden Markov Models, and its use for part-of-speech tagging, Chinese word segmentation, prosody, information extraction, etc. (No slides, board work only.)
JM Ch 6.1-6.4
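Since the lecture was board work only, here is a minimal Viterbi sketch for a two-state toy tagger (all probabilities below are made up):

    def viterbi(obs, states, start_p, trans_p, emit_p):
        """Most likely state sequence for obs.
        V[t][s] = probability of the best path ending in state s at time t."""
        V = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({}); back.append({})
            for s in states:
                prob, prev = max((V[t - 1][r] * trans_p[r][s], r)
                                 for r in states)
                V[t][s] = prob * emit_p[s].get(obs[t], 0.0)
                back[t][s] = prev
        best = max(V[-1], key=V[-1].get)       # best final state
        path = [best]
        for t in range(len(obs) - 1, 0, -1):   # follow backpointers
            best = back[t][best]
            path.append(best)
        return list(reversed(path))

    # Two-state toy tagger; all numbers are hypothetical:
    states = ["N", "V"]
    start  = {"N": 0.6, "V": 0.4}
    trans  = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
    emit   = {"N": {"dogs": 0.5, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.6}}
    print(viterbi(["dogs", "bark"], states, start, trans, emit))  # ['N', 'V']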
Probabilistic Context Free Grammars
Weighted context free grammars. Weighted CYK. Pruning and beam search. Slides.
JM Ch 12
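A toy weighted grammar, sketched with NLTK; its ViterbiParser finds the single most probable parse by dynamic programming over spans (the weighted-CYK idea from lecture):

    import nltk

    # Toy weighted grammar; probabilities for each left-hand side sum to 1.
    pcfg = nltk.PCFG.fromstring("""
        S  -> NP VP    [1.0]
        NP -> Det N    [0.7]
        NP -> 'dogs'   [0.3]
        VP -> V NP     [0.6]
        VP -> V        [0.4]
        Det -> 'the'   [1.0]
        N  -> 'cat'    [1.0]
        V  -> 'chased' [0.5]
        V  -> 'bark'   [0.5]
    """)

    parser = nltk.ViterbiParser(pcfg)
    for tree in parser.parse("dogs bark".split()):
        print(tree)    # the tree is printed with its probability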
Parsing with PCFGs
A treebank and what it takes to create one. The probabilistic version of CYK. Also: How do humans parse? Experiments with eye-tracking. Modern parsers. Slides.
JM Ch 13
Maximum Entropy Classifiers
The maximum entropy principle, and its relation to maximum likelihood. The need in NLP to integrate many pieces of weak evidence. Maximum entropy classifiers and their application to document classification, sentence segmentation, and other language tasks. Slides.
JM Ch 6.6-6.7
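A maximum entropy classifier is equivalent to multinomial logistic regression; below is a minimal document-classification sketch using scikit-learn (the library choice is an assumption, not something the course prescribed):

    # Maximum entropy (= multinomial logistic regression) text classifier.
    # scikit-learn is an assumed tool here, not one from the course materials.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    docs   = ["cheap pills buy now", "meeting at noon today",
              "buy cheap watches", "lunch meeting tomorrow"]
    labels = ["spam", "ham", "spam", "ham"]

    vec = CountVectorizer().fit(docs)        # bag-of-words features
    clf = LogisticRegression().fit(vec.transform(docs), labels)
    print(clf.predict(vec.transform(["buy pills now"])))  # expect ['spam']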
Maximum Entropy Markov Models & Conditional Random Fields
Part-of-speech tagging, noun-phrase segmentation and information extraction models that combine maximum entropy and finite-state machines. State-of-the-art models for NLP. Guest lecture by Greg Druck. Slides.
JM Ch 6.8
Lexical Semantics
Guest lecture by Chris Potts, Professor, UMass Linguistics. Slides.
JM Ch 24, Section 1
Dirichlet Multinomial Distributions
Mathematics of the multinomial and Dirichlet distributions, and the Dirichlet as a smoothing prior for multinomials. Guest lecture by David Mimno. (No slides.)


JM Ch 20
(labeled as 19)
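The key smoothing identity, with a toy calculation (the counts and vocabulary size below are made up):

    # With a symmetric Dirichlet(alpha) prior on a multinomial over a
    # vocabulary of size V, the posterior predictive probability of word w is
    #     P(w) = (count(w) + alpha) / (N + alpha * V)
    # (a generalization of add-one smoothing). Toy numbers:
    counts = {"the": 5, "cat": 2, "sat": 1}
    N = sum(counts.values())        # total tokens observed
    V = 1000                        # assumed vocabulary size
    alpha = 0.1

    def p(word):
        return (counts.get(word, 0) + alpha) / (N + alpha * V)

    print(p("the"))                 # a frequent word
    print(p("dog"))                 # unseen, but still nonzero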
Project Proposals
Student groups give short presentations on their project idea. Feedback from the rest of class.
Machine Translation
Probabilistic models for translating French into English. Alignment, translation, language generation. IBM Model #1. Slides.
JM Ch 24
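A minimal EM trainer for IBM Model 1 word-translation probabilities, on the classic toy corpus (the corpus and iteration count are illustrative):

    from collections import defaultdict

    corpus = [("la maison".split(), "the house".split()),
              ("la fleur".split(), "the flower".split()),
              ("la maison bleue".split(), "the blue house".split())]

    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}

    # t[f][e] = P(f | e), initialized uniformly.
    t = {f: {e: 1.0 / len(f_vocab) for e in e_vocab} for f in f_vocab}

    for _ in range(20):                        # EM iterations
        count = defaultdict(float)             # expected counts c(f, e)
        total = defaultdict(float)             # expected counts c(e)
        for fs, es in corpus:                  # E-step: fractional counts
            for f in fs:
                z = sum(t[f][e] for e in es)   # normalize over alignments
                for e in es:
                    count[(f, e)] += t[f][e] / z
                    total[e] += t[f][e] / z
        for (f, e), c in count.items():        # M-step: re-estimate t
            t[f][e] = c / total[e]

    print(round(t["maison"]["house"], 2))      # converges toward 1.0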
Machine Translation 2
IBM Model #2 and Expectation Maximization. MT evaluation. (Continuation of previous slides.)
JM Ch 24
NO CLASS (Thanksgiving Holiday)  
Unsupervised Language Discovery
Automatically discovering verb subcategorization. Slides.
Topic Models and Language in Social Networks
Topic models. Language modeling integrated into social network analysis. (Continuation of previous slides.)
Pragmatics
Guest lecture by Chris Potts, Professor, UMass Linguistics. Slides.
JM Ch 21.3
Selected readings.
Information Extraction & Reference Resolution
Building a database of person and company relations from 10 years of the New York Times. Building a database of job openings from 70k company Web pages. Various methods, including HMMs. Models of anaphora resolution. Machine learning methods for coreference. Slides 1. Slides 2.
JM Ch 22
Selected readings.
