Description:
Natural Language Processing addresses fundamental questions at the intersection of human languages and computer science. How can computers acquire, comprehend, and produce English? How can computational methods give us insight into observed human language phenomena? How can you get a job at Google? In this interdisciplinary introductory course, you will learn how computers can do useful things with human languages, such as translate from French into English, filter junk email, extract social networks from the web, and find the main topics in the day's news. You will also learn how computational methods can help linguists explain language phenomena, including the automatic discovery of different word senses.
Textbook & Resources
Required book (reading the online version is fine):
Jurafsky and Martin. Speech and Language Processing. Prentice Hall, 2000. ISBN 0130950696. The second edition is available online.
Recommended book:
Lutz and Ascher. Learning Python. O'Reilly. ISBN 0596002815.
Additional sources of interest:
Manning and Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. ISBN 0262133601.
James Allen. Natural Language Understanding. Benjamin/Cummings, 1994. ISBN 0805303340.
Tom Mitchell. Machine Learning. McGraw Hill, 1997. ISBN 0070428077.
Cover, T. M. and J. A. Thomas. Elements of Information Theory. Wiley, 1991. ISBN 0471062596.
Charniak, E. Statistical Language Learning. The MIT Press, 1996. ISBN 0262531410.
Jelinek, F. Statistical Methods for Speech Recognition. The MIT Press, 1998. ISBN 0262100665.
Download slides here
Topics & Readings
Introduction and Overview: Welcome, motivations, what is Natural Language Processing, hands-on demonstrations. Ambiguity and uncertainty in language. The Turing test. Course outline and logistics. Questionnaire. Handout. Slides.
Readings: JM Ch 1. Optional: MS Ch 1, a historical overview.
Regular Expressions: Chomsky hierarchy, regular languages, and their limitations. Finite-state automata. Practical regular expressions for finding and counting language phenomena. A little morphology. In-class demonstrations of exploring a large corpus with regex tools. Slides.
Readings: JM Ch 2.
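As a small taste of the in-class regex demonstrations, here is a minimal Python sketch; the toy corpus string is invented for illustration:

```python
import re

# A toy "corpus"; in practice you would read in a large text file.
corpus = "The cat sat. The cats sat on the mat. A cat naps."

# Count occurrences of 'cat' with an optional plural suffix, using
# word boundaries so that words like 'category' do not match.
matches = re.findall(r"\bcats?\b", corpus, flags=re.IGNORECASE)
print(len(matches), matches)  # 3 ['cat', 'cats', 'cat']
```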
Programming in Python: An introduction to programming in Python. Why Python? Variables, numbers, strings, arrays, dictionaries, conditionals, iteration. The NLTK (Natural Language Toolkit), with demonstrations. Slides.
Readings: Refer to online programming resources and Learning Python, at your own pace.
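In the same spirit, a minimal sketch of the kind of thing NLTK makes easy: tokenize text and count word frequencies. It assumes NLTK is installed (`pip install nltk`); note that recent NLTK releases may name the tokenizer data "punkt_tab" rather than "punkt".

```python
import nltk
from collections import Counter

nltk.download("punkt", quiet=True)  # one-time tokenizer data download

text = "Colorless green ideas sleep furiously. Ideas sleep."
tokens = nltk.word_tokenize(text.lower())
print(Counter(tokens).most_common(3))  # [('ideas', 2), ('sleep', 2), ('.', 2)]
```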
String Edit Distance and Alignment: Key algorithmic tool: dynamic programming; first a simple example, then its use in optimal alignment of sequences. String edit operations, edit distance, and examples of use in spelling correction and machine translation. Slides.
Readings: JM Ch 3.11. Optional extras on the web.
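To make the dynamic-programming idea concrete, a sketch of Levenshtein edit distance with unit costs for insertion, deletion, and substitution (some textbook variants charge 2 for substitution, which changes the totals):

```python
def edit_distance(s, t):
    """Minimum number of edits turning s into t, via dynamic programming.
    d[i][j] holds the cost of converting s[:i] into t[:j]."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # i deletions
    for j in range(n + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete s[i-1]
                          d[i][j - 1] + 1,         # insert t[j-1]
                          d[i - 1][j - 1] + sub)   # substitute or match
    return d[m][n]

print(edit_distance("intention", "execution"))     # 5
```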
Context Free Grammars: Constituency, CFG definition, use and limitations. Chomsky Normal Form. Top-down parsing, bottom-up parsing, and the problems with each. The desirability of combining evidence from both directions. Slides.
Readings: JM Ch 13.1-13.3.
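For a feel of what a CFG looks like in code, a toy grammar in NLTK notation (grammar and sentence invented for illustration; assumes `pip install nltk`):

```python
import nltk

grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> Det N
  VP  -> V NP
  Det -> 'the' | 'a'
  N   -> 'dog' | 'cat'
  V   -> 'saw'
""")

# A chart parser stores sub-analyses so no work is repeated.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog saw a cat".split()):
    print(tree)  # (S (NP (Det the) (N dog)) (VP (V saw) (NP (Det a) (N cat))))
```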
Non-probabilistic Parsing: Efficient CFG parsing with CYK, another dynamic programming algorithm. Also, perhaps, the Earley parser. Designing a little grammar, and parsing with it on some test data. Slides.
Readings: JM Ch 13.4.
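A compact CYK recognizer over a toy grammar in Chomsky Normal Form (grammar, lexicon, and sentence are invented for illustration); `table[i][j]` holds every nonterminal that can span words i through j:

```python
from itertools import product

# Binary rules (B, C) -> {A : A -> B C} and lexical rules word -> {A}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cyk(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # width-1 spans
        table[i][i + 1] = set(lexical.get(w, ()))
    for width in range(2, n + 1):                  # wider spans, bottom up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # every split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n]

print(cyk("the dog saw the cat".split()))          # True
```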
Probability: Introduction to probability theory, the backbone of modern natural language processing. Events and counting. Joint and conditional probability, marginals, independence, Bayes rule, combining evidence. Examples of applications in natural language. (Plus: use a little calculus!?) Slides.
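To make Bayes rule concrete, a worked example with invented numbers from a spam-filtering setting: how probable is spam given that a message contains the word "free"?

```python
# P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.2                  # prior: 20% of mail is spam (invented)
p_word_given_spam = 0.6       # "free" appears in 60% of spam
p_word_given_ham = 0.05       # ...but in only 5% of legitimate mail

# Marginalize to get P(word), then apply Bayes rule.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```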
Information Theory: What is information? Measuring it in bits. The "noisy channel model." The "Shannon game," motivated by language! Entropy, cross-entropy, information gain. Its application to some language phenomena. Slides.
Readings: JM Ch 4.10-4.11.
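A small sketch of entropy and cross-entropy in bits, on invented distributions; cross-entropy is always at least the entropy, with equality only when the two distributions match:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), measured in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x)."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

print(entropy([0.5, 0.5]))                    # 1.0  (a fair coin: one bit)
print(entropy([0.9, 0.1]))                    # ~0.469 (a biased coin: less)
print(cross_entropy([0.5, 0.5], [0.9, 0.1]))  # ~1.737 (model mismatch costs bits)
```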
Information Theory, continued: Including a helpful quiz.
Language Modeling and Naive Bayes: Probabilistic language modeling and its applications. Markov models. N-grams. Estimating the probability of a word, and smoothing. Generative models of language. Their application to building an automatically trained email spam filter, and automatically determining the language (English, French, German, Dutch, Finnish, Klingon?). Slides.
Readings: JM Ch 4.1-4.9.
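A sketch of an add-one (Laplace) smoothed bigram model on an invented toy corpus; real language models are trained on vastly more text, but the estimation and smoothing steps look the same:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                     # vocabulary size

def p_bigram(prev, w):
    """Add-one smoothed estimate of P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

print(p_bigram("the", "cat"))  # seen bigram:   (2+1)/(3+6) = 0.333
print(p_bigram("the", "ate"))  # unseen bigram: (0+1)/(3+6) = 0.111, not zero
```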
NO CLASS (This Tuesday follows a UMass Monday schedule.) Optional midterm review session at 5:00pm.
Midterm.
Part of Speech Tagging and Hidden Markov Models: The concept of parts of speech, examples, usage. The Penn Treebank and Brown Corpus. Probabilistic (weighted) finite-state automata. Hidden Markov models (HMMs), definition and use. Slides.
Readings: JM Ch 5.
Viterbi Algorithm for Finding the Most Likely HMM Path: Dynamic programming with Hidden Markov Models, and its use for part-of-speech tagging, Chinese word segmentation, prosody, information extraction, etc. (No slides; board work only.)
Readings: JM Ch 6.1-6.4.
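Since this lecture is board work, here is a compact Viterbi sketch for reference; the two-state determiner/noun HMM and its probabilities are invented for the example:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence under an HMM, by dynamic
    programming over (best probability, backpointer) cells."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            V[t][s] = max(
                (V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
    best = max(states, key=lambda s: V[-1][s][0])  # best final state
    path = [best]
    for t in range(len(obs) - 1, 0, -1):           # follow backpointers
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ["Det", "Noun"]
start = {"Det": 0.8, "Noun": 0.2}
trans = {"Det": {"Det": 0.1, "Noun": 0.9}, "Noun": {"Det": 0.4, "Noun": 0.6}}
emit = {"Det": {"the": 0.9, "dog": 0.1}, "Noun": {"the": 0.1, "dog": 0.9}}
print(viterbi(["the", "dog"], states, start, trans, emit))  # ['Det', 'Noun']
```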
Probabilistic Context Free Grammars: Weighted context-free grammars. Weighted CYK. Pruning and beam search. Slides.
Readings: JM Ch 12.
Parsing with PCFGs: A treebank and what it takes to create one. The probabilistic version of CYK. Also: how do humans parse? Experiments with eye-tracking. Modern parsers. Slides.
Readings: JM Ch 13.
Maximum Entropy Classifiers: The maximum entropy principle, and its relation to maximum likelihood. The need in NLP to integrate many pieces of weak evidence. Maximum entropy classifiers and their application to document classification, sentence segmentation, and other language tasks. Slides.
Readings: JM Ch 6.6-6.7.
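Multinomial logistic regression is the standard parametric form of a maximum entropy classifier. A sketch with scikit-learn and invented training sentences (an assumption for illustration; the course's own assignments may have used different tools):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["win money now", "cheap pills online",
        "meeting at noon", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features feeding a maxent (logistic regression) classifier.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(docs, labels)
print(clf.predict(["free money today"]))  # likely ['spam']
```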
Maximum Entropy Markov Models & Conditional Random Fields: Part-of-speech tagging, noun-phrase segmentation, and information extraction models that combine maximum entropy and finite-state machines. State-of-the-art models for NLP. Guest lecture by Greg Druck. Slides.
Readings: JM Ch 6.8.
Lexical Semantics: Guest lecture by Chris Potts, Professor, UMass Linguistics. Slides.
Readings: JM Ch 24, Section 1.
Dirichlet Multinomial Distributions: Mathematics of multinomial and Dirichlet distributions, the Dirichlet as a smoothing prior for multinomials. Guest lecture by David Mimno. (No slides.)
Readings: JM Ch 20 (labeled as 19).
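The practical punchline, sketched in Python with invented counts: a symmetric Dirichlet(alpha) prior over a multinomial behaves like alpha pseudo-counts added to every word, so unseen words keep nonzero probability:

```python
from collections import Counter

counts = Counter({"the": 5, "cat": 2, "sat": 1})
vocab = ["the", "cat", "sat", "mat"]      # "mat" was never observed
alpha = 0.5                                # Dirichlet concentration

total = sum(counts.values()) + alpha * len(vocab)
for w in vocab:
    print(w, (counts[w] + alpha) / total)
# "mat" gets 0.5/10 = 0.05 rather than zero; estimates still sum to 1.
```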
Project Proposals: Student groups give short presentations on their project ideas. Feedback from the rest of the class.
Machine Translation: Probabilistic models for translating French into English. Alignment, translation, language generation. IBM Model 1. Slides.
Readings: JM Ch 24.
Machine Translation 2: IBM Model 2, and Expectation Maximization. MT evaluation. (Continuation of previous slides.)
Readings: JM Ch 24.
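A compact sketch of EM training for IBM Model 1 word-translation probabilities t(e | f), on an invented two-sentence corpus (the NULL word is omitted for brevity):

```python
from collections import defaultdict

pairs = [("la maison".split(), "the house".split()),
         ("la fleur".split(), "the flower".split())]

f_vocab = {f for fs, _ in pairs for f in fs}
e_vocab = {e for _, es in pairs for e in es}
t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

for _ in range(10):                        # EM iterations
    count = defaultdict(float)             # expected co-occurrence counts
    total = defaultdict(float)             # per-f normalizers
    for fs, es in pairs:
        for e in es:                       # E-step: alignment posteriors
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    for (e, f) in t:                       # M-step: renormalize
        t[(e, f)] = count[(e, f)] / total[f]

print(round(t[("house", "maison")], 2))    # rises toward 1.0 with training
```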
NO CLASS (Thanksgiving Holiday).
Unsupervised Language Discovery: Automatically discovering verb subcategorization. Slides.
Topic Models and Language in Social Networks: Topic models. Language modeling integrated into social network analysis. (Continuation of previous slides.)
Pragmatics: Guest lecture by Chris Potts, Professor, UMass Linguistics. Slides.
Readings: JM Ch 21.3. Selected readings.
Information Extraction & Reference Resolution: Building a database of person and company relations from 10 years of the New York Times. Building a database of job openings from 70k company Web pages. Various methods, including HMMs. Models of anaphora resolution. Machine learning methods for coreference. Slides1. Slides2.
Readings: JM Ch 22. Selected readings.