Description:
Natural Language Processing addresses fundamental questions at the intersection of human languages and computer science. How can computers acquire, comprehend, and produce English? How can computational methods give us insight into observed human language phenomena? How can you get a job at Google? In this interdisciplinary introductory course, you will learn how computers can do useful things with human languages, such as translate from French into English, filter junk email, extract social networks from the web, and find the main topics in the day's news. You will also learn how computational methods can help linguists explain language phenomena, including the automatic discovery of different word senses.
Textbook & Resources
Required book (reading the online version is fine):
Jurafsky and Martin. Speech and Language Processing. Prentice Hall, 2000. ISBN 0130950696. The second edition is available online.
Recommended book:
Lutz and Ascher. Learning Python. O'Reilly. ISBN 0596002815.
Additional sources of interest:
Manning and Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. ISBN 0262133601.
James Allen. Natural Language Understanding. Benjamin/Cummings, 1994. ISBN 0805303340.
Tom Mitchell. Machine Learning. McGraw Hill, 1997. ISBN 0070428077.
Cover, T. M. and J. A. Thomas. Elements of Information Theory. Wiley, 1991. ISBN 0471062596.
Charniak, E. Statistical Language Learning. The MIT Press, 1996. ISBN 0262531410.
Jelinek, F. Statistical Methods for Speech Recognition. The MIT Press, 1998. ISBN 0262100665.
Download slides here
Topics & Readings
Introduction and Overview: Welcome, motivations, what is Natural Language Processing, hands-on demonstrations. Ambiguity and uncertainty in language. The Turing test. Course outline and logistics. Questionnaire. Handout. Slides.
Readings: JM Ch 1. Optional: MS Ch 1, a historical overview.
Regular Expressions: Chomsky hierarchy, regular languages, and their limitations. Finite-state automata. Practical regular expressions for finding and counting language phenomena. A little morphology. In-class demonstrations of exploring a large corpus with regex tools. Slides.
Readings: JM Ch 2.
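As a small taste of the in-class regex demonstrations, here is a minimal Python sketch; the toy corpus string is invented for illustration:

```python
import re

# A toy "corpus"; in practice you would read in a large text file.
corpus = "The cat sat. The cats sat on the mat. A cat naps."

# Count occurrences of 'cat' with an optional plural suffix, using
# word boundaries so that words like 'category' do not match.
matches = re.findall(r"\bcats?\b", corpus, flags=re.IGNORECASE)
print(len(matches), matches)  # 3 ['cat', 'cats', 'cat']
```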
Programming in Python: An introduction to programming in Python. Why Python? Variables, numbers, strings, arrays, dictionaries, conditionals, iteration. The NLTK (Natural Language Toolkit), with demonstrations. Slides.
Readings: Refer to online programming resources and Learning Python, at your own pace.
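In the same spirit, a minimal sketch of the kind of thing NLTK makes easy: tokenize text and count word frequencies. It assumes NLTK is installed (`pip install nltk`); note that recent NLTK releases may name the tokenizer data "punkt_tab" rather than "punkt".

```python
import nltk
from collections import Counter

nltk.download("punkt", quiet=True)  # one-time tokenizer data download

text = "Colorless green ideas sleep furiously. Ideas sleep."
tokens = nltk.word_tokenize(text.lower())
print(Counter(tokens).most_common(3))  # [('ideas', 2), ('sleep', 2), ('.', 2)]
```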
String Edit Distance and Alignment: Key algorithmic tool: dynamic programming; first a simple example, then its use in optimal alignment of sequences. String edit operations, edit distance, and examples of use in spelling correction and machine translation. Slides.
Readings: JM Ch 3.11. Optional extras on the web.
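To make the dynamic-programming idea concrete, a sketch of Levenshtein edit distance with unit costs for insertion, deletion, and substitution (some textbook variants charge 2 for substitution, which changes the totals):

```python
def edit_distance(s, t):
    """Minimum number of edits turning s into t, via dynamic programming.
    d[i][j] holds the cost of converting s[:i] into t[:j]."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # i deletions
    for j in range(n + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete s[i-1]
                          d[i][j - 1] + 1,         # insert t[j-1]
                          d[i - 1][j - 1] + sub)   # substitute or match
    return d[m][n]

print(edit_distance("intention", "execution"))     # 5
```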
Context Free Grammars: Constituency, CFG definition, use and limitations. Chomsky Normal Form. Top-down parsing, bottom-up parsing, and the problems with each. The desirability of combining evidence from both directions. Slides.
Readings: JM Ch 13.1-13.3.
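For a feel of what a CFG looks like in code, a toy grammar in NLTK notation (grammar and sentence invented for illustration; assumes `pip install nltk`):

```python
import nltk

grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> Det N
  VP  -> V NP
  Det -> 'the' | 'a'
  N   -> 'dog' | 'cat'
  V   -> 'saw'
""")

# A chart parser stores sub-analyses so no work is repeated.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog saw a cat".split()):
    print(tree)  # (S (NP (Det the) (N dog)) (VP (V saw) (NP (Det a) (N cat))))
```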
Non-probabilistic Parsing: Efficient CFG parsing with CYK, another dynamic programming algorithm. Also, perhaps, the Earley parser. Designing a little grammar, and parsing with it on some test data. Slides.
Readings: JM Ch 13.4.
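A compact CYK recognizer over a toy grammar in Chomsky Normal Form (grammar, lexicon, and sentence are invented for illustration); `table[i][j]` holds every nonterminal that can span words i through j:

```python
from itertools import product

# Binary rules (B, C) -> {A : A -> B C} and lexical rules word -> {A}
binary = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
lexical = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cyk(words):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # width-1 spans
        table[i][i + 1] = set(lexical.get(w, ()))
    for width in range(2, n + 1):                  # wider spans, bottom up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):              # every split point
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n]

print(cyk("the dog saw the cat".split()))          # True
```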
Probability: Introduction to probability theory, the backbone of modern natural language processing. Events and counting. Joint and conditional probability, marginals, independence, Bayes rule, combining evidence. Examples of applications in natural language. (Plus: use a little calculus!?) Slides.
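To make Bayes rule concrete, a worked example with invented numbers from a spam-filtering setting: how probable is spam given that a message contains the word "free"?

```python
# P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam = 0.2                  # prior: 20% of mail is spam (invented)
p_word_given_spam = 0.6       # "free" appears in 60% of spam
p_word_given_ham = 0.05       # ...but in only 5% of legitimate mail

# Marginalize to get P(word), then apply Bayes rule.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```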
Information Theory: What is information? Measuring it in bits. The "noisy channel model." The "Shannon game," motivated by language! Entropy, cross-entropy, information gain. Its application to some language phenomena. Slides.
Readings: JM Ch 4.10-4.11.
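A small sketch of entropy and cross-entropy in bits, on invented distributions; cross-entropy is always at least the entropy, with equality only when the two distributions match:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), measured in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x)."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

print(entropy([0.5, 0.5]))                    # 1.0  (a fair coin: one bit)
print(entropy([0.9, 0.1]))                    # ~0.469 (a biased coin: less)
print(cross_entropy([0.5, 0.5], [0.9, 0.1]))  # ~1.737 (model mismatch costs bits)
```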
Information Theory, continued: Including a helpful quiz.
Language Modeling and Naive Bayes: Probabilistic language modeling and its applications. Markov models. N-grams. Estimating the probability of a word, and smoothing. Generative models of language. Their application to building an automatically trained email spam filter, and automatically determining the language (English, French, German, Dutch, Finnish, Klingon?). Slides.
Readings: JM Ch 4.1-4.9.
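A sketch of an add-one (Laplace) smoothed bigram model on an invented toy corpus; real language models are trained on vastly more text, but the estimation and smoothing steps look the same:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                     # vocabulary size

def p_bigram(prev, w):
    """Add-one smoothed estimate of P(w | prev)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

print(p_bigram("the", "cat"))  # seen bigram:   (2+1)/(3+6) = 0.333
print(p_bigram("the", "ate"))  # unseen bigram: (0+1)/(3+6) = 0.111, not zero
```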
NO CLASS (This Tuesday follows a UMass Monday schedule.) Optional midterm review session at 5:00pm.
Midterm.
Part of Speech Tagging and Hidden Markov Models: The concept of parts of speech, examples, usage. The Penn Treebank and Brown Corpus. Probabilistic (weighted) finite-state automata. Hidden Markov models (HMMs), definition and use. Slides.
Readings: JM Ch 5.
Viterbi Algorithm for Finding the Most Likely HMM Path: Dynamic programming with Hidden Markov Models, and its use for part-of-speech tagging, Chinese word segmentation, prosody, information extraction, etc. (No slides; board work only.)
Readings: JM Ch 6.1-6.4.
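Since this lecture is board work, here is a compact Viterbi sketch for reference; the two-state determiner/noun HMM and its probabilities are invented for the example:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence under an HMM, by dynamic
    programming over (best probability, backpointer) cells."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            V[t][s] = max(
                (V[t - 1][r][0] * trans_p[r][s] * emit_p[s][obs[t]], r)
                for r in states)
    best = max(states, key=lambda s: V[-1][s][0])  # best final state
    path = [best]
    for t in range(len(obs) - 1, 0, -1):           # follow backpointers
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ["Det", "Noun"]
start = {"Det": 0.8, "Noun": 0.2}
trans = {"Det": {"Det": 0.1, "Noun": 0.9}, "Noun": {"Det": 0.4, "Noun": 0.6}}
emit = {"Det": {"the": 0.9, "dog": 0.1}, "Noun": {"the": 0.1, "dog": 0.9}}
print(viterbi(["the", "dog"], states, start, trans, emit))  # ['Det', 'Noun']
```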
Probabilistic Context Free Grammars: Weighted context-free grammars. Weighted CYK. Pruning and beam search. Slides.
Readings: JM Ch 12.
Parsing with PCFGs: A treebank and what it takes to create one. The probabilistic version of CYK. Also: how do humans parse? Experiments with eye-tracking. Modern parsers. Slides.
Readings: JM Ch 13.
Maximum Entropy Classifiers: The maximum entropy principle, and its relation to maximum likelihood. The need in NLP to integrate many pieces of weak evidence. Maximum entropy classifiers and their application to document classification, sentence segmentation, and other language tasks. Slides.
Readings: JM Ch 6.6-6.7.
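Multinomial logistic regression is the standard parametric form of a maximum entropy classifier. A sketch with scikit-learn and invented training sentences (an assumption for illustration; the course's own assignments may have used different tools):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["win money now", "cheap pills online",
        "meeting at noon", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features feeding a maxent (logistic regression) classifier.
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(docs, labels)
print(clf.predict(["free money today"]))  # likely ['spam']
```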
Maximum Entropy Markov Models & Conditional Random Fields: Part-of-speech tagging, noun-phrase segmentation, and information extraction models that combine maximum entropy and finite-state machines. State-of-the-art models for NLP. Guest lecture by Greg Druck. Slides.
Readings: JM Ch 6.8.
Lexical Semantics: Guest lecture by Chris Potts, Professor, UMass Linguistics. Slides.
Readings: JM Ch 24, Section 1.
Dirichlet Multinomial Distributions: Mathematics of multinomial and Dirichlet distributions, the Dirichlet as a smoothing prior for multinomials. Guest lecture by David Mimno. (No slides.)
Readings: JM Ch 20 (labeled as 19).
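The practical punchline, sketched in Python with invented counts: a symmetric Dirichlet(alpha) prior over a multinomial behaves like alpha pseudo-counts added to every word, so unseen words keep nonzero probability:

```python
from collections import Counter

counts = Counter({"the": 5, "cat": 2, "sat": 1})
vocab = ["the", "cat", "sat", "mat"]      # "mat" was never observed
alpha = 0.5                                # Dirichlet concentration

total = sum(counts.values()) + alpha * len(vocab)
for w in vocab:
    print(w, (counts[w] + alpha) / total)
# "mat" gets 0.5/10 = 0.05 rather than zero; estimates still sum to 1.
```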
Project Proposals: Student groups give short presentations on their project ideas. Feedback from the rest of the class.
Machine Translation: Probabilistic models for translating French into English. Alignment, translation, language generation. IBM Model 1. Slides.
Readings: JM Ch 24.
Machine Translation 2: IBM Model 2, and Expectation Maximization. MT evaluation. (Continuation of previous slides.)
Readings: JM Ch 24.
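A compact sketch of EM training for IBM Model 1 word-translation probabilities t(e | f), on an invented two-sentence corpus (the NULL word is omitted for brevity):

```python
from collections import defaultdict

pairs = [("la maison".split(), "the house".split()),
         ("la fleur".split(), "the flower".split())]

f_vocab = {f for fs, _ in pairs for f in fs}
e_vocab = {e for _, es in pairs for e in es}
t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

for _ in range(10):                        # EM iterations
    count = defaultdict(float)             # expected co-occurrence counts
    total = defaultdict(float)             # per-f normalizers
    for fs, es in pairs:
        for e in es:                       # E-step: alignment posteriors
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    for (e, f) in t:                       # M-step: renormalize
        t[(e, f)] = count[(e, f)] / total[f]

print(round(t[("house", "maison")], 2))    # rises toward 1.0 with training
```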
NO CLASS (Thanksgiving Holiday).
Unsupervised Language Discovery: Automatically discovering verb subcategorization. Slides.
Topic Models and Language in Social Networks: Topic models. Language modeling integrated into social network analysis. (Continuation of previous slides.)
Pragmatics: Guest lecture by Chris Potts, Professor, UMass Linguistics. Slides.
Readings: JM Ch 21.3. Selected readings.
Information Extraction & Reference Resolution: Building a database of person and company relations from 10 years of the New York Times. Building a database of job openings from 70k company Web pages. Various methods, including HMMs. Models of anaphora resolution. Machine learning methods for coreference. Slides1. Slides2.
Readings: JM Ch 22. Selected readings.