Computational linguistics


Revision as of 18:38, 6 January 2011 by Abney (Talk | contribs)

This page concerns computational linguistics in general. For study and research in computational linguistics in the UM linguistics department, see CompLing Lab.

Computational linguistics encompasses just about anything that combines natural language with computation. By natural language we mean a naturally occurring human language. Natural languages contrast with formal languages such as computer languages and logical calculi, with invented languages such as Esperanto and Sindarin, and with animal languages.

The term natural language processing (NLP) comes from the artificial intelligence community, and is often treated as synonymous with computational linguistics, though it can be taken more narrowly as referring to the subfield of artificial intelligence that concerns language processing.


Conceptions of computational linguistics

One can identify at least three different broad conceptions of computational linguistics, with three different ideas of the overall aims of the discipline.

Computational models of human language processing

Perhaps the most salient "ultimate goal" of computational linguistics is the construction of a system that either evinces or models human language capabilities. The choice between "evinces" and "models" depends on whether one's interests are in how human language can be processed in principle, or more concretely in how humans process human language. To be a little tongue-in-cheek, the question is whether we are interested in [human language] processing, or in human [language processing].

Human language technology (HLT)

Small-scale AI systems of the late 1960s and early 1970s generated considerable excitement, and an expectation that "scaling them up" would lead quickly to a genuine AI. But scaling them up proved to be very difficult. This led to a focus on shorter-term goals, and the identification of useful language technologies that could be constructed in the near term. Considerable progress has been made on such technologies.

The area of human language technology has received so much attention that it is often taken as synonymous with computational linguistics.

Digital linguistics

On the face of it, computational linguistics ought to refer to a branch of linguistics, analogous to "computational biology" or "computational astronomy." This is perhaps the least well-developed conception of computational linguistics, hence (on the bright side) the area where there is the most low-hanging fruit. Computational linguistics in this conception is the application of computational methods to the scientific study of language. Computational psycholinguistics, that is, the computational modeling of human language processing discussed above, can be viewed as a special case, but there is no presumption in general that the computation being done has any relation to human language processing. Any computation in service of doing linguistics falls here; for example, the automated processing of large collections of language data.
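
As a small illustration of what "automated processing of large collections of language data" can look like, here is a minimal sketch in Python that counts word frequencies across a set of texts. The function name word_frequencies, the two-sentence corpus, and the crude regular-expression tokenizer are all assumptions made for the example; real corpus work would need a proper tokenizer and a real corpus.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count word tokens across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes.
        # This is a deliberately crude tokenizer, for illustration only.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tokens)
    return counts


# Hypothetical miniature corpus standing in for a large text collection.
corpus = [
    "The dog chased the cat.",
    "The cat slept.",
]

freqs = word_frequencies(corpus)
for word, n in freqs.most_common(2):
    print(word, n)
```

The same function applies unchanged to any iterable of strings, so it could be fed lines read from files rather than an in-memory list.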

Core Computational Linguistics

Foundations

References

There are two widely used textbooks in computational linguistics:

  • Dan Jurafsky and James Martin. Speech and Language Processing, 2nd edition. Upper Saddle River, NJ: Pearson/Prentice Hall. 2009. http://www.cs.colorado.edu/~martin/slp.html
  • Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. 1999.

Other starting points:

For getting started in computer science and programming:

  • John Zelle. Python Programming: An Introduction to Computer Science. Franklin, Beedle & Associates. 2003.