Computational linguistics
From lingwiki
Revision as of 14:18, 14 May 2010
This page concerns computational linguistics in general. For computational linguistics at the University of Michigan, see CompLing Lab.
Computational linguistics encompasses just about anything that combines natural language with computation. By natural language we mean a naturally occurring human language. Natural languages contrast with formal languages such as computer languages and logical calculi, with invented languages such as Esperanto and Sindarin, and with animal languages.
The term natural language processing (NLP) comes from the artificial intelligence community, and is often treated as synonymous with computational linguistics, though it can be taken more narrowly as referring to the subfield of artificial intelligence that concerns language processing.
Conceptions of computational linguistics
One can identify at least three different broad conceptions of computational linguistics, with three different ideas of the overall aims of the discipline.
Computational models of human language processing
Perhaps the most salient "ultimate goal" of computational linguistics is the construction of a system that either evinces or models human language capabilities. The choice between "evinces" and "models" depends on whether one's interests are in how human language can be processed in principle, or more concretely in how humans process human language. To be a little tongue-in-cheek, the question is whether we are interested in [human language] processing, or in human [language processing].
- Natural language processing seeks to build a "machine that talks" (a working artifact that demonstrates human-level language processing abilities) as a subgoal of constructing an artificial intelligence.
- Computational psycholinguistics seeks to define and test formal computational models of the language component of the human mind, or, more abstractly, of the language capabilities of humans.
Human language technology (HLT)
Small-scale AI systems of the late 1960s and early 1970s generated considerable excitement, along with an expectation that "scaling them up" would lead quickly to a genuine AI. Scaling them up, however, proved to be very difficult. This led to a focus on shorter-term goals and on identifying useful language technologies that could be constructed in the near term. Considerable progress has been made on such technologies, including:
- information extraction
- machine translation
- speech recognition
- speech synthesis
- spoken language systems
- computer-supported language education
The area of human language technology has received so much attention that it is often taken as synonymous with computational linguistics.
Computational methods for linguistics
On the face of it, computational linguistics ought to refer to a branch of linguistics, analogous to "computational biology" or "computational astronomy." This is perhaps the least developed conception of computational linguistics, and hence (on the bright side) the area where there is the most low-hanging fruit. Computational linguistics in this conception is the application of computational methods to the scientific study of language. Computational psycholinguistics, mentioned above, can be viewed as a special case, but there is no presumption in general that the computation being done has any relation to human language processing. Any computation in service of doing linguistics falls here; one example is the automated processing of large collections of language data. Relevant areas include:
- computational historical linguistics
- computational sociolinguistics
- corpus linguistics
- field linguistics
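As a small illustration of the "automated processing of large collections of language data" mentioned above, the following is a minimal Python sketch of counting word frequencies across a collection of texts, a basic operation in corpus linguistics. The function name word_frequencies, the crude regular-expression tokenizer, and the two-sentence toy corpus are assumptions made for this example only; they are not drawn from any particular toolkit, and real corpus work would need more careful tokenization.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word tokens across an iterable of strings.

    Hypothetical helper for illustration; the tokenizer is deliberately crude.
    """
    counts = Counter()
    for text in texts:
        # Treat runs of ASCII letters, optionally joined by internal
        # apostrophes (as in "didn't"), as word tokens.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
        counts.update(tokens)
    return counts


# Toy corpus, invented for this example.
corpus = [
    "The dog chased the cat.",
    "The cat didn't chase the dog.",
]

freqs = word_frequencies(corpus)
for word, count in freqs.most_common(3):
    print(word, count)
```

The same function can be applied to lines read from files, so the approach extends to larger collections, subject to the limits of the simple tokenizer.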
Core Computational Linguistics
Foundations
References
There are two widely used textbooks in computational linguistics:
- Dan Jurafsky and James Martin. Speech and Language Processing, 2nd edition. Upper Saddle River, NJ: Pearson/Prentice Hall. 2009. http://www.cs.colorado.edu/~martin/slp.html
- Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. 1999.
Other starting points:
For getting started in computer science and programming:
- John Zelle. Python Programming: An Introduction to Computer Science. Franklin, Beedle & Associates. 2003.