Computational linguistics
This page concerns computational linguistics in general. For study and research in computational linguistics in the UM linguistics department, see CompLing Lab.
Computational linguistics encompasses just about anything that combines natural language with computation. By natural language we mean a naturally occurring human language. Natural languages contrast with formal languages such as computer languages and logical calculi, with invented languages such as Esperanto and Sindarin, and with animal languages, though the techniques of computational linguistics can certainly be extended to these other varieties of language as well.
The term natural language processing (NLP) comes from the artificial intelligence community, and is often treated as synonymous with computational linguistics, though it can be taken more narrowly as referring to the subfield of artificial intelligence that concerns language processing.
Conceptions of computational linguistics
One can identify at least three different broad conceptions of computational linguistics, with three different ideas of the overall aims of the discipline.
Computational models of human language processing
Perhaps the most salient "ultimate goal" of computational linguistics is the construction of a system that either evinces or models human language capabilities. The choice between "evinces" and "models" depends on whether one's interests are in how human language can be processed in principle, or more concretely in how humans process human language. To be a little tongue-in-cheek, the question is whether we are interested in [human language] processing, or in human [language processing].
- natural language processing seeks to build a "machine that talks" (a working artifact that demonstrates human-level language processing abilities) as a subgoal of constructing an artificial intelligence
- computational psycholinguistics seeks to define and test formal computational models of the language component of the human mind, or, more abstractly, of the language capabilities of humans
Human language technology (HLT)
Small-scale AI systems of the late 1960s and early 1970s generated considerable excitement, along with an expectation that "scaling them up" would lead quickly to a genuine AI. Scaling them up, however, proved to be very difficult. This led to a focus on shorter-term goals and to the identification of useful language technologies that could be constructed in the near term. Considerable progress has been made on such technologies, including:
- information extraction
- machine translation
- speech recognition
- optical character recognition
- speech synthesis
- spoken language systems
- computer-supported language education
The area of human language technology has received so much attention that it is often taken as synonymous with computational linguistics.
Digital linguistics
On the face of it, computational linguistics ought to refer to a branch of linguistics, analogous to "computational biology" or "computational astronomy." This is perhaps the least well-developed conception of computational linguistics, hence (on the bright side) the area where there is the most low-hanging fruit. Computational linguistics in this conception is the application of computational methods to the scientific study of language. Computational psycholinguistics, mentioned above, can be viewed as a special case, but there is no presumption in general that the computation being done has any relation to human language processing. Any computation in service of doing linguistics falls here, for example the automated processing of large collections of language data. Areas that fit this conception include:
- computational historical linguistics
- computational sociolinguistics
- corpus linguistics
- digital language documentation
- field linguistics
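As a small illustration of what "automated processing of large collections of language data" can look like in practice, here is a minimal sketch in Python that counts word frequencies, a basic operation in corpus linguistics. The function name word_frequencies, the sample text, and the crude tokenizer (lowercase runs of letters and apostrophes) are assumptions made for this illustration, not an established method; real corpus work would use a proper tokenizer and read from actual corpus files.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count word tokens in a string.

    Tokenization here is deliberately crude (an assumption of this sketch):
    the text is lowercased and split into runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical toy "corpus"; a real study would read many files instead.
sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# Print the three most frequent word types with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

The same counting step scales from a single sentence to a large collection of texts by summing the Counter objects obtained from each document.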
Core Computational Linguistics
Foundations
References
There are two widely used textbooks in computational linguistics:
- Dan Jurafsky and James Martin. Speech and Language Processing, 2nd edition. Upper Saddle River, NJ: Pearson/Prentice Hall. 2009. http://www.cs.colorado.edu/~martin/slp.html
- Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. 1999.
Other starting points:
- Wikipedia page "Computational linguistics"
- AAAI topic "Natural Language"
- Association for Computational Linguistics
For getting started in computer science and programming:
- John Zelle. Python Programming: An Introduction to Computer Science. Franklin, Beedle & Associates. 2003.