Annotated Corpora

From lingwiki

Revision as of 18:30, 5 September 2009 by Abney

An annotated corpus incorporates several elements: a lexicon, morphological/syntactic analysis, and a text corpus. Developing a large multilingual corpus, together with tools to analyze and navigate the data, is a project with far-reaching potential for linguistic research.
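As a rough illustration of how the three elements fit together, here is a minimal sketch that assumes a very simple design: the text corpus is a flat list of tokens, each token has an optional analysis (a lemma plus morphological features), and the lexicon maps lemmas to entries. The class names, fields, and the `concordance` "navigation tool" are hypothetical, chosen for this example only, and are not taken from any existing corpus project.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Analysis:
    """Hypothetical morphological analysis of one token."""
    lemma: str
    features: dict[str, str] = field(default_factory=dict)  # e.g. {"num": "pl"}


@dataclass
class AnnotatedCorpus:
    """Hypothetical container linking text, analysis, and lexicon."""
    tokens: list[str]                    # the text corpus, as a token sequence
    analyses: list[Optional[Analysis]]   # one entry per token; None = not analyzed
    lexicon: dict[str, str]              # lemma -> lexicon entry (here just a gloss)

    def concordance(self, lemma: str, window: int = 2) -> list[str]:
        """Find every token analyzed as `lemma`, with `window` tokens of context per side."""
        hits = []
        for i, analysis in enumerate(self.analyses):
            if analysis is not None and analysis.lemma == lemma:
                left = self.tokens[max(0, i - window):i]
                right = self.tokens[i + 1:i + 1 + window]
                hits.append(" ".join(left + [f"[{self.tokens[i]}]"] + right))
        return hits


# Toy English example: "dogs" and "dog" share the lemma "dog".
corpus = AnnotatedCorpus(
    tokens=["the", "dogs", "saw", "a", "dog"],
    analyses=[
        Analysis("the"),
        Analysis("dog", {"num": "pl"}),
        Analysis("see", {"tense": "past"}),
        Analysis("a"),
        Analysis("dog", {"num": "sg"}),
    ],
    lexicon={"dog": "domesticated canine"},
)
print(corpus.concordance("dog", window=1))  # ['the [dogs] saw', 'a [dog]']
print(corpus.lexicon["dog"])
```

Searching by lemma rather than by surface form is what the analysis layer buys: a plain string search for "dog" would not treat "dogs" as the same word.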

Online Corpora

Here are links to a few existing corpora that include some additional analysis of the texts:

  • Perseus Digital Library -- covers texts from a wide variety of sources, including classical Greek and Latin.
  • The Vergil Project -- marked-up text of Vergil's Aeneid. The text is hand-annotated: each word form is described in terms of its root, its inflectional morphology, and a rough English gloss. Annotation is incomplete (and possibly still ongoing).
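A per-word record of the kind described for the Vergil Project (root, inflectional morphology, rough gloss) can be sketched as follows. This is an assumed schema for illustration, not the project's actual markup; the morphology tags are an ad hoc notation, and the analyses of the opening words of the Aeneid are illustrative rather than copied from the project. Unannotated words are stored with empty fields, which makes it easy to measure how incomplete the annotation is.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class WordAnnotation:
    """Hypothetical annotation record for one word form."""
    form: str                    # word as it appears in the text
    root: Optional[str] = None   # root/lemma; None if not yet annotated
    morph: Optional[str] = None  # inflectional morphology (ad hoc tag string)
    gloss: Optional[str] = None  # rough English gloss

    def is_annotated(self) -> bool:
        return None not in (self.root, self.morph, self.gloss)


def coverage(words: list[WordAnnotation]) -> float:
    """Fraction of word tokens whose annotation is complete."""
    if not words:
        return 0.0
    return sum(w.is_annotated() for w in words) / len(words)


def gloss_line(words: list[WordAnnotation]) -> str:
    """Join the rough glosses, marking unannotated words with '?'."""
    return " ".join(w.gloss or "?" for w in words)


# Aeneid 1.1, with only the first three words annotated (illustrative analyses).
line = [
    WordAnnotation("Arma", "arma", "n.acc.pl", "arms"),
    WordAnnotation("virumque", "vir", "m.acc.sg+que", "and the man"),
    WordAnnotation("cano", "cano", "1sg.pres.act.ind", "I sing"),
] + [WordAnnotation(f) for f in ["Troiae", "qui", "primus", "ab", "oris"]]

print(coverage(line))    # 0.375
print(gloss_line(line))  # arms and the man I sing ? ? ? ? ?
```

Keeping missing fields as `None` rather than dropping the word preserves the full text, so a partially annotated corpus can still be read and searched while annotation continues.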