Annotated Corpora

From lingwiki

Current revision (13:04, 11 November 2009)

An annotated corpus combines several elements: a lexicon, morphological and syntactic analysis, and a text corpus. Developing a large multilingual corpus, together with tools to analyze and navigate the data, is a project with far-reaching potential for linguistic research.

Online Corpora

Here are links to a few existing corpora which include some additional analysis of the texts:

  • Perseus Digital Library -- includes texts from a wide variety of sources, including classical Greek and Latin texts.
  • The Vergil Project -- marked-up text of Vergil's Aeneid. The text is hand-annotated: each word form is described in terms of its root, its inflectional morphology, and a rough English gloss. The annotation is incomplete (though possibly still ongoing).
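
As a rough illustration of what word-level annotation of this kind might look like as data, here is a minimal Python sketch. It assumes each word form carries a root, a free-text morphological description, and an English gloss, and it derives a small lexicon from the annotated text. The names `AnnotatedToken` and `build_lexicon` are hypothetical, the sample tokens are written by hand (with the enclitic -que left off "virum" for simplicity), and none of this reflects the actual data format of any of the projects listed above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedToken:
    """One annotated word form (hypothetical structure, for illustration)."""
    form: str        # word form as it appears in the text
    root: str        # dictionary headword the form belongs to
    morphology: str  # inflectional description, kept as free text here
    gloss: str       # rough English gloss


def build_lexicon(tokens):
    """Group the attested word forms under their root."""
    lexicon = defaultdict(set)
    for token in tokens:
        lexicon[token.root].add(token.form)
    return dict(lexicon)


# Hand-written sample annotations, not taken from any project's files.
tokens = [
    AnnotatedToken("arma", "arma", "neuter accusative plural", "arms"),
    AnnotatedToken("virum", "vir", "masculine accusative singular", "man"),
    AnnotatedToken("cano", "cano",
                   "1st person singular present active indicative", "I sing"),
]

# The three elements named above: the text, its analysis, and a lexicon.
text = " ".join(token.form for token in tokens)
lexicon = build_lexicon(tokens)
```

Keeping the annotation on each token and deriving the lexicon from it means the lexicon cannot drift out of sync with the text. A real corpus would likely need structured morphological features in place of the free-text field.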