02 APR 2024

PoeTree version 0.0.2

DOI 10.5281/zenodo.10907309

  • PoeTree.sl (Slovenian) added
  • PoeTree.de (German) extended with the Deutsches Lyrik Korpus

A dataset of over 330,000 poems (89,000,000 tokens) in ten languages: Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, and Spanish. Each corpus has been deduplicated, annotated in the Universal Dependencies framework, supplied with additional metadata, and converted into a unified JSON structure.
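The description above says each corpus was deduplicated but does not say how. As a minimal sketch of one common approach, and not necessarily the method used for PoeTree, exact duplicates can be removed by hashing a normalized form of each poem's text. The list-of-dicts input and its `id` and `text` fields are hypothetical and are not taken from the PoeTree JSON schema.

```python
import hashlib
import re
import unicodedata
from typing import Iterable


def normalize(text: str) -> str:
    """Reduce a poem to a canonical form so trivial variants compare equal."""
    # Unicode NFKC normalization plus case folding (works for all ten languages)
    text = unicodedata.normalize("NFKC", text).casefold()
    # Drop punctuation; \w is Unicode-aware, so accented letters are kept
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse line breaks and runs of spaces into single spaces
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(poems: Iterable[dict]) -> list[dict]:
    """Keep the first poem for each normalized text (exact-match dedup).

    Each poem is assumed to be a dict with a hypothetical "text" field.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for poem in poems:
        key = hashlib.sha1(normalize(poem["text"]).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(poem)
    return unique


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "Roses are red,\nViolets are blue."},
        {"id": 2, "text": "roses are RED  violets are blue"},  # same poem, different surface
        {"id": 3, "text": "Something else entirely"},
    ]
    print([p["id"] for p in deduplicate(sample)])
```

This normalization only catches poems that differ in case, punctuation, or whitespace; near-duplicates with variant wording would need a fuzzy technique such as MinHash.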

Universal Dependencies
Wikidata
VIAF

16 OCT 2023

PoeTree version 0.0.1

DOI 10.5281/zenodo.10008459

A dataset of over 300,000 poems (84,000,000 tokens) in nine languages: Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, and Spanish. Each corpus has been deduplicated, annotated in the Universal Dependencies framework, supplied with additional metadata, and converted into a unified JSON structure.

Universal Dependencies
Wikidata
VIAF

Supported by the Czech Science Foundation (GA23-07727S)