PoeTree is a standardized collection of poetry corpora comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, Spanish). Each corpus has been deduplicated, enriched with Universal Dependencies, provided with additional metadata and converted into a unified JSON structure.

The latest version of the full JSON collection is available at DOI 10.5281/zenodo.10008458.
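Once the collection has been downloaded and unpacked, a quick sanity check is to count the JSON files it contains. The sketch below is only an illustration: the directory layout (one top-level folder per corpus, JSON files somewhere beneath it) is an assumption, not something stated above, and the function name `count_json_files` is hypothetical. Adjust it to the actual layout of the archive.

```python
import json
from collections import Counter
from pathlib import Path


def count_json_files(root):
    """Count parseable JSON files under each top-level subdirectory of `root`.

    Assumption (not confirmed by the PoeTree description): the unpacked
    archive groups its JSON files in one top-level folder per corpus.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*.json"):
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)  # only checks that the file is valid JSON
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files
        rel = path.relative_to(root)
        # Group by the first folder below root; files directly in root go under "."
        key = rel.parts[0] if len(rel.parts) > 1 else "."
        counts[key] += 1
    return counts
```

Calling `count_json_files("path/to/unpacked/collection")` returns a `Counter` keyed by top-level folder name. Whether one file corresponds to one poem depends on the real structure of the release, so the numbers should be read as file counts only.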

PoeTree is also accessible via REST API and through Python and R libraries.

[Figure: PoeTree size measured by number of poems]
[Figure: PoeTree coverage measured by number of poems]
Supported by the Czech Science Foundation (GA23-07727S)