Petr Plecháč :: Corpus of Czech Verse and Beyond ASEEES 2016
Versification RG, ICL CAS | Institute of the Czech National Corpus

Detection of rhymes

(3) Similarity (distance) based approach

PRINCIPLE:
1. Represent each sound as a vector of distinctive features.
2. Apply a similarity measure (a distance between the sound vectors).
EXAMPLE 1 (cs): "láska" [ l a: s k a ] :: "maska" [ m a s k a ]
      LOW | HIGH | FRONT | BACK | ROUND | TENSE | LENGTH
a:     1  |  0   |   1   |  0   |   0   |   0   |   1
a      1  |  0   |   1   |  0   |   0   |   0   |   0
DIST = 1

      SON | LAB | COR | DOR | PHA | VOI | CONT | STRI | LAT | DEL | NAS
s      0  |  0  |  1  |  0  |  0  |  0  |  1   |  1   |  0  |  0  |  0
s      0  |  0  |  1  |  0  |  0  |  0  |  1   |  1   |  0  |  0  |  0
DIST = 0

      SON | LAB | COR | DOR | PHA | VOI | CONT | STRI | LAT | DEL | NAS
k      0  |  0  |  1  |  1  |  0  |  0  |  0   |  0   |  0  |  0  |  0
k      0  |  0  |  1  |  1  |  0  |  0  |  0   |  0   |  0  |  0  |  0
DIST = 0

      LOW | HIGH | FRONT | BACK | ROUND | TENSE | LENGTH
a      1  |  0   |   1   |  0   |   0   |   0   |   0
a      1  |  0   |   1   |  0   |   0   |   0   |   0
DIST = 0

DIST.AVERAGE = (1 + 0 + 0 + 0) / 4 = 0.25
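A minimal Python sketch of Example 1, assuming the per-sound distance is a simple count of mismatching features (which is consistent with the DIST values shown, but is not stated explicitly on the slide). The names `FEATURES`, `sound_distance` and `average_distance` are illustrative, not taken from the author's implementation; the feature values are copied from the example as given.

```python
# Feature vectors copied from Example 1.
# Vowels:     LOW, HIGH, FRONT, BACK, ROUND, TENSE, LENGTH
# Consonants: SON, LAB, COR, DOR, PHA, VOI, CONT, STRI, LAT, DEL, NAS
FEATURES = {
    "a:": (1, 0, 1, 0, 0, 0, 1),
    "a":  (1, 0, 1, 0, 0, 0, 0),
    "s":  (0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0),
    "k":  (0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0),
}


def sound_distance(x, y):
    """Count the features on which two sounds differ (assumed measure)."""
    u, v = FEATURES[x], FEATURES[y]
    if len(u) != len(v):
        raise ValueError("cannot compare a vowel vector with a consonant vector")
    return sum(a != b for a, b in zip(u, v))


def average_distance(seq1, seq2):
    """Average the pairwise distances of two equally long sound sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    distances = [sound_distance(x, y) for x, y in zip(seq1, seq2)]
    return sum(distances) / len(distances)


# Compared segments of "láska" and "maska", as in the example.
laska = ["a:", "s", "k", "a"]
maska = ["a", "s", "k", "a"]
print(average_distance(laska, maska))  # 0.25
```

The sketch only handles sequences of equal length; it deliberately raises an error otherwise.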
===> Not all features are equally important for a match.
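One possible way to reflect this is to weight each feature before summing the mismatches. The sketch below is only an illustration: the weight values in `WEIGHTS` are hypothetical and chosen purely for demonstration, and `weighted_distance` is not a function from the author's system.

```python
VOWEL_FEATURES = ("LOW", "HIGH", "FRONT", "BACK", "ROUND", "TENSE", "LENGTH")

# Hypothetical weights (not from the slides): a mismatch in LENGTH
# counts half as much as a mismatch in any other vowel feature.
WEIGHTS = {
    "LOW": 1.0, "HIGH": 1.0, "FRONT": 1.0, "BACK": 1.0,
    "ROUND": 1.0, "TENSE": 1.0, "LENGTH": 0.5,
}


def weighted_distance(u, v, names, weights):
    """Sum the weights of the features on which the two vectors differ."""
    return sum(weights[n] for n, a, b in zip(names, u, v) if a != b)


a_long = (1, 0, 1, 0, 0, 0, 1)   # [a:]
a_short = (1, 0, 1, 0, 0, 0, 0)  # [a]
print(weighted_distance(a_long, a_short, VOWEL_FEATURES, WEIGHTS))  # 0.5
```

With all weights set to 1.0 this reduces to a plain count of mismatching features.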
What should be done when the consonant clusters differ in size?
EXAMPLE 2 (cs):
"láska" [ l a: s k a ]  OR  [ l a: s k a ]
"pára"  [ p a: r - a ]  OR  [ p a: - r a ]