git.openstreetmap.org Git - nominatim.git/commit
ICU: better letter identification in normalization
author Sarah Hoffmann <lonvia@denofr.de>
Thu, 28 Apr 2022 15:20:56 +0000 (17:20 +0200)
committer Sarah Hoffmann <lonvia@denofr.de>
Thu, 28 Apr 2022 16:23:17 +0000 (18:23 +0200)
commit 63dc4b39bc6bc0bf5a95d0c1a8298f5349637a9e
tree f58b7e6e539ef4229241f70a4556f9c0ce3cbae3
parent de828b723e98955c3484e596dcd1f84437eb652b

The Letter class does not include non-spacing marks that can also
have a consonant or vowel meaning, especially in Indian languages.
Use the alnum property instead, which includes them all. Also
include the vowel-canceling Virama, which is not a letter by itself
but changes the transliteration.
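
The distinction described above can be checked with a small Python sketch. It is illustrative only and not part of this commit: it uses the standard `unicodedata` module and a few Devanagari sample characters chosen here as an assumption. Python does not expose ICU's `alnum` set, so the sketch only shows which characters fall outside the Letter (`L*`) general categories and that the Virama carries canonical combining class 9.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """Return True if the general category is one of the Letter (L*) categories."""
    return unicodedata.category(ch).startswith("L")


# Sample characters picked for illustration (an assumption, not from the commit).
samples = {
    "DEVANAGARI LETTER KA": "\u0915",
    "DEVANAGARI VOWEL SIGN I": "\u093F",
    "DEVANAGARI VOWEL SIGN U": "\u0941",
    "DEVANAGARI SIGN VIRAMA": "\u094D",
}

# Map each name to (general category, is it a Letter?, canonical combining class).
report = {
    name: (unicodedata.category(ch), is_letter(ch), unicodedata.combining(ch))
    for name, ch in samples.items()
}

for name, (category, letter, ccc) in report.items():
    print(f"{name:26} category={category} letter={letter} ccc={ccc}")
```

The vowel signs are marks (`Mc`, `Mn`) rather than letters, so a filter built on the Letter class alone would drop them even though they carry a vowel sound. The Virama is also a mark and has to be matched separately.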
settings/icu_tokenizer.yaml