git.openstreetmap.org Git - nominatim.git/commitdiff
contract duplicate spaces in transliteration string
author     Sarah Hoffmann <lonvia@denofr.de>
           Fri, 2 Dec 2022 09:15:02 +0000 (10:15 +0100)
committer  Sarah Hoffmann <lonvia@denofr.de>
           Fri, 2 Dec 2022 09:15:02 +0000 (10:15 +0100)
There are some pathological cases where an isolated letter may
be deleted because it is meaningless on its own. If this happens in
the middle of a sentence, the transliteration contains two
consecutive spaces. Add a final rule that contracts any run of
whitespace into a single space.

See #2909.

settings/icu_tokenizer.yaml

index 16339970c2ae5727cf606fe686a607814cfec2a2..f30578a2322859ce287915a4049092f50eb3057a 100644
@@ -24,6 +24,7 @@ transliteration:
     - ":: lower ()"
     - "[^a-z0-9[:Space:]] >"
     - ":: NFC ()"
+    - "[:Space:]+ > ' '"
 sanitizers:
     - step: clean-housenumbers
       filter-kind:
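The interaction between the deletion rule `[^a-z0-9[:Space:]] >` and the new `[:Space:]+ > ' '` rule can be approximated in plain Python. This is a minimal sketch under stated assumptions, not Nominatim's code: it uses `re` instead of ICU, the function name `strip_and_contract` is hypothetical, Python's `\s` only approximates ICU's `[:Space:]` set, and the earlier transliteration steps of the config (and the `:: NFC ()` step, which does not affect ASCII) are left out.

```python
import re


def strip_and_contract(text: str) -> str:
    """Hypothetical approximation of the last ICU transliteration rules."""
    # Roughly ":: lower ()": lowercase the input.
    lowered = text.lower()
    # Roughly "[^a-z0-9[:Space:]] >": delete every character that is not
    # a-z, 0-9 or whitespace. Deleting a lone symbol between two words
    # leaves two spaces behind.
    stripped = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Roughly "[:Space:]+ > ' '": collapse each run of whitespace
    # into a single space (leading/trailing space is not trimmed).
    return re.sub(r"\s+", " ", stripped)


print(strip_and_contract("Main & Street"))
```

Assuming the `&` survives unchanged to this step, deleting it alone would yield `"main  street"` with two spaces; the final substitution turns that into `"main street"`.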