_Sanitizing_ is the process of cleaning up and otherwise preprocessing names
before they are added to the search index during import. Sanitizers can
clean up tagging, normalise different spellings and mark names with extra
attributes for further processing.

Sanitizers only have an effect on how the search index is built. They
do not change the information about each place that is saved in the
database. In particular, they have no influence on how the results are
displayed. The returned results always show the original information as
stored in the OpenStreetMap database.

The sanitizing process is defined in the `sanitizers.yaml` configuration
file. The file must contain a list of steps. Each step has a mandatory
parameter `step` which defines the type of sanitizer. Further parameters
may be added to configure the step.

The steps are executed in the order in which they are defined in the
configuration file. Order matters here: each sanitizer works with the output
of the previous one.
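
The effect of step order can be illustrated with a small Python model. This is
only a sketch and not Nominatim's implementation: the two functions are
hypothetical toy stand-ins that borrow the names of shipped sanitizers, names
are modelled as plain strings, and the step names are assumed to be the module
names written with hyphens.

```python
from typing import Callable, Dict, List

# Hypothetical toy sanitizers working on a plain list of name strings.
# The real sanitizers operate on richer objects; this only models ordering.


def split_name_list(names: List[str]) -> List[str]:
    """Split every name at ';' into separate names."""
    result: List[str] = []
    for name in names:
        result.extend(part.strip() for part in name.split(';') if part.strip())
    return result


def strip_brace_terms(names: List[str]) -> List[str]:
    """Keep every name and add a variant without a trailing '(...)' term."""
    result: List[str] = []
    for name in names:
        result.append(name)
        if name.endswith(')') and '(' in name:
            stripped = name[:name.rindex('(')].strip()
            if stripped and stripped not in result:
                result.append(stripped)
    return result


# Assumed mapping from the `step` parameter to the toy functions above.
SANITIZERS: Dict[str, Callable[[List[str]], List[str]]] = {
    'split-name-list': split_name_list,
    'strip-brace-terms': strip_brace_terms,
}


def run_steps(steps: List[dict], names: List[str]) -> List[str]:
    """Apply the steps in list order, feeding each one the previous output."""
    for step in steps:
        names = SANITIZERS[step['step']](names)
    return names


names = ['Halle (Saale);Halle an der Saale']

split_first = run_steps([{'step': 'split-name-list'},
                         {'step': 'strip-brace-terms'}], names)
strip_first = run_steps([{'step': 'strip-brace-terms'},
                         {'step': 'split-name-list'}], names)

print(split_first)  # ['Halle (Saale)', 'Halle', 'Halle an der Saale']
print(strip_first)  # ['Halle (Saale)', 'Halle an der Saale']
```

In this toy model the variant `Halle` is only produced when the list is split
first, because the brace term is not at the end of the unsplit string.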

## Pre-defined sanitizers

The following is a list of sanitizers that are shipped with Nominatim.
To learn about how to add your own custom sanitizer, see the section on
[custom sanitizer modules](../develop/Sanitizer-Modules.md).

### affix-expansion

::: nominatim_db.tokenizer.sanitizers.affix_expansion
    options:
        docstring_section_style: spacy

### clean-housenumbers

::: nominatim_db.tokenizer.sanitizers.clean_housenumbers
    options:
        docstring_section_style: spacy

### clean-postcodes

::: nominatim_db.tokenizer.sanitizers.clean_postcodes
    options:
        docstring_section_style: spacy

### clean-tiger-tags

::: nominatim_db.tokenizer.sanitizers.clean_tiger_tags
    options:
        docstring_section_style: spacy

### delete-names

::: nominatim_db.tokenizer.sanitizers.delete_names
    options:
        docstring_section_style: spacy

### derive-names

::: nominatim_db.tokenizer.sanitizers.derive_names
    options:
        docstring_section_style: spacy

### split-name-list

::: nominatim_db.tokenizer.sanitizers.split_name_list
    options:
        docstring_section_style: spacy

### strip-brace-terms

::: nominatim_db.tokenizer.sanitizers.strip_brace_terms
    options:
        docstring_section_style: spacy

### tag-analyzer-by-language

::: nominatim_db.tokenizer.sanitizers.tag_analyzer_by_language
    options:
        docstring_section_style: spacy

### tag-japanese

::: nominatim_db.tokenizer.sanitizers.tag_japanese
    options:
        docstring_section_style: spacy