# Sanitizers

_Sanitizing_ is the process of cleaning up and otherwise preprocessing names
before they are added to the search index during import. It makes it possible
to clean up tagging, normalise different spellings and mark names with extra
attributes for further processing.

!!! hint
    Sanitizers only have an effect on how the search index is built. They
    do not change the information about each place that is saved in the
    database. In particular, they have no influence on how the results are
    displayed. The returned results always show the original information as
    stored in the OpenStreetMap database.


## Configuration

The sanitizing process is defined in the `sanitizers.yaml` configuration
file. The file must contain a list of steps. Each step has a mandatory
parameter `step`, which defines the type of sanitizer to run. Any further
parameters configure that step.

The steps are executed in the order in which they are defined in the
configuration file. Order matters here: each sanitizer works with the output
of the previous step.

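The ordering rule can be illustrated with a small Python sketch. It is not
Nominatim's implementation: the functions `split_name_list` and
`strip_brace_terms`, the `REGISTRY` table and `run_steps` are hypothetical
stand-ins that only borrow two step names from the list of shipped sanitizers,
and their behaviour is simplified for illustration. The step list is written
as a Python list of dictionaries, mirroring a YAML list in which every entry
carries the mandatory `step` parameter.

```python
import re
from typing import Callable, Dict, List


def split_name_list(names: List[str], config: dict) -> List[str]:
    """Toy stand-in: split each name at any delimiter character.

    The 'delimiters' option and its default are assumptions of this sketch.
    """
    pattern = "[" + re.escape(config.get("delimiters", ",;")) + "]"
    result = []
    for name in names:
        parts = (part.strip() for part in re.split(pattern, name))
        result.extend(part for part in parts if part)
    return result


def strip_brace_terms(names: List[str], config: dict) -> List[str]:
    """Toy stand-in: keep each name and add a variant without a trailing '(...)'."""
    result = []
    for name in names:
        result.append(name)
        stripped = re.sub(r"\s*\([^)]*\)\s*$", "", name)
        if stripped and stripped != name:
            result.append(stripped)
    return result


# Hypothetical lookup table from the value of `step` to a handler function.
REGISTRY: Dict[str, Callable[[List[str], dict], List[str]]] = {
    "split-name-list": split_name_list,
    "strip-brace-terms": strip_brace_terms,
}


def run_steps(steps: List[dict], names: List[str]) -> List[str]:
    """Run the steps in list order; each step receives the previous output."""
    for step in steps:
        if "step" not in step:
            raise ValueError("every step needs the mandatory 'step' parameter")
        names = REGISTRY[step["step"]](names, step)
    return names


names = ["Main St (old); High St"]

split_first = [{"step": "split-name-list"}, {"step": "strip-brace-terms"}]
strip_first = [{"step": "strip-brace-terms"}, {"step": "split-name-list"}]

print(run_steps(split_first, names))  # ['Main St (old)', 'Main St', 'High St']
print(run_steps(strip_first, names))  # ['Main St (old)', 'High St']
```

With the same two steps, only the first ordering produces the extra variant
`Main St`: the brace term is at the end of a name only after the list has been
split.
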
## Pre-defined sanitizers

The following sanitizers are shipped with Nominatim.
To learn how to add your own custom sanitizer, see the section on
[custom sanitizer modules](../develop/Sanitizer-Modules.md).

### affix-expansion

::: nominatim_db.tokenizer.sanitizers.affix_expansion
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### clean-housenumbers

::: nominatim_db.tokenizer.sanitizers.clean_housenumbers
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### clean-postcodes

::: nominatim_db.tokenizer.sanitizers.clean_postcodes
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### clean-tiger-tags

::: nominatim_db.tokenizer.sanitizers.clean_tiger_tags
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### delete-names

::: nominatim_db.tokenizer.sanitizers.delete_names
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### derive-names

::: nominatim_db.tokenizer.sanitizers.derive_names
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### split-name-list

::: nominatim_db.tokenizer.sanitizers.split_name_list
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### strip-brace-terms

::: nominatim_db.tokenizer.sanitizers.strip_brace_terms
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### tag-analyzer-by-language

::: nominatim_db.tokenizer.sanitizers.tag_analyzer_by_language
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy

### tag-japanese

::: nominatim_db.tokenizer.sanitizers.tag_japanese
    options:
        members: False
        heading_level: 6
        docstring_section_style: spacy
