docs/develop/Database-Layout.md

   1 # Database Layout
   2
   3 ## Import tables
   4
   5 OSM data is initially imported using [osm2pgsql](https://osm2pgsql.org).
   6 Nominatim uses a custom flex style to create the initial import tables.
   7
   8 The import process creates the following tables:
   9
  10 ![osm2pgsql tables](osm2pgsql-tables.svg)
  11
  12 The `planet_osm_*` tables are the usual backing tables for OSM data. Note
  13 that Nominatim uses them to look up special relations and to find nodes on
  14 ways.
  15
  16 The osm2pgsql import produces a single table `place` as output with the following
  17 columns:
  18
  19  * `osm_type` - kind of OSM object (**N** - node, **W** - way, **R** - relation)
  20  * `osm_id` - original OSM ID
  21  * `class` - key of principal tag defining the object type
  22  * `type` - value of principal tag defining the object type
  23  * `name` - collection of tags that contain a name or reference
  24  * `admin_level` - numerical value of the tagged administrative level
  25  * `address` - collection of tags defining the address of an object
  26  * `extratags` - collection of additional interesting tags that are not
  27                  directly relevant for searching
  28  * `geometry` - geometry of the object (in WGS84)
  29
  30 A single OSM object may appear multiple times in this table when it is tagged
  31 with multiple tags that may constitute a principal tag. Take for example a
  32 motorway bridge. In OSM, this would be a way which is tagged with
  33 `highway=motorway` and `bridge=yes`. This way would appear in the `place` table
  34 once with a `class` of `highway` and once with a `class` of `bridge`. Thus the
  35 *unique key* for `place` is (`osm_type`, `osm_id`, `class`).
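
The fan-out described above can be sketched in a few lines of Python. This is an illustration only, not the osm2pgsql flex style that Nominatim actually uses: `PRINCIPAL_KEYS` and `place_rows` are hypothetical names, the list of principal keys is shortened for the example, and the OSM ID is made up.

```python
# Illustrative only: the real mapping is done by the osm2pgsql flex style.
# PRINCIPAL_KEYS is a hypothetical, much-shortened set of principal tag keys.
PRINCIPAL_KEYS = {"highway", "bridge", "amenity"}


def place_rows(osm_type, osm_id, tags):
    """Return one (osm_type, osm_id, class, type) row per principal tag."""
    return [
        (osm_type, osm_id, key, value)
        for key, value in sorted(tags.items())
        if key in PRINCIPAL_KEYS
    ]


# A motorway bridge: one OSM way carrying two principal tags.
rows = place_rows("W", 1234, {"highway": "motorway", "bridge": "yes", "lanes": "2"})

# The rows differ only in class/type, so (osm_type, osm_id, class) is unique.
keys = {(osm_type, osm_id, cls) for osm_type, osm_id, cls, _ in rows}
```

Here `rows` holds two entries for the same way, one per principal tag, while the non-principal `lanes` tag produces no row of its own.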
  36
  37 How raw OSM tags are mapped to the columns in the place table is to a certain
  38 degree configurable. See [Customizing Import Styles](../customize/Import-Styles.md)
  39 for more information.
  40
  41 ## Search tables
  42
  43 The following tables carry all information needed to do the search:
  44
  45 ![search tables](search-tables.svg)
  46
  47 The **placex** table is the central table that saves all information about the
  48 searchable places in Nominatim. The basic columns are the same as for the
  49 place table and have the same meaning. The placex table adds the following
  50 additional columns:
  51
  52  * `place_id` - the internal unique ID to identify the place
  53  * `partition` - the id to use with partitioned tables (see below)
  54  * `geometry_sector` - a location hash used for geographically close ordering
  55  * `parent_place_id` - the next higher place in the address hierarchy, only
  56    relevant for POI-type places (with rank 30)
  57  * `linked_place_id` - place ID of the place this object has been merged with.
  58    When this ID is set, then the place is invisible for search.
  59  * `importance` - measure of how well known the place is
  60  * `rank_search`, `rank_address` - search and address rank (see [Customizing ranking](../customize/Ranking.md))
  61  * `wikipedia` - the Wikipedia page used for computing the importance of the place
  62  * `country_code` - the country the place is located in
  63  * `housenumber` - normalized housenumber, if the place has one
  64  * `postcode` - computed postcode for the place
  65  * `indexed_status` - processing status of the place (0 - ready, 1 - freshly inserted, 2 - needs updating, 100 - needs deletion)
  66  * `indexed_date` - timestamp when the place was processed last
  67  * `centroid` - a point feature for the place
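
As a reading aid, the `indexed_status` codes and the visibility rule for `linked_place_id` listed above might be modelled as follows. This is a minimal sketch with made-up rows; the class name `IndexedStatus` and the helper `is_searchable` are hypothetical and not part of Nominatim.

```python
from enum import IntEnum


class IndexedStatus(IntEnum):
    """Codes of placex.indexed_status as listed above (hypothetical class name)."""
    READY = 0
    FRESHLY_INSERTED = 1
    NEEDS_UPDATING = 2
    NEEDS_DELETION = 100


def is_searchable(row):
    """A place merged into another one (linked_place_id set) is hidden from search."""
    return row["linked_place_id"] is None


# Two made-up placex rows, reduced to the columns used here.
rows = [
    {"place_id": 1, "linked_place_id": None, "indexed_status": 0},
    {"place_id": 2, "linked_place_id": 1, "indexed_status": 2},
]

visible = [r["place_id"] for r in rows if is_searchable(r)]
pending = [
    r["place_id"]
    for r in rows
    if IndexedStatus(r["indexed_status"]) is not IndexedStatus.READY
]
```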
  68
  69 The **location_property_osmline** table is a special table for
  70 [address interpolations](https://wiki.openstreetmap.org/wiki/Addresses#Using_interpolation).
  71 The columns have the same meaning and use as the columns with the same name in
  72 the placex table. Only three columns are special:
  73
  74  * `startnumber` and `endnumber` - beginning and end of the number range
  75     for the interpolation
  76  * `interpolationtype` - a string `odd`, `even` or `all` to indicate
  77     the interval between the numbers
  78
  79 Address interpolations are always ways in OSM, which is why there is no column
  80 `osm_type`.
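
The three special columns can be read as a compact description of a list of housenumbers. The sketch below expands such a row under two assumptions that the text above does not state: both `startnumber` and `endnumber` are part of the range, and `startnumber` already has the parity named by `interpolationtype`. The function name is hypothetical.

```python
def interpolated_housenumbers(startnumber, endnumber, interpolationtype):
    """Expand an interpolation row into the housenumbers it covers.

    Assumes an inclusive range and a startnumber of matching parity.
    """
    # 'odd' and 'even' skip every other number, 'all' takes each one.
    step = {"odd": 2, "even": 2, "all": 1}[interpolationtype]
    return list(range(startnumber, endnumber + 1, step))


odd_side = interpolated_housenumbers(1, 9, "odd")
even_side = interpolated_housenumbers(2, 8, "even")
both_sides = interpolated_housenumbers(10, 13, "all")
```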
  81
  82 The **location_postcode** table holds computed centroids of all postcodes that
  83 can be found in the OSM data. The meaning of the columns is again the same
  84 as that of the placex table.
  85
  86 Every place needs an address, a set of surrounding places that describe the
  87 location of the place. The set of address places is made up of OSM places
  88 themselves. The **place_addressline** table cross-references for each place
  89 all the places that make up its address. Two columns define the address
  90 relation:
  91
  92   * `place_id` - reference to the place being addressed
  93   * `address_place_id` - reference to the place serving as an address part
  94
  95 Most of the remaining columns cache information from the placex entry of the address
  96 part. The exceptions are:
  97
  98   * `fromarea` - is true if the address part has an area geometry and can
  99     therefore be considered precise
 100   * `isaddress` - is true if the address part should show up in the address
 101     output. Sometimes there are multiple places competing for the same address
 102     type (e.g. multiple cities) and this field resolves the tie.
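
The role of `isaddress` as a tie-breaker can be illustrated with a small in-memory stand-in for the table. The rows and place IDs below are made up, and `AddressLine` and `address_parts` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AddressLine:
    """Reduced place_addressline row; only the columns discussed above."""
    place_id: int
    address_place_id: int
    fromarea: bool
    isaddress: bool


def address_parts(lines, place_id):
    """Return the address places that should appear in the address output."""
    return [
        line.address_place_id
        for line in lines
        if line.place_id == place_id and line.isaddress
    ]


# Made-up rows: place 500 has two candidate cities (601 and 602).
lines = [
    AddressLine(place_id=500, address_place_id=601, fromarea=True, isaddress=True),
    AddressLine(place_id=500, address_place_id=602, fromarea=False, isaddress=False),
    AddressLine(place_id=500, address_place_id=700, fromarea=True, isaddress=True),
]
```

Calling `address_parts(lines, 500)` keeps the city 601 and drops its competitor 602, because only the former has `isaddress` set.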
 103
 104 The **search_name** table contains the search index proper. It saves for each
 105 place the terms with which the place can be found. The terms are split into
 106 the name itself and all terms that make up the address. The table mirrors some
 107 of the columns from placex for faster lookup.
 108
 109 Search terms are not saved as strings. Each term is assigned an integer and those
 110 integers are saved in the name and address vectors of the search_name table. The
 111 **word** table serves as the lookup table from string to such a word ID. The
 112 exact content of the word table depends on the [tokenizer](Tokenizers.md) used.
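
The interplay between the word table and the integer vectors can be sketched as follows. This is a deliberately simplified model: real tokenizers normalize and split terms in ways not shown here, the word IDs are made up, and the key names `name_ids` and `address_ids` are illustrative rather than actual column names.

```python
# Hypothetical, tiny stand-in for the word table: term -> word ID.
word = {"main": 1, "street": 2, "berlin": 3, "hamburg": 4}

# Stand-in for search_name rows; the key names are illustrative.
search_name = [
    {"place_id": 10, "name_ids": {1, 2}, "address_ids": {3}},
    {"place_id": 11, "name_ids": {1, 2}, "address_ids": {4}},
]


def lookup(name_terms, address_terms):
    """Return the place IDs whose vectors contain all requested terms."""
    name_ids = {word[t] for t in name_terms}
    address_ids = {word[t] for t in address_terms}
    return [
        row["place_id"]
        for row in search_name
        if name_ids <= row["name_ids"] and address_ids <= row["address_ids"]
    ]
```

A query for the name terms `main` and `street` with the address term `berlin` matches only place 10, while leaving the address terms empty matches both places.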
 113
 114 ## Address computation tables
 115
 116 Besides the main search tables, there is a set of secondary helper tables used
 117 to compute the address relations between places. These tables are partitioned.
 118 Each country is assigned a partition number in the country_name table (see
 119 below) and the data is then split between a set of tables, one for each
 120 partition. Note that Nominatim still manually manages partitioned tables.
 121 Native support for partitions in PostgreSQL only became usable with version 13.
 122 It will be a little while before Nominatim drops support for older versions.
 123
 124 ![address tables](address-tables.svg)
 125
 126 The **search_name_X** tables are used to look up streets that appear in the
 127 `addr:street` tag.
 128
 129 The **location_area_large_X** tables are used to look up larger areas
 130 (administrative boundaries and place nodes) either through their geographic
 131 closeness or through `addr:*` entries.
 132
 133 The **location_road_X** tables are used to find the closest street for a
 134 dependent place.
 135
 136 All three tables cache specific information from the placex table for their
 137 selected subset of places:
 138
 139  * `keywords` and `name_vector` contain lists of term ids (from the word table)
 140    that the full name of the place should match against
 141  * `isguess` is true for places that are not described by an area
 142
 143 All other columns reflect their counterpart in the placex table.
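
To make the `X` suffix of the partitioned tables concrete, the following sketch resolves a table name from a country code. The partition numbers are made up and the helper `partition_table` is hypothetical; in a real database the assignment lives in the `country_name` table.

```python
# Made-up excerpt of the country -> partition assignment kept in country_name.
country_partition = {"de": 1, "fr": 2}


def partition_table(base, country_code):
    """Replace the X suffix of a partitioned table with the country's partition."""
    return f"{base}_{country_partition[country_code]}"


road_table = partition_table("location_road", "de")
area_table = partition_table("location_area_large", "fr")
```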
 144
 145 ## Static data tables
 146
 147 Nominatim also creates a number of static tables at import:
 148
 149  * `nominatim_properties` saves settings that must not be changed after
 150     import
 151  * `address_levels` saves the rank information from the
 152    [ranking configuration](../customize/Ranking.md)
 153  * `country_name` contains fallback names for all countries and their
 154    default languages, and saves the assignment of countries to partitions.
 155  * `country_osm_grid` provides a fallback for country geometries
 156
 157 ## Auxiliary data tables
 158
 159 Finally, there are some tables for auxiliary data:
 160
 161  * `location_property_tiger` - saves housenumbers from the Tiger import. Its
 162    layout is similar to that of `location_property_osmline`.
 163  * `place_class_*` tables are helper tables to facilitate lookup of POIs
 164    by their class and type. They exist because it is not possible to create
 165    combined indexes with geometries.
 166