Sarah Hoffmann [Mon, 12 Jul 2021 12:58:44 +0000 (14:58 +0200)]
factor out connection reset code
Sarah Hoffmann [Mon, 12 Jul 2021 12:47:50 +0000 (14:47 +0200)]
simplify analyse function
Sarah Hoffmann [Mon, 12 Jul 2021 12:43:50 +0000 (14:43 +0200)]
split up variant computation for better readability
Sarah Hoffmann [Mon, 12 Jul 2021 09:53:25 +0000 (11:53 +0200)]
reorganise process_place function
Move address processing into its own function as it is
rather extensive.
Sarah Hoffmann [Mon, 12 Jul 2021 09:41:05 +0000 (11:41 +0200)]
simplify website setup code
Use format strings and move variable quoting code into an extra
function.
Sarah Hoffmann [Mon, 12 Jul 2021 09:33:09 +0000 (11:33 +0200)]
avoid repeated patterns for table name
Sarah Hoffmann [Sun, 11 Jul 2021 22:16:25 +0000 (00:16 +0200)]
simplify if statements
Sarah Hoffmann [Sun, 11 Jul 2021 21:48:16 +0000 (23:48 +0200)]
convert single case switch to if statement
Sarah Hoffmann [Sun, 11 Jul 2021 21:22:16 +0000 (23:22 +0200)]
avoid local variable assignment
Sarah Hoffmann [Sun, 11 Jul 2021 18:21:12 +0000 (20:21 +0200)]
fix more missing braces on one-liners
Sarah Hoffmann [Sun, 11 Jul 2021 18:14:25 +0000 (20:14 +0200)]
remove dead code
Sarah Hoffmann [Sun, 11 Jul 2021 18:10:13 +0000 (20:10 +0200)]
do not intermix params with and without default
Sarah Hoffmann [Sun, 11 Jul 2021 17:24:04 +0000 (19:24 +0200)]
directly return data in function
The temporary variable is not necessary.
Sarah Hoffmann [Sun, 11 Jul 2021 17:11:37 +0000 (19:11 +0200)]
remove unnecessarily nested ifs
Found by Sonarqube.
Sarah Hoffmann [Sun, 11 Jul 2021 17:10:04 +0000 (19:10 +0200)]
remove unused functions
The functions were necessary for the transitory code
to Python and are no longer used.
Sarah Hoffmann [Sun, 11 Jul 2021 16:23:42 +0000 (18:23 +0200)]
avoid multiple returns of same value
Found by Sonarqube.
Sarah Hoffmann [Sat, 10 Jul 2021 12:59:38 +0000 (14:59 +0200)]
always use brackets on if statements
This adds brackets around all one-line if statements that did
not have them yet.
Sarah Hoffmann [Fri, 9 Jul 2021 14:36:42 +0000 (16:36 +0200)]
remove unused variables
As reported by sonarqube.
Sarah Hoffmann [Fri, 9 Jul 2021 10:50:35 +0000 (12:50 +0200)]
fix bad use of echo in PHP output
Sarah Hoffmann [Wed, 7 Jul 2021 12:39:53 +0000 (14:39 +0200)]
Merge pull request #2384 from lonvia/actions-add-icu-tokenizer
CI: run tests on Ubuntu 18
Sarah Hoffmann [Tue, 6 Jul 2021 21:04:01 +0000 (23:04 +0200)]
add missing pyyaml requirement
Sarah Hoffmann [Tue, 6 Jul 2021 20:52:57 +0000 (22:52 +0200)]
enable PHP 7.2 for Ubuntu 18 CI
Sarah Hoffmann [Tue, 6 Jul 2021 14:10:18 +0000 (16:10 +0200)]
cannot use capture_output in subprocess.run
Only available since Python 3.7.
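For reference, `capture_output=True` is shorthand for passing `stdout=subprocess.PIPE` and `stderr=subprocess.PIPE`, both of which also work on Python 3.6. A minimal sketch of the compatible form; the helper name `run_captured` is hypothetical and not taken from the code base:

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd and capture its output without using capture_output."""
    # Equivalent to capture_output=True, but also works before Python 3.7.
    return subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          check=True)


# Run a trivial child process with the current interpreter.
result = run_captured([sys.executable, '-c', 'print("ok")'])
```

The captured output is then available as bytes in `result.stdout` and `result.stderr`.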
Sarah Hoffmann [Tue, 6 Jul 2021 07:54:11 +0000 (09:54 +0200)]
remove default parameter for namedtuple
This is only available since Python 3.7.
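One way to get default values for a namedtuple without the `defaults` parameter is to set the defaults of its `__new__` function directly. This is only a sketch of that idiom, not necessarily what the commit does; the `Token` type and its fields are hypothetical:

```python
from collections import namedtuple

# Hypothetical tuple type, for illustration only.
Token = namedtuple('Token', ['word', 'count', 'penalty'])

# Defaults are assigned to the rightmost fields: count=0, penalty=0.0.
Token.__new__.__defaults__ = (0, 0.0)

tok = Token('main')
```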
Sarah Hoffmann [Mon, 5 Jul 2021 15:15:07 +0000 (17:15 +0200)]
CI: run tests on older Ubuntu version as well
Sarah Hoffmann [Mon, 5 Jul 2021 10:34:34 +0000 (12:34 +0200)]
Merge pull request #2382 from lonvia/remove-json-config
Remove outdated ICU tokenizer JSON config
Sarah Hoffmann [Mon, 5 Jul 2021 10:34:16 +0000 (12:34 +0200)]
Merge pull request #2383 from lonvia/remove-more-names
Exclude name:etymology and name:signed
Sarah Hoffmann [Mon, 5 Jul 2021 09:04:16 +0000 (11:04 +0200)]
exclude name:etymology and name:signed
name:etymology contains a description of the name origin and is
thus more informative than search-worthy.
name:signed basically indicates that the feature does not have
a name.
Sarah Hoffmann [Mon, 5 Jul 2021 09:01:35 +0000 (11:01 +0200)]
remove outdated ICU tokenizer JSON config
Sarah Hoffmann [Mon, 5 Jul 2021 08:32:38 +0000 (10:32 +0200)]
Merge pull request #2371 from lonvia/increase-python-version
Increase minimum required Python version to 3.6
Sarah Hoffmann [Mon, 5 Jul 2021 08:32:16 +0000 (10:32 +0200)]
Merge pull request #2381 from lonvia/reorganise-abbreviations
Reorganise abbreviation handling
Sarah Hoffmann [Sun, 4 Jul 2021 08:44:58 +0000 (10:44 +0200)]
add warning about experimental nature of ICU tokenizer
Sarah Hoffmann [Fri, 2 Jul 2021 14:42:13 +0000 (16:42 +0200)]
limit the number of variants that can be produced
Sarah Hoffmann [Fri, 2 Jul 2021 13:05:17 +0000 (15:05 +0200)]
restrict partial word counting to names of reasonable length
The partial word count does not split names to save a bit of time.
The result is that it might encounter unreasonably long names
which in truth consist of multiple words. No accurate statistics
are needed so simply restrict the count to words shorter than
75 characters.
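The length restriction can be illustrated with a small in-memory sketch. It is not the actual implementation, which may well live in SQL; the function name and the input format are assumptions, only the limit of 75 characters comes from the message above:

```python
from collections import Counter

MAX_WORD_LENGTH = 75  # limit named in the commit message


def count_partial_words(names):
    """Count names as they are, skipping those too long to be one word."""
    counts = Counter()
    for name in names:
        # Names are not split; overly long ones are most likely several
        # words and are simply left out of the statistics.
        if len(name) < MAX_WORD_LENGTH:
            counts[name] += 1
    return counts
```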
Sarah Hoffmann [Thu, 1 Jul 2021 15:56:23 +0000 (17:56 +0200)]
fix subsequent replacements
Two replacement words directly following each other did not
work as expected because each expects a space at the
beginning/end while there was only one space available.
Also forbid composing a word after a space was added in the
end by a previous replacement.
Sarah Hoffmann [Wed, 30 Jun 2021 19:52:33 +0000 (21:52 +0200)]
leave ICU variant properties empty for now
Saving unused properties causes unnecessary duplicates.
Sarah Hoffmann [Wed, 30 Jun 2021 19:37:29 +0000 (21:37 +0200)]
import abbreviations from OSM Wiki
Replaces the variant rules with a slightly cleaned-up
version of the abbreviation lists at
https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
Sarah Hoffmann [Sat, 26 Jun 2021 17:38:08 +0000 (19:38 +0200)]
improve normalization
Make sure all special symbols are removed during normalization already.
Those won't be interpreted in any way because they are unlikely to be
searched for.
Sarah Hoffmann [Sat, 26 Jun 2021 09:57:09 +0000 (11:57 +0200)]
only consider partials in multi-words for initial count
This ensures that it is less likely that we exclude meaningful
words like 'hauptstrasse' just because they are frequent.
Sarah Hoffmann [Sat, 26 Jun 2021 08:13:33 +0000 (10:13 +0200)]
add documentation for ICU tokenizer configuration
Sarah Hoffmann [Thu, 24 Jun 2021 18:02:07 +0000 (20:02 +0200)]
switch to a more flexible variant description format
The new format combines compound splitting and abbreviation.
It also allows restricting rules by additional conditions
(like language or region). This latter ability is not used
yet.
Sarah Hoffmann [Sun, 20 Jun 2021 21:45:33 +0000 (23:45 +0200)]
use yaml tag syntax to mark include files
Sarah Hoffmann [Tue, 15 Jun 2021 07:02:17 +0000 (09:02 +0200)]
add dependency on datrie
Sarah Hoffmann [Tue, 15 Jun 2021 06:59:03 +0000 (08:59 +0200)]
tests for composing decomposed suffixes
Sarah Hoffmann [Fri, 11 Jun 2021 08:03:31 +0000 (10:03 +0200)]
make compound decomposition pure import feature
Compound decomposition now creates a full name variant on
import just like abbreviations. This simplifies query time
normalization and opens a path for changing abbreviation
and compound decomposition lists for an existing database.
Sarah Hoffmann [Thu, 10 Jun 2021 15:18:23 +0000 (17:18 +0200)]
complete tests for icu tokenizer
Sarah Hoffmann [Thu, 10 Jun 2021 08:28:46 +0000 (10:28 +0200)]
fix full term token in special phrases
Sarah Hoffmann [Thu, 10 Jun 2021 08:06:49 +0000 (10:06 +0200)]
complete tests for rule loader
Sarah Hoffmann [Thu, 10 Jun 2021 07:36:43 +0000 (09:36 +0200)]
correctly quote strings when copying in data
Encapsulate the copy string in a class that ensures that
copy lines are written with correct quoting.
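A sketch of what such an encapsulation could look like, assuming the data is written in PostgreSQL's COPY text format (tab-separated columns, `\N` for NULL, backslash escapes). The class name `CopyBuffer` and its methods are hypothetical and not taken from the code base:

```python
import io


class CopyBuffer:
    """Collects rows in PostgreSQL COPY text format (illustrative sketch)."""

    def __init__(self):
        self.buffer = io.StringIO()

    @staticmethod
    def quote(value):
        # NULL is written as \N in COPY text format.
        if value is None:
            return '\\N'
        # Escape the backslash first, then the characters that would
        # otherwise be read as column or row separators.
        return (str(value).replace('\\', '\\\\')
                          .replace('\t', '\\t')
                          .replace('\n', '\\n')
                          .replace('\r', '\\r'))

    def add(self, *columns):
        """Append one row; every column is quoted before it is written."""
        self.buffer.write('\t'.join(self.quote(c) for c in columns))
        self.buffer.write('\n')

    def getvalue(self):
        return self.buffer.getvalue()
```

Because all writes go through `add()`, callers cannot forget to quote a value.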
Sarah Hoffmann [Wed, 9 Jun 2021 13:07:36 +0000 (15:07 +0200)]
update unit tests for adapted abbreviation code
Sarah Hoffmann [Wed, 9 Jun 2021 08:53:39 +0000 (10:53 +0200)]
add abbreviations from legacy tokenizer
These abbreviations are not a perfect fit anymore because
abbreviation replacement is now applied before transliteration.
Sarah Hoffmann [Sun, 6 Jun 2021 09:00:44 +0000 (11:00 +0200)]
adapt tests for ICU tokenizer
Sarah Hoffmann [Fri, 28 May 2021 20:06:13 +0000 (22:06 +0200)]
move abbreviation computation into import phase
This adds precomputation of abbreviated terms for names and removes
abbreviation of terms in the query. Basic import works but still
needs some thorough testing as well as speed improvements during
import.
New dependency for python library datrie.
Sarah Hoffmann [Wed, 26 May 2021 18:50:34 +0000 (20:50 +0200)]
icu tokenizer: move transliteration rules in separate file
The tokenizer configuration has become difficult to handle
due to the additional manual transliteration rules. Allow
a separate rule file that is passed to the ICU library
as is.
Sarah Hoffmann [Sat, 3 Jul 2021 19:14:43 +0000 (21:14 +0200)]
docs: nominatim-ui should be installed from the release
The development version does not provide the pre-packaged
dist directory anymore.
Sarah Hoffmann [Sat, 26 Jun 2021 14:21:08 +0000 (16:21 +0200)]
Merge pull request #2373 from lonvia/tweak-search-cost
Further tweaking of search cost
Sarah Hoffmann [Sat, 26 Jun 2021 09:20:25 +0000 (11:20 +0200)]
remove penalty for full words in address
Now that multi-word partials no longer exist, multi-word full
words need to be used to search in addresses and therefore no
longer should have a penalty.
Also changes the condition when a full word is included into
the address. It is no longer relevant if an equivalent partial
exists but only if the term consists of more than one word.
Sarah Hoffmann [Sat, 26 Jun 2021 08:31:55 +0000 (10:31 +0200)]
adjust penalty for housenumber-in-name searches
When searching for house numbers in the name (for place-only
terms) then the same penalties need to apply as for the
regular house number search.
Change the code to first compute the penalties and then create
the new search variants.
Sarah Hoffmann [Mon, 21 Jun 2021 14:32:54 +0000 (16:32 +0200)]
increase minimum Python to 3.6
Python 3.6 introduces formatted string literals and
flag enums as well as a much faster dict implementation.
These changes make the code so much simpler as to warrant
dropping Python 3.5 support.
Affected distributions are Ubuntu 16.04 and Debian Stretch.
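A short illustration of the two language features named above. The flag names, the table name and the query are made up for the example:

```python
from enum import Flag, auto


class TokenType(Flag):  # hypothetical flags, for illustration only
    PARTIAL = auto()
    FULL = auto()
    HOUSENUMBER = auto()


# Flag members can be combined and tested like bit sets (Python 3.6+).
kind = TokenType.PARTIAL | TokenType.FULL

# Formatted string literals replace str.format() and % formatting (Python 3.6+).
table = 'word'
query = f"SELECT count(*) FROM {table}"
```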
Sarah Hoffmann [Fri, 18 Jun 2021 08:58:41 +0000 (10:58 +0200)]
make sure old data gets deleted on place type change
When changing from some other place type to place=postcode
make sure that the old place type entry in the place table
is deleted.
Sarah Hoffmann [Thu, 17 Jun 2021 22:28:10 +0000 (00:28 +0200)]
update postcode in place if it already exists
Sarah Hoffmann [Thu, 17 Jun 2021 13:30:05 +0000 (15:30 +0200)]
Merge pull request #2369 from lonvia/exclude-poi-from-housenumber-search
Do not return POIs when dropping house number in query
Sarah Hoffmann [Thu, 17 Jun 2021 10:05:33 +0000 (12:05 +0200)]
do not return POIs when dropping house number in query
We've previously added searching through rank 30 in a house
number search to enable searches for house number+name.
This had the unintended side effect that rank 30 objects
are also returned in a search that dropped the house number
from the query. This is wrong because POIs cannot function
as a parent to a house number.
This fix drops all rank 30 objects from the results for a
house number search if they do not match the requested house
number.
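The filter rule can be sketched on in-memory data as follows. This is an illustration only: the actual fix operates on the search code and its queries, and the `Result` type with its fields is an assumption made for the example:

```python
from collections import namedtuple

# Hypothetical result row; real results carry many more fields.
Result = namedtuple('Result', ['name', 'rank', 'housenumber'])


def filter_housenumber_results(results, requested):
    """Drop rank 30 objects whose house number differs from the requested one."""
    return [r for r in results
            if r.rank != 30 or r.housenumber == requested]
```

Streets and other lower-ranked objects pass through unchanged, so they can still act as a parent of the house number.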
Sarah Hoffmann [Wed, 16 Jun 2021 09:45:07 +0000 (11:45 +0200)]
Merge pull request #2360 from AntoJvlt/postcodes-place-table
Use place instead of placex to compute postcodes
AntoJvlt [Sat, 12 Jun 2021 13:46:08 +0000 (15:46 +0200)]
Improved performance of the postcodes query and some code cleaning
AntoJvlt [Sat, 12 Jun 2021 13:35:51 +0000 (15:35 +0200)]
Always delete old placex entry for type=postcode when inserting a new one into the place table
AntoJvlt [Wed, 9 Jun 2021 07:24:25 +0000 (09:24 +0200)]
Handle postcode type change in place insert trigger
AntoJvlt [Tue, 8 Jun 2021 20:39:04 +0000 (22:39 +0200)]
Clean and update tests for postcodes
AntoJvlt [Tue, 8 Jun 2021 07:33:10 +0000 (09:33 +0200)]
Use place_exists() in can_compute() for postcodes
AntoJvlt [Mon, 7 Jun 2021 13:02:53 +0000 (15:02 +0200)]
Update tests for postcodes
AntoJvlt [Fri, 4 Jun 2021 19:26:13 +0000 (21:26 +0200)]
Use place instead of placex to compute postcodes
Sarah Hoffmann [Tue, 8 Jun 2021 08:42:14 +0000 (10:42 +0200)]
do not fail CI on codecov errors
The Codecov upload depends on unreliable external code.
Sarah Hoffmann [Sun, 6 Jun 2021 16:29:51 +0000 (18:29 +0200)]
Merge pull request #2359 from lonvia/switch-bdd-tests-to-api-search
Remove deprecated commandline query function
Sarah Hoffmann [Sun, 6 Jun 2021 13:28:21 +0000 (15:28 +0200)]
remove deprecated query interface
Searches can now be done via the thin API wrapper.
Sarah Hoffmann [Sun, 6 Jun 2021 13:27:52 +0000 (15:27 +0200)]
switch BDD tests to always use search API
Sarah Hoffmann [Fri, 4 Jun 2021 21:54:37 +0000 (23:54 +0200)]
Merge pull request #2358 from AntoJvlt/documentation-update
Update documentation
AntoJvlt [Tue, 1 Jun 2021 15:02:45 +0000 (17:02 +0200)]
Update documentation
Sarah Hoffmann [Wed, 2 Jun 2021 18:58:14 +0000 (20:58 +0200)]
Merge pull request #2357 from lonvia/legacy-tokenizer-fix-word-entries
Fix insertion of special terms and countries into word table
Sarah Hoffmann [Wed, 2 Jun 2021 15:37:27 +0000 (17:37 +0200)]
fix insertion of special terms and countries into word table
Special terms need to be prefixed by a space because they are
full terms.
For countries avoid duplicate entries of word tokens.
Adds tests for adding country terms.
Sarah Hoffmann [Wed, 2 Jun 2021 14:25:26 +0000 (16:25 +0200)]
Merge pull request #2356 from lonvia/freeze-after-import
Call freeze after running a non-updateable import
Sarah Hoffmann [Wed, 2 Jun 2021 14:11:29 +0000 (16:11 +0200)]
docs: reload SQL when migrating to 3.6
SQL functions must always be reloaded when updating the software.
All other updates included the instruction as part of some other
migration. From 3.7 on it will happen as part of the migration
command.
Fixes #2335.
Sarah Hoffmann [Wed, 2 Jun 2021 09:08:48 +0000 (11:08 +0200)]
call freeze after running a non-updateable import
Some of the tables will have already been removed but
the tables for indexing are still there and should be
dropped.
Sarah Hoffmann [Wed, 26 May 2021 09:47:08 +0000 (11:47 +0200)]
commit changes to replication log table
Fixes #2350.
Sarah Hoffmann [Wed, 26 May 2021 09:04:02 +0000 (11:04 +0200)]
always compute guessed postcode for POIs from centroid
When guessing postcodes from the area, only postcodes within
that area are accepted. For POIs that is usually not what we
want as the postcode would have to be within a house for
example.
Fixes #2301.
Sarah Hoffmann [Tue, 25 May 2021 18:43:44 +0000 (20:43 +0200)]
Merge pull request #2349 from lonvia/fix-website-refresh
Only initialise tokenizer for refresh functions where needed
Sarah Hoffmann [Tue, 25 May 2021 17:16:22 +0000 (19:16 +0200)]
only initialise tokenizer for refresh functions where needed
Fixes #2347.
Sarah Hoffmann [Mon, 24 May 2021 15:41:38 +0000 (17:41 +0200)]
Merge pull request #2346 from lonvia/words-vs-tokens
Cleanup use of partial words in legacy tokenizers
Sarah Hoffmann [Mon, 24 May 2021 08:29:21 +0000 (10:29 +0200)]
add tests for new full name computation with ICU
Sarah Hoffmann [Sun, 23 May 2021 21:58:58 +0000 (23:58 +0200)]
reorganize keyword creation for legacy tokenizer
- only save partial words without internal spaces
- consider comma and semicolon a separator of full words
- consider parts before an opening bracket a full word
(but not the part after the bracket)
Fixes #244.
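One possible reading of the three rules above, sketched on plain strings. Normalization and transliteration are left out, and the function name `sketch_keywords` is hypothetical; the real keyword creation may differ in detail:

```python
import re


def sketch_keywords(name):
    """Return (full_words, partial_words) for a name."""
    full_words = set()
    # Commas and semicolons separate full words.
    for part in re.split(r'[,;]', name):
        part = part.strip()
        if not part:
            continue
        full_words.add(part)
        # The part before an opening bracket is a full word as well,
        # the part after the bracket is not.
        before, bracket, _ = part.partition('(')
        if bracket and before.strip():
            full_words.add(before.strip())
    # Partial words are single words, so they never contain spaces.
    partials = set()
    for word in full_words:
        partials.update(re.findall(r'\w+', word))
    return full_words, partials
```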
Sarah Hoffmann [Sun, 23 May 2021 21:08:11 +0000 (23:08 +0200)]
use make_keywords for place search terms also
Ensures that place indeed uses the same search names as other
names.
Sarah Hoffmann [Sun, 23 May 2021 20:13:03 +0000 (22:13 +0200)]
always ignore multi term partials in search
Partial terms should only ever consist of one word. Ignore
any other, they are a leftover from inefficient word index
builds.
Sarah Hoffmann [Sat, 22 May 2021 08:36:35 +0000 (10:36 +0200)]
Merge pull request #2342 from lonvia/icu-tokenizer-ci
Add BDD tests with icu tokenizer to CI runs
Sarah Hoffmann [Fri, 21 May 2021 20:40:22 +0000 (22:40 +0200)]
CI: run BDD tests with legacy_icu tokenizer
Sarah Hoffmann [Fri, 21 May 2021 20:39:56 +0000 (22:39 +0200)]
enable Tiger BDD API test for legacy_icu
Sarah Hoffmann [Thu, 20 May 2021 15:30:30 +0000 (17:30 +0200)]
Merge pull request #2341 from lonvia/cleanup-python-tests
Cleanup and linting of python tests
Sarah Hoffmann [Thu, 20 May 2021 08:26:23 +0000 (10:26 +0200)]
Merge pull request #2337 from mogita/fix/invalid-query-string
fix: add the missing question mark
Sarah Hoffmann [Wed, 19 May 2021 21:07:39 +0000 (23:07 +0200)]
test: fix linting errors
Sarah Hoffmann [Wed, 19 May 2021 15:37:03 +0000 (17:37 +0200)]
test: more use of table_factory
Sarah Hoffmann [Wed, 19 May 2021 14:42:35 +0000 (16:42 +0200)]
test: avoid use of tempfile module
Use the tmp_path fixture instead which provides automatic
cleanup.
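A minimal sketch of the pattern with pytest's built-in `tmp_path` fixture. The helper `write_config` and the file name are hypothetical and only serve as something to test:

```python
def write_config(directory, content):
    """Hypothetical helper under test: write a config file, return its path."""
    path = directory / 'config.yaml'
    path.write_text(content)
    return path


def test_write_config(tmp_path):
    # tmp_path is a pathlib.Path pointing to a fresh directory that pytest
    # creates for this test, so no manual tempfile handling is needed.
    path = write_config(tmp_path, 'key: value\n')
    assert path.read_text() == 'key: value\n'
```

The test requests the fixture simply by naming it as a parameter; no import is required.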
Sarah Hoffmann [Wed, 19 May 2021 14:03:54 +0000 (16:03 +0200)]
test: use src_dir fixture instead of self-computed paths