Potential Note Duplicates worldwide

Good morning everyone,

While the German Schwerpunkt der Woche No. 82 and the UK Quarterly Project of Q1 2023 both aimed to close map notes, I came up with a way to find potential duplicate notes. These are created either by a software glitch or by users themselves, whether out of ignorance or simply by accident.

Below you see a list of countries with their duplicate candidates. Starting today, I will create the corresponding tables in the Wiki and link them here as they become available. The same Wiki page also describes the process I used. Once the source code is no longer such a hot mess that only I can use it, I will publish it on GitHub.

Feel free to reply to this post if you want the table for a particular country created sooner; otherwise I will start with the countries with the fewest entries.

If a country is NOT listed here, it has no potential duplicates as of the planet-notes.osn dump of March 6th, 2023.

Country | Potential Duplicates | Total Open Notes | Potential Duplicates (%)
Afghanistan | 2 | 329 | 0.61
Albania | 53 | 1008 | 5.26
Algeria | 557 | 4029 | 13.82
Andorra | 2 | 57 | 3.51
Angola | 22 | 542 | 4.06
Argentina | 140 | 3659 | 3.83
Armenia | 19 | 276 | 6.88
Azerbaijan | 565 | 2906 | 19.44
Bahrain | 14 | 192 | 7.29
Bangladesh | 177 | 2825 | 6.27
Belarus | 15 | 417 | 3.60
Benin | 10 | 423 | 2.36
Bhutan | 8 | 164 | 4.88
Bolivia | 12 | 930 | 1.29
Bosnia and Herzegovina | 12 | 1097 | 1.09
Botswana | 17 | 218 | 7.80
Brazil | 58 | 5568 | 1.04
Bulgaria | 4 | 898 | 0.45
Burkina Faso | 14 | 1076 | 1.30
Cambodia | 2 | 143 | 1.40
Cameroon | 24 | 728 | 3.30
Canada | 11 | 1944 | 0.57
Cape Verde | 4 | 102 | 3.92
Chad | 14 | 214 | 6.54
China | 613 | 10689 | 5.73
Colombia | 4 | 578 | 0.69
Costa Rica | 108 | 1463 | 7.38
Côte d’Ivoire | 18 | 1773 | 1.02
Croatia | 10 | 1804 | 0.55
Cuba | 18 | 355 | 5.07
Cyprus | 59 | 943 | 6.26
Czechia | 56 | 4084 | 1.37
Democratic Republic of the Congo | 32 | 3829 | 0.84
Denmark | 2 | 849 | 0.24
Djibouti | 2 | 36 | 5.56
East Timor | 12 | 107 | 11.21
Ecuador | 17 | 972 | 1.75
Egypt | 266 | 3851 | 6.91
El Salvador | 2 | 121 | 1.65
Equatorial Guinea | 6 | 52 | 11.54
Estonia | 8 | 474 | 1.69
Ethiopia | 239 | 1384 | 17.27
Faroe Islands | 2 | 47 | 4.26
Finland | 20 | 2325 | 0.86
France | 119 | 30148 | 0.39
Gaza Strip | 240 | 1024 | 23.44
Georgia | 253 | 2505 | 10.10
Germany | 8 | 44270 | 0.02
Ghana | 26 | 785 | 3.31
Greece | 133 | 4912 | 2.71
Greenland | 11 | 171 | 6.43
Guatemala | 2 | 786 | 0.25
Guinea | 19 | 335 | 5.67
Guinea-Bissau | 2 | 48 | 4.17
Guyana | 30 | 243 | 12.35
Haiti | 12 | 777 | 1.54
Hungary | 18 | 2405 | 0.75
Iceland | 14 | 636 | 2.20
India | 127 | 6402 | 1.98
Indonesia | 925 | 15058 | 6.14
Iran | 7311 | 56978 | 12.83
Iraq | 2496 | 13023 | 19.17
Ireland | 8 | 2395 | 0.33
Israel | 345 | 2812 | 12.27
Italy | 6 | 23142 | 0.03
Jamaica | 2 | 34 | 5.88
Japan | 819 | 11082 | 7.39
Jordan | 56 | 986 | 5.68
Judea and Samaria | 53 | 309 | 17.15
Kazakhstan | 379 | 3838 | 9.87
Kenya | 52 | 819 | 6.35
Kosovo | 159 | 1331 | 11.95
Kyrgyzstan | 212 | 1912 | 11.09
Laos | 154 | 1050 | 14.67
Lebanon | 25 | 632 | 3.96
Libya | 466 | 1931 | 24.13
Madagascar | 83 | 730 | 11.37
Malawi | 16 | 257 | 6.23
Malaysia | 189 | 3545 | 5.33
Mali | 16 | 589 | 2.72
Malta | 4 | 267 | 1.50
Mauritania | 21 | 422 | 4.98
Mauritius | 14 | 231 | 6.06
Mexico | 69 | 3538 | 1.95
Moldova | 93 | 955 | 9.74
Mongolia | 215 | 1335 | 16.10
Montenegro | 71 | 810 | 8.77
Morocco | 291 | 3648 | 7.98
Mozambique | 15 | 227 | 6.61
Myanmar | 593 | 3838 | 15.45
Nepal | 779 | 4668 | 16.69
Netherlands | 18 | 1292 | 1.39
New Zealand | 13 | 1567 | 0.83
Nicaragua | 5 | 349 | 1.43
Niger | 17 | 306 | 5.56
Nigeria | 41 | 1270 | 3.23
North Macedonia | 14 | 374 | 3.74
Norway | 76 | 2957 | 2.57
Oman | 160 | 1011 | 15.83
Pakistan | 114 | 2765 | 4.12
Panama | 46 | 562 | 8.19
Paraguay | 20 | 437 | 4.58
Peru | 10 | 609 | 1.64
Philippines | 749 | 7248 | 10.33
Poland | 5 | 8397 | 0.06
Portugal | 9 | 2326 | 0.39
Qatar | 65 | 519 | 12.52
Romania | 201 | 3162 | 6.36
Russia | 1770 | 22726 | 7.79
Rwanda | 12 | 233 | 5.15
Saudi Arabia | 81 | 1727 | 4.69
Senegal | 2 | 496 | 0.40
Serbia | 17 | 398 | 4.27
Seychelles | 10 | 147 | 6.80
Sierra Leone | 18 | 190 | 9.47
Singapore | 49 | 780 | 6.28
Slovakia | 6 | 1903 | 0.32
Slovenia | 4 | 389 | 1.03
Somalia | 34 | 537 | 6.33
South Africa | 2 | 1203 | 0.17
South Korea | 35 | 1730 | 2.02
South Sudan | 2 | 187 | 1.07
Spain | 45 | 14031 | 0.32
Sri Lanka | 74 | 1468 | 5.04
Sudan | 39 | 646 | 6.04
Suriname | 15 | 141 | 10.64
Syria | 170 | 1414 | 12.02
Tajikistan | 28 | 700 | 4.00
Tanzania | 91 | 1961 | 4.64
Thailand | 889 | 8256 | 10.77
The Gambia | 18 | 135 | 13.33
Togo | 10 | 602 | 1.66
Trinidad and Tobago | 133 | 957 | 13.90
Tunisia | 40 | 971 | 4.12
Turkey | 474 | 8031 | 5.90
Turks and Caicos Islands | 2 | 33 | 6.06
Uganda | 43 | 1281 | 3.36
Ukraine | 269 | 7287 | 3.69
United Arab Emirates | 512 | 2621 | 19.53
United Kingdom | 24 | 28885 | 0.08
United States | 457 | 44123 | 1.04
Uruguay | 10 | 199 | 5.03
Uzbekistan | 688 | 3338 | 20.61
Vanuatu | 6 | 155 | 3.87
Venezuela | 12 | 885 | 1.36
Vietnam | 815 | 7318 | 11.14
West Bank | 336 | 1615 | 20.80
Western Sahara | 6 | 179 | 3.35
Yemen | 75 | 707 | 10.61
Zambia | 13 | 329 | 3.95
Total | 29100 | 547187 | 5.32

Changes in duplicate count:

Date | Count
06.03.2023 | 30798
07.03.2023 | 30240
08.03.2023 | 29843
09.03.2023 | 29517
10.03.2023 | 29224
15.03.2023 | 29100

This thread has also collected a comprehensive list of editor issues that may lead to such duplicates:

Have a wonderful day

Kai

I’d love to investigate what’s going on in Croatia, thanks.

(I also wonder how much might be related to StreetComplete #4853)

Uhm, I see Iran has a lot of duplicates. I wonder if this is somehow connected to the onosm.org issue "Many iranian notes in the middle of the Atlantic ocean" (Issue #103, osmlab/onosm.org on GitHub).

Every single day I close 2 to 5 Iranian notes in Italy.

Hi Matija,
see the link to Croatia above. Only 14 notes (7 instances) is quite good :wink:

K

Hey @ivanbranco,

putting up Iran and Iraq will likely take a while, as putting that list together is an awful job for the CPU :wink:

I would guess yes and no. The problem you describe is onosm.org putting the note in weird places. But those would then be counted towards the country they are located in, in your case, Italy.

Anyway, it’s an interesting phenomenon! I will try to stay on top of it!

K

I have a possible fix for this issue. It doesn’t directly fix the problem of duplicates. A code review and testing would help move along the merging process.

It tries to use the bounding box that Nominatim supplies for the original search to limit where the resulting notes can be added. This should keep any duplicates relatively clustered and easy to remove once the information is added to the map.
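
A minimal sketch of the bounding-box idea, not the actual onosm.org change: it assumes the box arrives in the order Nominatim uses in its JSON output (south, north, west, east, as strings), and the helper names and sample coordinates are made up for illustration. Boxes that cross the antimeridian are not handled.

```python
from typing import Sequence, Tuple

BBox = Tuple[float, float, float, float]  # south, north, west, east


def parse_bbox(boundingbox: Sequence[str]) -> BBox:
    """Convert a Nominatim-style ["south", "north", "west", "east"] list to floats."""
    south, north, west, east = (float(value) for value in boundingbox)
    return south, north, west, east


def is_inside(lat: float, lon: float, bbox: BBox) -> bool:
    """True if the point lies inside the box (edges included)."""
    south, north, west, east = bbox
    return south <= lat <= north and west <= lon <= east


# Hypothetical search result covering a city-sized area.
bbox = parse_bbox(["35.5", "35.9", "51.1", "51.7"])

print(is_inside(35.7, 51.4, bbox))   # a note position inside the searched area
print(is_inside(0.0, -30.0, bbox))   # a position in the middle of the Atlantic
```

A form could refuse to submit a note (or ask the user to confirm) whenever `is_inside` is false for the chosen position.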

Note that the list for Poland includes duplicates I already fixed a few days ago (based on your list; see, for example, Note 3365812 and Note 3336691).

The UK may suffer from a similar issue.

Someone in Iran (possibly several people) is using onosm.org on a massive scale, to the point that it does not seem to be helpful at all.

See https://resultmaps.neis-one.org/osm-notes-country?c=Iran

I reached out to the Iranian community some time ago, but they (understandably) appear to be busy with issues other than OSM notes (the Mahsa Amini protests, see Wikipedia, are one of the recent events in Iran).

If anyone finds a duplicated StreetComplete note that has an attached image, please mention it here and ping me.

This may help to solve the mysterious note duplication (tracked as "Multiple Notes created in the same place for the same poi", Issue #4853, streetcomplete/StreetComplete on GitHub).

Yep, I know. I didn’t remove the closed notes from all countries. I will do that shortly.
All entries were fixed yesterday evening at around 10 pm CET. A new run will follow at around 11 am CET today.

The information has been updated.
State: Open Notes as of March 7th 2023, 04:20 UTC (planet-notes.osn), Closed Notes as of 12:40 CET

This is very cool. I went into the Turkmenistan duplicates and cleaned most of them up. Thanks for flagging this.

The information has been updated.
State: Open Notes as of March 8th 2023, 04:18 UTC (planet-notes.osn), Closed Notes as of 06:43 CET.

Iran and Iraq have also got their own pages now.

Duplicated StreetComplete note with photo:

Duplicated in 652 and 654.

The information has been updated.
State: Open Notes as of March 9th 2023, 04:22 UTC (planet-notes.osn), Closed Notes as of 06:27 CET.

I did the same thing a while back, but in a proper geospatial database (PostGIS).
Using the tool RicoElectrico/NoteMD (on GitHub), I imported the notes and ran something like:
    SELECT a.id AS id_a, b.id AS id_b
    FROM osm_note a
    JOIN osm_note b ON ST_DWithin(a.geom, b.geom, 1e-5)
        AND a.created_at IS NOT NULL AND b.created_at IS NOT NULL
    WHERE a.id < b.id
The 1e-5 here is in WGS84 degrees, so toward the poles the search area shrinks east-west relative to north-south, but it doesn’t really matter that much.
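
To illustrate what such a degree-based threshold does, here is a rough Python sketch. It is a brute-force stand-in for the indexed PostGIS join, not the query above: the function names, the sample coordinates, and the 111,320 m per degree figure (a spherical approximation) are assumptions for illustration only.

```python
import math
from itertools import combinations


def close_pairs(notes, max_deg=1e-5):
    """Return (id_a, id_b) pairs with id_a < id_b whose planar distance
    in degrees is at most max_deg. `notes` maps id -> (lon, lat)."""
    pairs = []
    for (id_a, (lon_a, lat_a)), (id_b, (lon_b, lat_b)) in combinations(sorted(notes.items()), 2):
        if math.hypot(lon_a - lon_b, lat_a - lat_b) <= max_deg:
            pairs.append((id_a, id_b))
    return pairs


def degree_extent_m(lat_deg, deg=1e-5):
    """Approximate metres spanned by `deg` degrees (north-south, east-west)
    at the given latitude, assuming a spherical Earth."""
    metres_per_degree = 111_320.0  # rough value, assumption
    north_south = deg * metres_per_degree
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# Hypothetical notes: 1 and 2 are almost on top of each other, 3 is far away.
notes = {1: (16.0, 45.8), 2: (16.000005, 45.8), 3: (16.1, 45.8)}
print(close_pairs(notes))
print(degree_extent_m(0.0), degree_extent_m(60.0))
```

At 60° latitude the east-west extent of the same threshold is about half of the north-south extent, which is the distortion mentioned above. For a threshold of roughly one metre, that hardly changes which notes count as candidates.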

I agree that PostGIS might have been the better way to implement this, but fortunately it still works decently fast, because the import is reduced to just the note id and the geo point, discarding all other information.
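
As an illustration of such a reduced import (not the actual import code, which has not been published), the following Python sketch streams a notes dump and keeps only the id and coordinates of open notes. The element and attribute names (`note`, `id`, `lat`, `lon`, `closed_at`) are assumptions about the dump layout, and the sample data is invented.

```python
import io
import xml.etree.ElementTree as ET

# Tiny invented sample in the assumed layout of a notes dump.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm-notes>
  <note id="1" lat="45.8000000" lon="16.0000000" created_at="2023-03-01T10:00:00Z">
    <comment action="opened" timestamp="2023-03-01T10:00:00Z">Shop missing</comment>
  </note>
  <note id="2" lat="45.8000000" lon="16.0000000" created_at="2023-03-01T10:00:05Z">
    <comment action="opened" timestamp="2023-03-01T10:00:05Z">Shop missing</comment>
  </note>
  <note id="3" lat="52.5000000" lon="13.4000000" created_at="2023-02-01T08:00:00Z" closed_at="2023-02-02T08:00:00Z">
    <comment action="opened" timestamp="2023-02-01T08:00:00Z">Fixed already</comment>
  </note>
</osm-notes>
"""


def read_open_points(stream):
    """Stream the XML and return {note_id: (lat, lon)} for notes without closed_at."""
    points = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "note":
            if elem.get("closed_at") is None:
                points[int(elem.get("id"))] = (float(elem.get("lat")), float(elem.get("lon")))
            elem.clear()  # drop comments and text to keep memory use low
    return points


points = read_open_points(io.BytesIO(SAMPLE))
print(points)
```

Streaming with `iterparse` and clearing each element avoids holding the comment text of every note in memory, which is the point of discarding everything except id and position.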

Can you elaborate on the algorithm you use in more detail than what is described in the Wiki?

  • Given the notes table contains id (int), geom (geography), ctry (char(2)), stat (enum(open, closed)).

  • For each country (ctry is an ISO 3166-1 two-letter code), I select all open notes together with their distance from 0/0 with

    DECLARE @g geography = 'POINT(0 0)';
    
    select [id], geom.STDistance(@g) [dist_to_zz]
    FROM [notes]
    where stat = 'open'
    and ctry = '<CTRY>'
    order by geom.STDistance(@g)
  • A ;-separated list gets selected with
    DECLARE @g geography = 'POINT(0 0)'; 
    -- a and b hold the same ranking: notes at an equal distance from 0/0 share a rank
    with a as (
    SELECT rank() over (order by geom.STDistance(@g)) [rnk], id
    from [notes] 
    where 
    id in (<Notes from previous step>)
    ),
    b as (
    SELECT rank() over (order by geom.STDistance(@g)) [rnk], id
    from [notes] 
    where 
    id in (<Notes from previous step>)
    )
    -- one row per rank: the ids sharing that rank, joined with '; '
    select rank() over (order by rnk) as number, stuff((select concat('; ', b.id) from b where b.rnk = a.rnk order by id for xml path('')), 1, 1, '') from a
    group by rnk
    order by rnk
  • PHP magic then creates the overview list for Discourse and the wiki text.
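
The steps above can be sketched roughly in Python as follows. This is an illustration, not the published tool: it uses a spherical great-circle distance instead of SQL Server's ellipsoidal `STDistance`, rounds distances to the millimetre before grouping, and lists only groups with more than one note. The function names and sample notes are assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def dist_to_origin(lat, lon):
    """Great-circle distance in metres from (lat, lon) to 0/0.
    For the origin the central angle c satisfies cos(c) = cos(lat) * cos(lon)."""
    phi, lam = math.radians(lat), math.radians(lon)
    return EARTH_RADIUS_M * math.acos(math.cos(phi) * math.cos(lam))


def candidate_groups(notes, precision=3):
    """Group note ids by their (rounded) distance to 0/0 and return a
    '; '-separated id list for every group with more than one note."""
    groups = defaultdict(list)
    for note_id, (lat, lon) in notes.items():
        groups[round(dist_to_origin(lat, lon), precision)].append(note_id)
    return [
        "; ".join(str(i) for i in sorted(ids))
        for _dist, ids in sorted(groups.items())
        if len(ids) > 1
    ]


# Hypothetical open notes of one country: id -> (lat, lon).
notes = {
    101: (45.8, 16.0),
    102: (45.8, 16.0),
    103: (45.9, 16.0),
    104: (45.81, 15.97),
    105: (45.81, 15.97),
}
for line in candidate_groups(notes):
    print(line)
```

Notes at the same position always share the same distance to 0/0, but the reverse is not guaranteed, since different points can lie equally far from 0/0. That is one reason the results are only potential duplicates and still need a manual check.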

Organic Maps seems to have a bug:
Note: 3350004
Note: 3350005
Note: 3350430
Note: 3350436
Note: 3350635

… there’s an issue for it: "[editor] Multiple uploads of Notes (often) and POIs (rare) into OSM after editing." (Issue #2071, organicmaps/organicmaps on GitHub)
