Best way to delete hundreds of duplicate identical nodes

wkc · July 30, 2012, 5:56am

Hello all
I have a problem where a user added thousands of duplicate nodes to the map years ago. The result is two or three houses stacked one on top of the other with identical tags and data. I am trying to update the data using JOSM, but deleting the nodes one by one is obviously going to take forever. Can anybody suggest a way to erase the extra nodes so that only one of each remains?
The nodes all have the same lat, lon, addr:street, addr:city, addr:country. Since the nodes are identical, I am thinking of using some type of search to select all the nodes with the same attributes and erase all but the one with the highest ID.
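A minimal sketch of that idea, assuming the area has been saved as an .osm XML file and that every node carries lat and lon attributes. The function name `find_duplicate_nodes` and the sample data (IDs, house number) are made up for illustration. The script only reports which nodes could go; it does not delete anything.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_nodes(osm_xml):
    """Return {kept_id: [duplicate_ids]} for nodes sharing position and tags."""
    root = ET.fromstring(osm_xml)
    groups = defaultdict(list)
    for node in root.iter("node"):
        tags = frozenset((t.get("k"), t.get("v")) for t in node.findall("tag"))
        if not tags:
            continue  # untagged nodes are usually way vertices; leave them alone
        # OSM stores coordinates to 7 decimal places, so round before comparing.
        key = (round(float(node.get("lat")), 7),
               round(float(node.get("lon")), 7),
               tags)
        groups[key].append(int(node.get("id")))
    duplicates = {}
    for ids in groups.values():
        if len(ids) > 1:
            ids.sort()
            duplicates[ids[-1]] = ids[:-1]  # keep the highest ID in each group
    return duplicates

# Made-up sample data: IDs and house number are illustrative only.
SAMPLE = """<osm version="0.6">
  <node id="101" lat="10.676306" lon="-61.510306">
    <tag k="addr:housenumber" v="7"/>
    <tag k="addr:street" v="Saint Ann's Road"/>
  </node>
  <node id="205" lat="10.676306" lon="-61.510306">
    <tag k="addr:street" v="Saint Ann's Road"/>
    <tag k="addr:housenumber" v="7"/>
  </node>
  <node id="300" lat="10.676306" lon="-61.510306">
    <tag k="addr:housenumber" v="7"/>
    <tag k="addr:street" v="ST ANNS RD ST"/>
  </node>
</osm>"""

print(find_duplicate_nodes(SAMPLE))  # {205: [101]}
```

Node 300 is left out of the result because its street tag differs, so only exact matches on position and the full tag set are treated as duplicates.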
An example area would be http://www.openstreetmap.org/?lat=10.67651&lon=-61.49881&zoom=16&layers=M . There are other areas with hundreds of nodes all over the country and not just the 20+ in that example.
I hope someone can give me a pointer here.
Thanks in advance
wkc

g0ldfish · July 30, 2012, 7:55am

I have downloaded the area you linked to, but was not able to spot duplicate objects. But in general I would select them all (via search or with rectangle selection, depending on the situation), then click once on the overlapping objects while holding the ctrl key to deselect one of them. Hope this helps.

BCNorwich · July 30, 2012, 10:48am

Hi,
Just to clarify the question: I think you meant there are groups of nodes with the same tag information. The nodes are not duplicates, as they occupy different locations (albeit very closely grouped). They will each have to be looked at separately, as the tag information sometimes differs. For example, of the five nodes for house number four on Third Avenue, four have identical tags with street=THIRD AVE, while the fifth is different, with street=THIRD AVE CASCADE.
One other point to raise is the question of all the abbreviations (AVE instead of Avenue, TT instead of Trinidad and Tobago). Might be best to draw the buildings or properties in as you or someone goes along but that would require a survey or local knowledge.
My own humble opinion is that it is best left alone if not done properly, otherwise someone will have to go over it again.
Regards Bernard

Post Script: I also notice the street names are not consistent, First Ave, 2nd Ave, Third Ave, (one numerical?).

wkc · July 30, 2012, 11:37pm

Hello
thanks for the replies. Let me clarify the situation.
As you have realised the data is very inconsistent and in some situations very wrong. The typical situation would be as follows:
Two building nodes with the same coordinates, house number and city, but the street name is wrong in one and correct in the other. For example the house number 7 nodes at 10.676306, -61.510306 . The nodes have streets as “ST ANNS ROAD” and “ST ANNS RD ST”, the correct road name would be “Saint Ann’s Road”.

Typically what I would do is

1. remove all the obviously incorrect nodes (wrong city, country, in the middle of forests, etc)
2. correct the highway name (“ST ANNS RD” → “Saint Ann’s Road”)
3. using the search function, replace all the “ST ANNS RD” with “Saint Ann’s Road” (find “addr:street”:“st anns rd”) on the nodes.

At this point I would now have two identical nodes and I need to remove one of them. I need to do this repeatedly, sometimes hundreds of times in one area.
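The street-name correction can be sketched outside JOSM as a lookup table applied to each node's tags. This is only an illustration: the `CORRECTIONS` table and the `correct_street` helper are hypothetical, and the entries are taken from the example in this post. Matching ignores case and extra spaces, and unknown names are left untouched so they can be reviewed later.

```python
# Hypothetical correction table; entries come from the example in this post.
CORRECTIONS = {
    "ST ANNS RD": "Saint Ann's Road",
    "ST ANNS ROAD": "Saint Ann's Road",
}

def correct_street(tags, corrections):
    """Return a copy of tags with addr:street replaced when a correction is known."""
    fixed = dict(tags)
    street = fixed.get("addr:street")
    if street is not None:
        # Normalise case and whitespace before looking the name up.
        key = " ".join(street.upper().split())
        fixed["addr:street"] = corrections.get(key, street)
    return fixed

a = {"addr:housenumber": "7", "addr:street": "ST ANNS ROAD"}
b = {"addr:housenumber": "7", "addr:street": "st anns rd"}
# After correction the two tag sets are identical, which is the duplicate case.
print(correct_street(a, CORRECTIONS) == correct_street(b, CORRECTIONS))  # True
```

A name that is not in the table, such as “ST ANNS RD ST”, passes through unchanged, so questionable nodes are not silently merged.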

I could remove the nodes with “ST ANNS RD ST” in step one, but I have found places where the wrongly tagged nodes are in the correct places, while the correctly tagged nodes are in the wrong places. So I’d like to keep as much information as possible for later review.

The example area does not have duplicate nodes, but there are other locations that have them. My mistake, sorry.
As I mentioned, the data is very inconsistent and does not follow the wiki’s guidelines: names are in all caps and abbreviations are used. Even the example I used is incorrect: there are no houses in that area, and the city is not Port Of Spain, it is Saint Ann’s.
The clustered nodes are another problem I will need to work on, but that is not the problem I am looking for suggestions for.

Any ideas for removing these duplicates, and any other mapping advice, would be appreciated.
Thanks in advance
wkc

g0ldfish · July 31, 2012, 1:50pm

I would not know any way to automate this.

But if I were you, I’d really be tempted to ask user pdunn to help you with cleaning and correcting the data. I suspect it was an import (~17,000 nodes in one changeset), and there are good reasons why imports need to follow stricter rules (which pdunn seems to have ignored, at least partly) than most things in OSM.

wkc · August 3, 2012, 2:51am

g0ldfish
thanks for the info and suggestion. I will try contacting pdunn later this week/weekend and see what can be done. I will look at the scripting plugin in JOSM to see if I can get it to do what I need. It has been a long time since I have done any serious coding though.
I will report here if I make any progress with this.
Thanks again
wkc