Bureau of Vector-borne Diseases campaign: possible violation of Organized Editing Guidelines

Like others, I often run into these odd rounded area shapes with numeric descriptions and non-prefixed village names. They overlap existing residential areas and carry a source of BVBDXXXXXX, DVBDXXXXXX or WHO2019.

They raise validation errors in both iD and JOSM. Their name tag does not include the required prefix and sometimes conflicts with the existing village name.

I have left a few changeset comments but never heard back from the original mapper. I deleted some of the shapes manually, and sometimes kept the name/source.

Through a forum search I found these threads from 2020:

https://forum.openstreetmap.org/viewtopic.php?pid=778436#p778436
https://forum.openstreetmap.org/viewtopic.php?id=69928

They show that a local organization, the Bureau of Vector-borne Diseases, has:

  • pretty much used OSM as their own custom database
  • reverted changes post-cleanup e.g. 80125521, 89829776
  • not publicly documented their activities or contacted the community (a private exchange was made with RussMcD, see forum post above)
  • not provided further info on the source and license of data.

Until 2 years ago, they seem to have maintained at least 12,000 of these data entries across Thailand: overpass turbo

I have asked their main contact Elijah Filip | OpenStreetMap to respond to this thread and provide more info about their campaign, their source of data and license, and whether they are willing to perform manual post-cleanup activities.

The case documented in your second link was referring to the malaria status of individual households.
This data was not only reverted, but also redacted from the database. It is no longer visible in the history.
This was to protect the privacy of the affected people.
Back then I did not find any other violations of privacy.

The other problem is that the data is proprietary. We can’t maintain it, because no one knows where the data comes from and what to map.
Currently, it is not rendered. And it should not interfere with other elements in OSM.
So it is “just” dead data.
If the data has not been updated in the last 2 years and no one explains what it is used for, then we might export it (just in case) and bulk-remove it from OSM.

Just to clarify, it does interfere when the shapes’ nodes/edges are touching other geometries (e.g. forest bounds, roads…). This is how I found them. If you fix a forest bound or realign a road touching one of these shapes, validation warnings are shown (at least in iD).

It might be that iD is “snapping” to the nodes of these areas, which then connects them to other elements as you describe.

To understand the exact details, you would have to check the history to see why it is connected.

I recommend not worrying about the geometry of these areas. If they are connected to another element you are moving, then go ahead and move it.

Hi @julcnx and @stephankn,

Thanks for raising these issues again. You can find a bit more background on the mapping in this changeset comment thread which is mentioned in the links in the original post.

To summarize briefly, the Department of Vector Borne Diseases (DVBD) in Thailand worked with a developer starting in 2019 who was building an application to monitor and respond to new cases of malaria and areas with historic transmission. To determine the extent of routine and response activities, the application needed the locations of enumerated households and the boundaries of malaria foci (sub-village boundaries plus areas of potential transmission, i.e., forested areas, streams, water sources, etc.).

The developers initially set up a task through HOT to enumerate the households in these areas after training health center staff on enumeration. However, when it came time to map the focus boundaries, they also relied on the main OSM instance despite those boundaries representing only malaria-specific information.

As far as I am aware, this was all done without consulting the OSM community. When we later discovered this was the case, we set up a separate, private OSM instance that DVBD now uses to delineate foci boundaries. However, what had already been enumerated on the main OSM instance was never removed, as it wasn’t clear on our end whether an official community determination had been made on the next steps.

All of that is to say that we’re happy to help remove these areas/household enumerations if the community feels like they should be removed. Let us know if there is anything I can do on my end to help or if you have any additional questions.

Thanks,
Eli

I appreciate the candor, politeness and relative completeness of the replies here.

As an individual volunteer in the project, I’ll say here and now that it would be my personal preference that these data be removed from OSM, very carefully and so as not to disturb other OSM data they might be connected to.

Hello Eli,

Thanks for responding here.

If you no longer need the data in OSM, and the data is probably outdated anyway, then it might be best to remove it.
I think I remember some discussions about whether the “name” content could potentially be useful.

I would like this aspect discussed before we remove the data. The query by @julcnx returns some nodes that contain the interesting tags.
For example, do we consider the names valid for place nodes, and do we want to keep them?
@Elijah_Filip what is the source of these names? Are they under a license compatible with OSM? How trustworthy are these names?

{
  "type": "node",
  "id": 3947331989,
  "lat": 19.4767940,
  "lon": 99.3955158,
  "tags": {
    "description": "5711011104",
    "name": "บ้านป่าคา",
    "name:en": "Ban Pa Kha",
    "place": "village",
    "source": "DVBD2020"
  }
},

If there is no benefit to keeping it, then removal would be a good choice. This prevents the confusion documented in the various threads, and it makes the data cleaner when editing. Data that no one can ever maintain is of no use to OSM.
One key aspect of having data in OSM is that it can be maintained by the community.

If we agree that the data should be removed, then we should also discuss how to do this properly, so that we do not accidentally remove too much.

So we need a way to identify the objects. I am aware of the query above, but I am not sure how well it matches. I tried a query for ways that have area and description but a tag count other than 4. It returned ways from this data set that carry additional tags, which are probably worth checking, as well as some other very questionable taggings. At least the source tag seems to be always there, so using it to identify the data should work.

[out:json];
{{geocodeArea:Thailand}}->.search;
(
  way["area"="yes"][description](area.search)(if:count_tags() != 4);
);
out tags qt;

Thanks @Elijah_Filip for the quick and helpful answer!

Could you please confirm that the other sources below are related? The changeset you mentioned refers only to way["source"~"BVBDMAY2019|WHO2019"]:

BVBD2019, BVBDAMY2019, BVBDAUGSUT2019, BVBDAUGUST2019, BVBDMAY019, BVBDMAY2019, BVBDMAY2o19, BVBDMay2019, DVBD2020, DVBDMAY2019, VBDUMAY2019, VBVBD2019, WHO2019, ิBVBDMAY2019, ิิBVBDMAY2019, ิฺBVBDMAY2019, ฺBVBDAMY2019, ฺBVBDMAY2019, ฺฺBVBDMAY2019

@stephankn, other mappers and I have in some cases, as in the example above, merged existing place nodes or landuse=residential areas with data from DVBD. These may contain previously valid shapes or names, and hence should not be deleted entirely.

As you noted, the original data seem to be areas with only 4 tags: area=yes + name=* + description=* + source~DVBD|WHO.

If it’s a node, or a place=* or landuse=residential tag exists, it may have been already merged, and we should only delete the source key and not the whole object.
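A minimal sketch of that rule in Python, in case it helps whoever scripts the cleanup. It assumes elements shaped like Overpass JSON output and already filtered to a campaign source value; the function name `cleanup_action` and the extra "review" outcome are hypothetical and not part of any agreed procedure.

```python
# Tag keys of an untouched campaign area, as described above.
ORIGINAL_KEYS = {"area", "name", "description", "source"}


def cleanup_action(element: dict) -> str:
    """Return 'delete', 'strip_source' or 'review' for one Overpass JSON element.

    Assumes the element's source value was already matched as a campaign source.
    """
    tags = element.get("tags", {})
    # Nodes, place=* and landuse=residential objects may have been merged
    # with valid data: keep the object and only drop the source key.
    if (
        element.get("type") == "node"
        or "place" in tags
        or tags.get("landuse") == "residential"
    ):
        return "strip_source"
    # A way with exactly the four original tags is an untouched campaign area.
    if (
        element.get("type") == "way"
        and set(tags) == ORIGINAL_KEYS
        and tags.get("area") == "yes"
    ):
        return "delete"
    # Anything else (extra or missing tags) gets a manual look.
    return "review"


merged_node = {
    "type": "node",
    "id": 3947331989,
    "tags": {
        "description": "5711011104",
        "name": "บ้านป่าคา",
        "name:en": "Ban Pa Kha",
        "place": "village",
        "source": "DVBD2020",
    },
}
print(cleanup_action(merged_node))  # strip_source
```

The "review" bucket is deliberately broad, so anything that does not clearly fit either case is left for a human to check.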

The village names would definitely be useful and some may have already been reused. However, the process is very manual and time-consuming (names need to be prefixed and moved to a new or existing place node). There are 900 shapes (not 12,000 as initially stated), so it could be done if someone volunteers.

Hi all,

I can confirm that the data is mostly outdated, is no longer used by the program, and isn’t something the program plans on maintaining, so removing it does seem like the best course of action.

As for the names, the source comes from the mappers themselves - mostly district-level health staff who mapped the foci. However, given that foci are sub-villages, I’m not entirely sure how well the names of the foci map back to the villages they are part of, nor how trustworthy they actually are, but I can double check with the Department of Vector Borne Diseases if it would be helpful.

All mapped foci were supposed to have the four following tags:

  • area: “yes”
  • description: a unique, 10-digit sub-village code
  • name: The name of the sub-village in Thai
  • source: The training under which the mapping was done (theoretically just BVBDMAY2019, WHO2019, and DVBD2020)

Unfortunately, particularly early on in the mapping process, tags were applied inconsistently. However, the list of sources you shared, @julcnx, tentatively looks correct to me. I’d also add “malariaiamhealth”, “MALARIAMHEALTH”, “malarlamhealth”, “Source: MALARIAMHEALTH”, “VALTEST1123” and “VALTEST123”, which all came up when I queried all ways in Thailand with a source and description tag through Overpass.
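Given how many spellings there are, a quick check like the one below could confirm that a single pattern covers every reported value. This is only a sketch: the pattern and the name `is_campaign_source` are my own assumptions, matching is case-insensitive because some of the values are lowercase, and a pattern this loose can also hit unrelated sources, so the results would still need manual review.

```python
import re

# Hypothetical catch-all pattern; case-insensitive because some reported
# values are lowercase ("malariaiamhealth", "malarlamhealth").
CAMPAIGN_SOURCE = re.compile(r"VBD|WHO|MHEALTH|VALTEST", re.IGNORECASE)


def is_campaign_source(value: str) -> bool:
    """True if a source tag value looks like one of the campaign spellings."""
    # Unanchored search, so stray leading characters do not matter.
    return CAMPAIGN_SOURCE.search(value) is not None


# Source values reported in this thread (a stray Thai mark precedes one of them).
REPORTED = [
    "BVBD2019", "BVBDAMY2019", "BVBDAUGSUT2019", "BVBDAUGUST2019",
    "BVBDMAY019", "BVBDMAY2019", "BVBDMAY2o19", "BVBDMay2019",
    "DVBD2020", "DVBDMAY2019", "VBDUMAY2019", "VBVBD2019", "WHO2019",
    "ิBVBDMAY2019",
    "malariaiamhealth", "MALARIAMHEALTH", "malarlamhealth",
    "Source: MALARIAMHEALTH", "VALTEST1123", "VALTEST123",
]

unmatched = [value for value in REPORTED if not is_campaign_source(value)]
print(unmatched)  # an empty list means every reported spelling is covered
```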

For what it’s worth, when we moved data over to the private OSM server, we used the following Overpass query to get the foci boundaries. It currently returns 592 ways, though it is clearly missing areas that may have been mapped incorrectly:

area["name:en"="Thailand"] ->.a;
way["source"~"BVBDMAY2019|WHO2019|DVBD2020"]["description"~"."](area.a);
(._;>;);
out geom;

Let me know if you have any questions or if there is anything else on my end that I can do to help.

Thanks,
Eli

Thank you @Elijah_Filip for the clarifications.

I have done some manual checks and the names often do not match other OSM data or other sources (GeoNames). The shapes are often not located over actual residential areas, which makes it difficult to tie them to a nearby neighborhood. At this point, it’s safer to remove the data.

Everyone: I am currently reviewing and manually retaining some of the items that were merged afterwards (often by me). I will then remove the rest of the data and post the query as well as a data backup here.

Please find:

[out:xml];
{{geocodeArea:Thailand}}->.search;
(
way[source~'VBD|WHO|IAMHEALTH|VALTEST'][!amenity](area.search);
);
(._; >>;);
(._; <<;);
out meta;

I have made sure to review that all query results were originally related to this organization’s previous campaigns.
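Since some of these ways share nodes with unrelated geometry (forest bounds, roads), here is a rough sketch of how the nodes can be split before deleting anything. The function `split_nodes` and the sample IDs are hypothetical; it assumes each way is available as a list of node IDs, and it does not check whether a node carries tags of its own, which would also be a reason to keep it.

```python
def split_nodes(campaign_ways: dict, other_ways: dict) -> tuple:
    """Split the nodes of campaign ways into (removable, shared).

    Both arguments map a way ID to its list of node IDs.
    """
    campaign_nodes = {n for refs in campaign_ways.values() for n in refs}
    other_nodes = {n for refs in other_ways.values() for n in refs}
    # Nodes also used by a non-campaign way must stay in the database.
    shared = campaign_nodes & other_nodes
    return campaign_nodes - shared, shared


# Made-up IDs: closed campaign way 1 touches road 2 at node 12.
campaign = {1: [10, 11, 12, 10]}
others = {2: [12, 13]}

removable, shared = split_nodes(campaign, others)
print(sorted(removable), sorted(shared))  # [10, 11] [12]
```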

@Elijah_Filip, in the future, if you feel that some of the non-proprietary data sources you use or collect could be useful for both OpenStreetMap and your organization, please reach out to the community.

Administrative data and road network coverage in remote areas are definitely lacking, and the community would welcome any valuable input that follows the organized editing guidelines: Organised Editing Guidelines - OpenStreetMap Foundation