Integration of Substations in HIFLD into OSM

As you may be aware, the US government’s HIFLD substation dataset has been unavailable to the public since 2022. This post is about how we are integrating those substations into OSM.

Overview

The substation dataset from HIFLD was withdrawn from the public in 2022. It can still be downloaded here (click the “Electrical Substations” layer in the layer list on the right, then click “Download”).

Each substation in HIFLD is represented as a point with additional attributes, whereas OSM recommends that objects tagged ‘power=substation’ be mapped as polygons. In addition, we have found that most of the data is unreliable or out of date. Because of these issues, we use the HIFLD substations only as a guide in a manual process: we look at the location where HIFLD states a substation exists, verify that it does not already exist in OSM, verify that it exists in reality using satellite imagery, and then draw the polygon that represents the substation using the satellite imagery as a background.

Our Workflow

Working on a per-state basis, we download the OSM data for that state from Geofabrik and extract all objects tagged ‘power=substation’. We then remove the HIFLD substations that already exist within OSM using a point-in-polygon test (ST_CONTAINS). From there we load the remaining HIFLD substations into a custom desktop application that displays them over an OSM layer and a satellite layer. If we find a HIFLD substation at or near an object in OSM, we go to OSM to figure out what the deal is. Sometimes the HIFLD substation is near an OSM substation, in which case we ignore it and move on. Sometimes the object in OSM is a substation but was not tagged as ‘power=substation’. In these cases we use the iD editor to make the minor corrections directly within OSM and skip the HIFLD substation in our application.
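A minimal sketch of the de-duplication step, for illustration only. It swaps in a plain ray-casting test in place of ST_CONTAINS and assumes each OSM substation has already been reduced to its outer ring of (lon, lat) pairs. The function names are hypothetical and are not taken from our actual tooling.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (lon, lat)
Ring = List[Point]            # outer ring of an OSM substation polygon


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray-casting test: True if pt lies strictly inside the ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def drop_already_mapped(hifld_points: List[Point], osm_rings: List[Ring]) -> List[Point]:
    """Keep only HIFLD points that fall inside no existing OSM substation."""
    return [
        p for p in hifld_points
        if not any(point_in_ring(p, ring) for ring in osm_rings)
    ]
```

Points that sit just outside an existing polygon survive this filter, which is why the manual check against the OSM layer is still needed afterwards.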

Once we have manually verified that the HIFLD substation is not in OSM, we verify that it exists in reality by looking for a substation on a satellite image. We will also use Google Street View if the satellite image is not clear enough.

If the satellite image and/or Google Street View is good enough to verify the actual existence of the HIFLD substation, we draw the outline of the substation, which is then saved to a GeoJSON file along with the name of the substation from HIFLD.

Once we finish a state, we load the GeoJSON file into JOSM, visually look at the substations again, and run its validation function. If everything is OK, we use JOSM to upload the data into OSM.

The following columns exist in the HIFLD data that we do not bring into OSM:

  • OBJECTID, ID - I assume this has some relevance to the backend system someplace in the government but has no relevance in OSM.
  • CITY, STATE, ZIP, COUNTY, COUNTYFIPS, COUNTRY - This is the address of the substation and I have found this information to be unreliable.
  • VAL_METHOD, VAL_DATE, LINES - If these have a corresponding tag in OSM let me know.
  • MAX_VOLT, MIN_VOLT, MAX_INFER, MIN_INFER - OSM does have a tag for this called ‘voltage’, however, I have found the data in HIFLD to be unreliable and not comprehensive. For example, there could be voltages between the MIN and the MAX but we would never know. Because of this unreliability we do not import these columns into OSM.
  • TYPE - this is an enumeration with the following values: DEAD END, NOT AVAILABLE, RISER, SUBSTATION and TAP. During our conversion process we only work with rows with a TYPE=“SUBSTATION”.
  • STATUS - This is an enumeration of the following strings: IN SERVICE, INACTIVE, NOT AVAILABLE, PROPOSED, and UNDER CONSTRUCTION. We only deal with rows with a STATUS=“IN SERVICE”.
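The row selection described in the list above can be sketched roughly as follows. This assumes each HIFLD record has been read into a plain dict keyed by column name; the `select_candidates` function and the `NAME` key in the usage example are illustrative assumptions, not part of our actual application.

```python
# Columns from the list above that are not carried over into OSM.
DROPPED_COLUMNS = {
    "OBJECTID", "ID",
    "CITY", "STATE", "ZIP", "COUNTY", "COUNTYFIPS", "COUNTRY",
    "VAL_METHOD", "VAL_DATE", "LINES",
    "MAX_VOLT", "MIN_VOLT", "MAX_INFER", "MIN_INFER",
    "TYPE", "STATUS",
}


def select_candidates(rows):
    """Keep in-service substations and strip the columns we do not use."""
    kept = []
    for row in rows:
        if row.get("TYPE") != "SUBSTATION":
            continue  # skips DEAD END, NOT AVAILABLE, RISER and TAP
        if row.get("STATUS") != "IN SERVICE":
            continue  # skips INACTIVE, NOT AVAILABLE, PROPOSED, UNDER CONSTRUCTION
        kept.append({k: v for k, v in row.items() if k not in DROPPED_COLUMNS})
    return kept
```

For example, `select_candidates([{"NAME": "A", "TYPE": "TAP", "STATUS": "IN SERVICE"}])` returns an empty list because the row is not a substation.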

Upon upload the following OSM tags are set:

  • power=substation
  • name= (only if a name exists in the HIFLD data)
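As a rough illustration, a drawn outline and these two tags could be written out as a GeoJSON feature like this. The function names are hypothetical, and the outline is assumed to be a list of (lon, lat) pairs traced from imagery.

```python
import json


def substation_feature(outline, name=None):
    """Build a GeoJSON Feature for one manually drawn substation outline."""
    ring = [list(p) for p in outline]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON polygon rings must be closed
    properties = {"power": "substation"}
    if name and name.strip():
        properties["name"] = name.strip()  # only set when HIFLD has a name
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }


def feature_collection(features):
    """Serialize all features for one state into a single GeoJSON string."""
    return json.dumps({"type": "FeatureCollection", "features": list(features)})
```

Leaving the name key out entirely when it is blank keeps empty tags from ever being written to the file.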

In theory, we could also set the ‘addr:state’ tag to the state we are working on, as well as the ‘addr:country’ tag. I have found the other “address” related tags in HIFLD to be unreliable.

So, why am I writing this? We were in the process of adding substations to OSM using the HIFLD substations as a guide when we were told to stop and go through the import process. All of the data we had uploaded was reverted. My feeling is that this is not an import, as we are manually verifying everything (except the name of the substation), including manually drawing the substations. So the question is: is this truly an import into OSM, or can we proceed with what we were doing?

Thanks.

Hello! Thanks for posting and working through these additions. Several years ago, some folks wanted to tackle HIFLD hospital data and I wrote a bit about that process (as a proof of concept).

One note is that the use of Google Streetview is not allowed. There are other sources of streetside imagery that are compatible with our data license (Mapillary, Panoramax, Bing Streetside) so you’ll need to move to one of those.

Thanks for the information about GSV. I seem to remember reading that it could be used in this use case, but for the life of me, after some quick Googling right now, I cannot find it. We do not actually use it, as the satellite imagery from Esri and Bing is good enough.

I will take a look at your article.

Sounds good. If memory serves, the hospitals had data freshness and location accuracy problems similar to those you describe for power substations.

40%-60% of the substations in HIFLD do not exist in real life. That’s the main reason we are doing this manually.

Oof. That’s a pretty bad false positive rate. I am glad you’re doing your due diligence!

Thank you for posting your workflow here. I hope that a couple more of the proficient members of the US community will chime in and once they give a thumbs up you’re good to go.

OBJECTID is the primary key for a record in any ArcGIS dataset. It doesn’t mean anything more than that and definitely isn’t worth fussing about.

That’s fine. Substation addresses can be useful, but they’d have to be more granular than what this dataset is providing.

If VAL_DATE stands for “validation date”, then the OSM equivalent would be check_date=*. Maybe certain values of VAL_METHOD could allow you to weed out nonexistent substations more efficiently.

If these names contain unwieldy company names or numbers, you might want to duplicate or move some of that information to operator=* or ref=*, respectively, so that the name=* remains fairly practical for day-to-day use.

Not quite… the first attempt has been reverted already: Changeset: 179243989 | OpenStreetMap

But anyway, it’s good that we are having a discussion and can together find a better way to get the data into OSM. You might check the history of the above-mentioned objects to see the data.

That was the biggest issue in the first attempt at least in MI.

I have no easy way to verify the name or determine whether it is for the substation or operator, and since this seems to be the main issue with the import, we will not be using the name in any form.

As for using VAL_DATE for check_date, why wouldn’t we use the date we drew the substation or the date of upload? I ask because that is when we actually draw the substation as it appears on the satellite layer.

Based on a cursory inspection of examples like Toyota, Mobil South Belridge, South Belridge Cogen, Castaic, South Bay 2, and University, my strong suspicion is that these are indeed substation names omitting “Substation” or another generic – essentially, short_name=*. I don’t see names of electric utilities in most of the features you added. In some cases, the name may include the name of the operator, but this is normal for a substation when there’s a similarly named substation in close proximity.

I would be comfortable using these names (with the generic added back) as long as they don’t override any existing names in the database. The names probably don’t come from someone at DHS running the substations through a random name generator or otherwise conjuring up fictitious names. We don’t strictly need to verify each name in imagery. I’m not sure we’ve ever raised the bar that high for substation names anyways.
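To make that concrete, here is a small sketch of the rule I have in mind. The helper name and the dict-of-tags representation are just assumptions for illustration; nothing here reflects how any editor or import tool actually works.

```python
def proposed_name(hifld_name, existing_tags, generic="Substation"):
    """Suggest a name=* value without overriding a name already in OSM."""
    if existing_tags.get("name"):
        return existing_tags["name"]  # never replace an existing name
    base = (hifld_name or "").strip()
    if not base:
        return None                   # nothing usable from HIFLD
    if base.lower().endswith(generic.lower()):
        return base                   # generic is already present
    return f"{base} {generic}"        # add the generic back
```

With this, `proposed_name("Castaic", {})` gives "Castaic Substation", while a feature that already carries a name keeps it untouched.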

check_date=* would be more useful if you were importing the dataset verbatim, but probably not in this case. I wouldn’t set it to the current date, because the aerial or satellite imagery you’re consulting is as much as several years old. I’d just leave out the key entirely.

A note from the get-go here, I don’t know the power tagging very well at all, these are just some observations from reviewing the data in iD and OSMCha.
As mentioned on your previous changeset in FL, several of these are duplicates of existing features. For example, way 1485092814, which you added today, seems to duplicate way 1384252192, which has been in OSM for a while but is tagged as disused:power=substation; adding a power=substation on top seems incorrect to me. If the substation exists and truly is back in service, it would be better to change its tagging than to add a new way on top of the existing feature. Way 1485092813 seems to overlap, at least somewhat, with way 174398674. Some of these also seem kind of small, like way 1485092811: that one seems to have only one machine in it. If it is just a single generator, amplifier, etc., is it really a “substation”? There may be a better way to tag that, though again that may just be me not knowing enough about the tagging of these types of features. Way 1485092810 seems to be a straight-up duplicate of way 1099901546 with fewer tags. Same with ways 1485092809 and 1099901524; there are multiple others like this in the most recent FL upload. Some, like way 1485092778, seem to be smaller parts of existing power features, like power plants, so there may be better tagging for them than power=substation. For Turkey Point there already seems to be a substation within it, so way 1485092781 should be double-checked; not that there cannot be two substations within a power plant like that, but there may be better tagging for that feature.
The way I see it, when an existing building is already mapped and it is unclear from aerial imagery whether the building still exists, you should have recent street-side imagery confirming that it no longer exists before you remove it. If a substation partially overlaps such a building, it is better to have the building share vertices with the substation way than to remove it.

Happy Mapping,
Udar.

Hey @PowerMapper77

Apparently you did not check the names within Michigan before your import. By the way, this was again done without following the import guidelines and without any proper source in the changeset.

Just within the first 20 stations I can see multiple names which are not names. Otherwise, please provide proof that those substation names really contain “LLC”. Names also should not contain abbreviations, like Dg or Drs C&c.
Riverview Energy Systems doesn’t sound like the name of a substation either.
Please fix those and properly document your import as well.

As stated elsewhere in this thread, we have decided not to include the HIFLD name when we add a substation to OSM. Not only is it time consuming to verify but the vast majority would not be verifiable anyway.

Regarding the value of the ‘source’ tag, what do you suggest?

Interesting that disused:power is used for substations, as that differs from power=line, which just uses disused. I had assumed that all power tags used the same tag to indicate that something was disused.

We also found the issues you mention with overlapping polygons, and I thought we had addressed them in FL. I suspect our utility would have to start replicating iD and JOSM to handle these types of things, so we may just skip those substations for now, then swing back around and handle those cases individually.
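One way we could set those cases aside is a conservative bounding-box check like the sketch below. This is only an idea, not something our utility does today. It assumes every ring is a list of (lon, lat) pairs and that the existing rings include ways tagged disused:power=substation as well as power=substation; the function names are hypothetical.

```python
def bbox(ring):
    """Return (min_x, min_y, max_x, max_y) for a ring of (lon, lat) pairs."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_overlap(a, b):
    """True if two bounding boxes intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def split_by_overlap(new_rings, existing_rings):
    """Separate newly drawn rings into (clean, needs_review)."""
    existing_boxes = [bbox(r) for r in existing_rings]
    clean, needs_review = [], []
    for ring in new_rings:
        box = bbox(ring)
        if any(bboxes_overlap(box, e) for e in existing_boxes):
            needs_review.append(ring)  # possible duplicate or overlap
        else:
            clean.append(ring)
    return clean, needs_review
```

A bounding-box test can flag polygons that do not actually overlap, but it cannot miss a real overlap, so everything in the review pile would still get looked at by hand.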

Thanks.

One thing: Who is “we”?

If you do this in some sort of professional capacity your account needs to show that fact.

The difference between disused:*=* and disused=yes is that the former marks a feature that is no longer recognizable as the thing because it’s no longer in use, whereas the latter marks a feature that would still be described as the thing regardless of its current use:

  • A disused (vacant) building is still a building, whereas a disused shop is not a shop.
  • A disused quarry is no longer a functioning quarry, but landuse=quarry is actually characterizing the land, which is still scarred (and hazardous).
  • A disused (deenergized) power line is still a power line, whereas a disused substation – well, it depends what you mean by it being disused. Is it merely deenergized or has most of the equipment been cleared off the concrete pad?

There are some gray areas, since something can be recognizable or unrecognizable depending on the audience and use case.

Regardless, I agree with @Udarian that there’s no need to keep a separate disused feature around that conceptually represents the same real-world thing as the current feature.

I am confused… The screenshot above shows ways you newly uploaded (i.e., at version 1) that have a name tag.

Correct, and I decided that going forward I would not include the name tag.

I’m also a little confused by this. In your changeset just yesterday in Maryland you added this named way: Way: Conowingo (1485090642) | OpenStreetMap, which seems ambiguous/redundant with the name of the hydroelectric dam it’s within. Also, as an aside, are you just tracing these from imagery? Before uploading you might want to square the shapes (can be done by hitting Q in JOSM); it’s weird for something like this to be sticking out of the existing geometry like it is now. I can’t say I know enough about power tagging to comment on whether a substation within a power plant makes sense.