Proposed guided import of data for UK schools

TLDR: Look here and see what you think about the edits being suggested:

Context

There is open data from the Department for Education (DfE) relating to educational establishments (schools, colleges, local authority nurseries), available under the OGL.

Establishments in OSM are matched to this data using the ref:edubase tag, and I think this data can be used to add and update OSM objects. I have created a series of scripts to transform DfE data from the Get Information About Schools (GIAS) system into OSM tags.

Automatic changes

I propose that the following tags can be automatically added or updated, with minimal individual checking:

ref:edubase:group, operator, operator:wikidata, operator:type, capacity, religion, denomination, diocese, school, school:type, school:group:type (and school:boarding, school:selective if they are yes and school:gender if it is not mixed). check_date can also be updated if other data is being changed or if the object has not been modified for at least one year.
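As an illustration of the rules above, here is a minimal sketch of how the automatic tag set and the check_date condition could be expressed. It is not the tool's actual code: the function name `auto_updates`, the plain-dict representation of tags, and approximating "at least one year" as 365 days are all assumptions.

```python
from datetime import date, timedelta

# Tags listed in the proposal as safe to add or update automatically.
AUTO_TAGS = {
    "ref:edubase:group", "operator", "operator:wikidata", "operator:type",
    "capacity", "religion", "denomination", "diocese", "school",
    "school:type", "school:group:type",
}


def auto_updates(current: dict, official: dict,
                 last_modified: date, today: date) -> dict:
    """Return the tag changes that could be applied without individual review."""
    changes = {key: value for key, value in official.items()
               if key in AUTO_TAGS and current.get(key) != value}
    # school:boarding and school:selective are only added when they are "yes".
    for key in ("school:boarding", "school:selective"):
        if official.get(key) == "yes" and current.get(key) != "yes":
            changes[key] = "yes"
    # school:gender is only added when it is not "mixed".
    gender = official.get("school:gender")
    if gender and gender != "mixed" and current.get("school:gender") != gender:
        changes["school:gender"] = gender
    # check_date: refresh if anything else changes, or if the element has not
    # been modified for at least a year (approximated here as 365 days).
    if changes or today - last_modified >= timedelta(days=365):
        changes["check_date"] = today.isoformat()
    return changes
```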

Additionally, ref:edubase could be updated if the school has a ‘successor’ (e.g. a community school becoming an academy), if there are no other significant changes to the element.
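The successor rule could look something like the following sketch. It assumes a hypothetical `successors` dict mapping each URN to its successor URN (for example, built from the GIAS links data); the function names and the example URNs are illustrative only.

```python
from typing import Optional


def latest_urn(urn: str, successors: dict) -> str:
    """Follow successor links (e.g. community school -> academy) to the newest URN."""
    seen = {urn}
    while urn in successors:
        urn = successors[urn]
        if urn in seen:  # stop on a circular chain rather than loop forever
            break
        seen.add(urn)
    return urn


def ref_edubase_update(tags: dict, successors: dict,
                       other_significant_changes: bool) -> Optional[str]:
    """Return a new ref:edubase value, or None if no automatic change applies."""
    current = tags.get("ref:edubase")
    if current is None or other_significant_changes:
        return None
    newest = latest_urn(current, successors)
    return newest if newest != current else None
```

Following the whole chain handles a school that has converted more than once since it was last edited in OSM.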

At the moment, I am presenting these changes on the website to make them easy to review, together with an OSM file of all of the changes for a region, ready to load into an editor and upload manually. If this proposal is approved then this could be modified to be uploaded to OSM automatically, perhaps once per week. I will make a further post proposing this if appropriate.

Changes that need manual review

If the official data would lead to any changes in the following tags, then no changes can be made automatically and each element should be individually reviewed.

name, amenity, addr:postcode, ref:GB:uprn, min_age, max_age, phone, website

isced:level would generally require a manual review. However, due to recent corrections to how isced:level should be tagged in the UK, if min_age and max_age are already tagged and match the official data, then isced:level would be included in the semi-automatic updates above. This actually makes up the majority of the ‘automatic’ edits at the moment.
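A minimal sketch of this split between automatic and manual changes, including the isced:level exception, might look like this. The function name and the dict-based inputs are assumptions, not the tool's implementation.

```python
MANUAL_TAGS = {"name", "amenity", "addr:postcode", "ref:GB:uprn",
               "min_age", "max_age", "phone", "website"}


def needs_manual_review(current: dict, official: dict) -> bool:
    """True if applying the official data would need an individual review."""
    changed = {key for key, value in official.items() if current.get(key) != value}
    if changed & MANUAL_TAGS:
        return True
    if "isced:level" in changed:
        # isced:level is normally manual, but becomes semi-automatic when
        # min_age and max_age are already tagged and match the official data.
        ages_match = all(key in current and current[key] == official.get(key)
                         for key in ("min_age", "max_age"))
        return not ages_match
    return False
```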

Wiki page

I have created the following wiki page for this, with a little more detail. Notably, it explains that some tags will not be suggested if the school operates multiple sites, that the DfE name is checked against other name tags, and that websites in the official data are all checked for redirects and existence before being suggested.
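The name check could be sketched roughly as follows. Which name keys are compared, and the case-folding and whitespace normalisation, are assumptions for illustration; the example names are made up.

```python
# Which name keys to compare against is an assumption for this sketch.
NAME_KEYS = ("name", "official_name", "alt_name", "short_name", "old_name")


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences don't count."""
    return " ".join(text.casefold().split())


def name_already_present(tags: dict, dfe_name: str) -> bool:
    """True if the DfE name already appears in one of the name tags."""
    target = normalise(dfe_name)
    for key in NAME_KEYS:
        # alt_name and old_name may hold several ;-separated values.
        for value in tags.get(key, "").split(";"):
            if value.strip() and normalise(value) == target:
                return True
    return False
```

For example, an element tagged name=Example Primary School and official_name=Example Church of England Primary School would not get a name suggestion for "Example Church of England Primary School".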

Motivation

I have been making edits to schools across the UK using the official data for a number of years, generally by creating MapRoulette tag-fix challenges from the data and then individually reviewing each element. However, I now want to semi-automate some of the edits and make it easier to change which tags are being edited, since I was often making the same changes to the suggestions each time (for example, moving the name suggestion to official_name, or not changing the same tags whenever a school operated multiple sites).

Website

I have created a website to make it easy to see the suggested changes and then to perform those edits. The only edits I have made are the manual corrections, to confirm that the editing works and to continue the work of keeping schools up to date. I have not performed any of the suggested automatic updates and will not do so until approval has been reached here.

This is only for England and Wales at the moment (GIAS only has full data for England, and limited data for Wales); future work would include extending this to the other countries.

Let me know what you think of this proposal and any changes that could be made to the process.


That tool looks really good - nice work!

I don’t know if DfE have fixed this yet, but a few months ago there were some systematic errors in the UPRNs being returned: the same UPRN was listed for a handful of unrelated schools in the same general area. I wrote to DfE about this, but never received a response. Also, I just looked at one school in your tool at random, and it’s suggesting adding https://uprn.uk/100091533112 to “Necton Church of England Primary School”. But I assume the true UPRN should actually be https://uprn.uk/100091533074.

I think the UPRNs in the DfE data need a bit more investigation, but possibly they’re not going to be reliable enough to use.

One other thing that occurred to me. If we map a primary school and an attached nursery as two separate objects in OSM, presumably that means the school object should only have the age-range and ISCED levels of the primary school proper, and the nursery should get the pre-school parts of those. I’m guessing your tool may be suggesting we add the full age and ISCED range to the school object.


I think someone else raised doubts about the reliability of UPRNs in the DfE data a few months back, possibly @philipcullen ?


That looks to be a nice tool.

The UPRNs from GIAS seem to have quite a few anomalies. Organisations update address data themselves, but do not add a UPRN, so the DfE are presumably deriving this themselves and picking some odd ones in the process.

It may be worth considering adding the UKPRN and possibly the DfE Number/LAESTAB in addition to the URN. The LAESTAB overlaps with the namespace of school numbers in Scotland, so would need to be name-spaced to England. The LAESTAB is handy, as it is retained if a school converts to an academy, whereas a new URN and UKPRN are created. The UKPRN is handy as this is the identifier that the DfE are trying to make the standard identifier.

There will likely be some gaps in the linked institutions data, where it has not been completed by either the institution or the DfE.

Some FE organisations have merged, but retain distinct identities at different sites, so the name on GIAS may not always represent how a site is branded locally. GIAS may also only show the main site for these. The DfE does have a system to record these distinctly within a single UKPRN, and another number for them (the Campus Identifier), but I don’t believe they publish it anywhere.


UPRNs are an area where a significant percentage of the data are wrong. I think that a lot are in fact correct, but most of these have already been added. Many of the ones you see in the tool are ones I have skipped adding because they are wrong. I have since started using not:ref:GB:uprn to stop these showing up in the tool.

If a UPRN appears on multiple establishments then it will not be suggested at all (it is almost certainly wrong). If a UPRN is being added or changed then that would never be done automatically, and it can be clicked to see its location. I don’t want to skip them entirely, since they can be very useful for finding the location of an unmapped school or for finding the new location of a school that has moved.
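Put together, the UPRN rules from the last two paragraphs could be sketched like this. The record layout (dicts with hypothetical `urn` and `uprn` keys) and the function names are assumptions; whatever this returns would still only be offered for manual review.

```python
from collections import Counter
from typing import Optional


def unique_uprns(establishments: list) -> dict:
    """Map URN -> UPRN, keeping only UPRNs that appear on exactly one establishment."""
    counts = Counter(e["uprn"] for e in establishments if e.get("uprn"))
    return {e["urn"]: e["uprn"] for e in establishments
            if e.get("uprn") and counts[e["uprn"]] == 1}


def uprn_to_suggest(osm_tags: dict, urn: str, unique: dict) -> Optional[str]:
    """Return a UPRN to offer for manual review, or None."""
    uprn = unique.get(urn)
    if uprn is None:
        return None  # missing, or shared by several establishments
    if osm_tags.get("ref:GB:uprn") == uprn:
        return None  # already tagged
    if uprn in osm_tags.get("not:ref:GB:uprn", "").split(";"):
        return None  # a mapper has already rejected this value
    return uprn
```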

I believe that, as with all of the other data, the UPRN is provided by the school. However, under the FAQ for schools, it says “How do I check my establishment’s unique property reference number (UPRN) value on the system?” (emphasis added), which may imply that some were automatically added by DfE. However, there are establishments with missing UPRNs, so I am unsure.

Good point, that would be correct. It is related to the below.

I have dealt with this by using school:multi_site=yes to denote a single legal establishment that operates multiple sites (also true for some primary or all-through schools). Then for colleges, I would probably give each site the appropriate ref:edubase and an operator tag matching the parent college. If these match then it won’t be picked up in the tool, and most of the data wouldn’t be updated on the site (as it may be wrong), such as ages and postcode.
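A minimal sketch of suppressing site-specific suggestions for multi-site establishments follows. The post names ages and postcode; also treating isced:level and ref:GB:uprn as site-specific is an assumption, as is the function name.

```python
# The post names ages and postcode; treating isced:level and ref:GB:uprn as
# site-specific too is an assumption.
SITE_SPECIFIC = {"min_age", "max_age", "isced:level", "addr:postcode", "ref:GB:uprn"}


def filter_for_site(tags: dict, proposed: dict) -> dict:
    """Drop site-specific suggestions for elements tagged school:multi_site=yes."""
    if tags.get("school:multi_site") == "yes":
        return {key: value for key, value in proposed.items()
                if key not in SITE_SPECIFIC}
    return dict(proposed)
```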

It would of course be trivial to include these, if it’s agreed on and there is suitable tagging; ref:GB:ukprn and ref:GB:laestab seem obvious candidates.


For Wales, there’s something called My Local School, but I’m not sure about the licensing (or how useful it is to you!)

It mentions being public domain, but isn’t quite explicit about any license. And then I can’t see any bulk download option, which makes it trickier to make use of.


There’s also a question of the education tag. I know that this basically duplicates the amenity tag, but it is being added to objects by iD users anyway.

At the moment, the tool only touches education if it is already tagged, ensuring that it aligns with the amenity tag, but this could be changed to either add it when other changes are suggested or just add it to all objects with a valid and current ref:edubase tag.
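The three options in the poll below differ only in when a missing education tag is added; an existing one is realigned in every case. This sketch assumes the element already has a valid, current ref:edubase tag and that education should simply mirror the amenity value; the names are hypothetical.

```python
from enum import Enum
from typing import Optional


class EducationMode(Enum):
    ONLY_IF_TAGGED = "only update education if it is already tagged"
    WITH_OTHER_CHANGES = "add education if other changes are being suggested"
    ALL_MATCHED = "add education to all matched establishments"


def education_change(tags: dict, other_changes: bool,
                     mode: EducationMode) -> Optional[str]:
    """Return the education value to set, or None to leave the tag alone."""
    wanted = tags.get("amenity")  # assumption: education mirrors amenity
    if wanted is None or tags.get("education") == wanted:
        return None
    if "education" in tags:
        return wanted  # an existing tag is realigned in every mode
    if mode is EducationMode.ALL_MATCHED:
        return wanted
    if mode is EducationMode.WITH_OTHER_CHANGES and other_changes:
        return wanted
    return None
```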

  • Only update education if it is already tagged
  • Add education if other changes are being suggested
  • Add education to all matched establishments

Schools can update their addresses on GIAS. This is now done by them entering a postcode and selecting their address from a list, rather than being able to enter an address directly.

Looking at the page from the school side, the list of addresses shown to select from is a list of UPRNs displayed as house names/numbers. I wonder if this has implications for how this data is licensed, as it clearly has access to UPRN ↔ address mapping data.

There will be a lot of data that predates this interface, and since there will seldom be a reason to change an address, the UPRNs for that older data must have been matched by the DfE themselves.

I don’t believe that, once merged, FE colleges would retain distinct active URNs. They should have a link record between them, though. Schools joining an academy trust are slightly different, as they remain distinct but are linked to an academy trust.


There is a similar dataset for Scotland too at School contact details - gov.scot, if it is of interest.

Thank you, that is useful to know how the school side works. I have previously had success with contacting schools about issues with their records, such as phone numbers and websites. I am sure that the same could be done with their addresses/UPRNs. That is most likely the best route if we want the data corrected, although it requires effort of contacting each relevant school separately.

Indeed, so each site has the same URN tagged, and probably the same operator, but distinct names.

Thank you, I was aware and it’s on my radar to incorporate at some point. It comes in quite a different format, so will need some different treatment. And from memory, it has email addresses (DfE doesn’t publish them) but the websites are generally much less accurate.


I’ve had another play around with your tool on a few establishments that I’m working for at the moment, so have a few suggestions based on that experience.

  • Where the postcode for an entry is far away from the current postcode, it may be preferable not to suggest it as a replacement.
  • If the name exists already in OSM, make the suggestion of the new name not the default.

Overall, it seems a really well designed tool to update OSM.

Thank you, I do hope so.

I definitely don’t think so. If the postcode is different or a long way away, then it could be a sign that the school has relocated or operates multiple locations, so more investigation is needed. The improvement that I could make is using OS Code-Point Open to show a warning if the proposed postcode centroid is far away from the current location.
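Such a warning could be sketched with a great-circle distance check. Code-Point Open publishes postcode centroids as British National Grid eastings/northings, so this sketch assumes they have already been converted to WGS84 latitude/longitude; the 2 km threshold and the function names are hypothetical.

```python
import math
from typing import Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def postcode_warning(element: tuple, centroid: tuple,
                     threshold_m: float = 2000.0) -> Optional[str]:
    """Warn if the proposed postcode centroid is far from the mapped school."""
    distance = haversine_m(*element, *centroid)
    if distance > threshold_m:
        return f"Proposed postcode centroid is {distance / 1000:.1f} km from the mapped school"
    return None
```

Flagging rather than blocking keeps the decision with the reviewer, since a distant postcode may also mean the school has genuinely moved.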

I’m afraid that I don’t understand what you mean here, perhaps you can give an example?