I am currently working on a research project and urgently need high-quality geographic data. Specifically, I am looking for large-scale city-level data, preferably in GeoJSON or ESRI Shapefile format, focusing on building details, especially for cities in the United States.
The building information I require includes:
Year of construction
Usage (e.g., residential, commercial, etc.)
Height
Date of last update
My experience so far:
I attempted to download the full planet file from OSM Planet, but because of its size, the download and extraction took an extremely long time, and I eventually ran out of storage space on my computer.
I also tried downloading data for Los Angeles (LA) from BBBike, but it still took a significant amount of time. Additionally, when I opened the data in QGIS and checked the attribute table, I found that essential information like building height was missing.
I am seeking advice on more efficient methods or sources to obtain detailed building data, particularly with construction year, height, and other critical attributes. If anyone has recommendations for tools, methods, or data sources that cater to these needs, I would greatly appreciate your input!
IMHO, in this specific case you are wasting your time trying to do this with OSM; you should simply try to obtain the data from the relevant city GIS departments.
To be more verbose than Simon: depending on the region, OSM has very different quality and coverage of buildings.
Where we do have buildings, we most likely don't have the construction year. We might have the height in densely populated areas where people use StreetComplete extensively, or rather, we might have the number of levels.
But as Simon already pointed out, you will most likely get better data by requesting your city's data, especially when it comes to more advanced metadata.
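To make the difference between these tags concrete, here is a minimal Python sketch (not an official tool) that turns one building's OSM tags into the attributes asked for above. It assumes common tagging practice: `start_date` for the construction year, the value of `building=*` for usage (the generic `building=yes` says nothing about usage), and `height` in metres or feet, falling back to `building:levels` times an assumed storey height of 3 m. The helper names and the 3 m figure are illustrative assumptions.

```python
import re
from typing import Optional

# Assumption for illustration only: average storey height used when just
# building:levels is tagged. This is not an OSM convention.
ASSUMED_LEVEL_HEIGHT_M = 3.0


def parse_height_m(value: Optional[str]) -> Optional[float]:
    """Parse a height value such as "12", "12.5 m" or "40'" into metres."""
    if not value:
        return None
    value = value.strip()
    # Feet with optional inches, e.g. 40' or 40'6"
    feet = re.fullmatch(r"(\d+(?:\.\d+)?)'(?:(\d+(?:\.\d+)?)\")?", value)
    if feet:
        inches = float(feet.group(2) or 0)
        return round(float(feet.group(1)) * 0.3048 + inches * 0.0254, 2)
    # Plain number, optionally followed by "m" (metres are the default unit)
    metres = re.fullmatch(r"(\d+(?:\.\d+)?)\s*m?", value)
    if metres:
        return float(metres.group(1))
    return None  # unrecognised format


def parse_year(value: Optional[str]) -> Optional[int]:
    """Take the leading four-digit year from start_date, e.g. "1925-05" -> 1925."""
    match = re.match(r"~?(\d{4})", value or "")
    return int(match.group(1)) if match else None


def building_attributes(tags: dict) -> dict:
    """Map raw OSM tags to usage, construction year and height."""
    usage = tags.get("building")
    height = parse_height_m(tags.get("height"))
    source = "height" if height is not None else None
    if height is None:
        # Fall back to the number of levels, if it parses as a positive number
        try:
            levels = float(tags.get("building:levels", ""))
        except ValueError:
            levels = None
        if levels is not None and levels > 0:
            height = levels * ASSUMED_LEVEL_HEIGHT_M
            source = "building:levels"
    return {
        "usage": None if usage in (None, "yes") else usage,
        "year": parse_year(tags.get("start_date")),
        "height_m": height,
        "height_source": source,
    }
```

Keeping `height_source` in the output is a deliberate choice: an estimate derived from levels is much less reliable than a mapped height, so it should stay distinguishable in any analysis.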
Your approach of using the planet (or some subset of it) to extract that data is the right one, though. You can filter the data down to buildings only with osmium. Doing so on a global scale with the full planet will require a lot of memory, CPU and disk space, though, so you'd better start small with a city-level extract, probably from Geofabrik, and test your pipeline first.
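A minimal sketch of that first filtering step, wrapped in Python and assuming osmium-tool is installed and on the PATH. The `tags-filter` subcommand with the expression `wr/building` keeps ways and relations tagged `building=*`; by default osmium also keeps the objects they reference (`-R`/`--omit-referenced` would drop them), so geometries can still be built afterwards. The file names are placeholders.

```python
import shutil
import subprocess
from pathlib import Path


def buildings_filter_command(src: Path, dst: Path) -> list:
    """Build the osmium-tool call that keeps only building ways/relations."""
    return [
        "osmium", "tags-filter",
        "-o", str(dst),
        "--overwrite",   # replace dst if it already exists
        str(src),
        "wr/building",   # ways and relations with any building=* tag
    ]


def filter_buildings(src: Path, dst: Path) -> Path:
    """Run the filter; raises if osmium is missing or exits with an error."""
    if shutil.which("osmium") is None:
        raise RuntimeError("osmium-tool not found on PATH")
    subprocess.run(buildings_filter_command(src, dst), check=True)
    return dst
```

For example, `filter_buildings(Path("city-extract.osm.pbf"), Path("buildings.osm.pbf"))` produces a much smaller file to experiment with; once the pipeline works on a city extract, the same call can be pointed at a larger region.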
I just wanted to avoid multiple people wasting their time instead of just one. Of the four data items the OP wants, we typically have only one for any meaningful number of buildings in OSM, and even that is doubtful.
Thank you very much for your response. The construction year is not mandatory for me; what matters most is the building's usage and height. I have noticed that in major U.S. cities OSM has relatively comprehensive information on these two attributes, so I am relying on OSM for now. Since I may need to obtain a large amount of data from different regions, searching for higher-quality data for each area individually, as Simon suggested, could take a lot of time. I have already found high-quality data provided by the NYC government. If you know how to access high-quality open geographic data for other major U.S. cities, please let me know.
Regarding Geofabrik, I have found their download speed incredibly slow and frustrating. Additionally, in my experience with other OSM data sources, many details, such as building height, that are visible in the web interface are often missing from the data I download.
Considering that my research is still in its early stages, I need to review data from many different cities to determine the next phase. Therefore, OSM datasets, which can be downloaded in bulk, are quite suitable for my needs. As you mentioned, I have also previously searched for high-quality datasets for several major cities individually, such as the dataset provided by the NYC government on GitHub. However, searching for these datasets one by one is still quite tedious, and since I am not very familiar with U.S. administrative bodies, it is difficult for me to quickly find similar datasets for other major cities.
… just to be clear, what are you actually downloading? Their download pages offer several different sorts of data that are designed to be useful for different things. If you want all useful OSM data, you’ll want the .pbf**
** Actually, if you want all OSM data, including who edited it and when, you’ll need to sign in to OSM.
I would like to know the relevant references for this claim because, even if OSM height data cannot be used in my research, it would still be a valuable point to include in my paper. However, I am still quite surprised, because I briefly looked at some building data for Los Angeles in the web interface, and it seemed quite normal.
Since I am more familiar with QGIS and Python, data in GeoJSON and Shapefile formats would of course be ideal, but I think PBF files are also manageable to process. However, I completely agree with your second point. I've already downloaded a small portion of the data and found that it lacks version information, specifically when the data was uploaded. This is particularly problematic for research that combines this data with satellite imagery.
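If the `.pbf` route is taken, one possible way to get GeoJSON that QGIS can open, including each building's version and last-edit timestamp, is a small pyosmium script. This is a sketch assuming the pyosmium package (`osmium`) is installed and that the downloaded extract still carries version and timestamp metadata; the set of kept tags and the helper names are illustrative choices. Note that an object's timestamp is the time of its own last edit: moving one of a way's nodes does not change the way's version or timestamp.

```python
import json
from typing import Dict, List

# Tags kept in the output; an arbitrary illustrative choice.
KEEP_TAGS = ("building", "height", "building:levels", "start_date")


def building_feature(geometry_json: str, tags: Dict[str, str], osm_type: str,
                     osm_id: int, version: int, timestamp_iso: str) -> dict:
    """Wrap one building geometry and selected metadata as a GeoJSON Feature."""
    properties = {k: tags[k] for k in KEEP_TAGS if k in tags}
    properties.update({
        "osm_id": f"{osm_type}/{osm_id}",
        "osm_version": version,
        "osm_timestamp": timestamp_iso,
    })
    return {"type": "Feature", "geometry": json.loads(geometry_json),
            "properties": properties}


def export_buildings(pbf_path: str, out_path: str) -> int:
    """Write all building areas in pbf_path to a GeoJSON FeatureCollection."""
    import osmium  # pyosmium, imported here so the helper above works without it
    import osmium.geom

    factory = osmium.geom.GeoJSONFactory()
    features: List[dict] = []

    class BuildingHandler(osmium.SimpleHandler):
        def area(self, a):
            # Areas are assembled from closed ways and multipolygon relations.
            if "building" not in a.tags:
                return
            try:
                geometry = factory.create_multipolygon(a)
            except Exception:
                return  # skip buildings whose geometry cannot be built
            tags = {t.k: t.v for t in a.tags}
            osm_type = "way" if a.from_way() else "relation"
            features.append(building_feature(
                geometry, tags, osm_type, a.orig_id(),
                a.version, a.timestamp.isoformat()))

    # locations=True keeps node coordinates so geometries can be built.
    BuildingHandler().apply_file(pbf_path, locations=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)
    return len(features)
```

For example, `export_buildings("buildings.osm.pbf", "buildings.geojson")` returns the number of features written. Collecting all features in memory is fine for a city-sized, buildings-only file but would need a streaming writer for larger regions.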
New York City and Philadelphia have height data too, but many other cities that I tested with a simple Overpass Turbo query do not (I looked into Boston, Washington, Chicago, Kansas City and Houston).
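A check like this can also be scripted so that several cities are compared the same way. The sketch below builds an Overpass QL query that counts all buildings, and buildings with a `height` tag, inside a named administrative boundary, and posts it to the public Overpass API instance at overpass-api.de. It assumes the city boundary is tagged `admin_level=8` with a unique `name`; that does not always hold (a name such as Kansas City may match more than one boundary), so the area filter may need adjusting per city. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def height_coverage_query(city: str, admin_level: int = 8, timeout: int = 180) -> str:
    """Overpass QL that counts all buildings and buildings with height in a city."""
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["boundary"="administrative"]["admin_level"="{admin_level}"]'
        f'["name"="{city}"]->.city;\n'
        'nwr["building"](area.city)->.b;\n'
        'nwr["building"]["height"](area.city)->.h;\n'
        ".b out count;\n"
        ".h out count;\n"
    )


def parse_coverage(result: dict) -> tuple:
    """Return (buildings, buildings_with_height, share) from the JSON reply."""
    counts = [int(e["tags"]["total"]) for e in result["elements"]
              if e["type"] == "count"]
    total, with_height = counts  # one count element per "out count" statement
    share = with_height / total if total else 0.0
    return total, with_height, share


def fetch_coverage(city: str) -> tuple:
    """Send the query to the Overpass API and parse the two counts."""
    body = urllib.parse.urlencode({"data": height_coverage_query(city)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=body, timeout=300) as resp:
        return parse_coverage(json.load(resp))
```

For example, `fetch_coverage("Houston")` returns the two counts and the share of buildings with a `height` tag. A similar query with `building:levels` instead of `height` would show how much the levels fallback helps. The public Overpass instance is a shared resource, so it is worth keeping the number of requests low.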