One thing I’m interested in is figuring out how to align the calculated camera positions to geo-tags from the corresponding photographs. This would take care of your “what direction is up?” problem (with what accuracy, I don’t know).
What I’m thinking is converting lat, lng, height to Cartesian coordinates, then calculating the transformation needed to bring the cameras’ centers of projection (CPs) in line with the GPS measurements. I would then convert the CPs back to GPS coordinates.
I found this Python script with functions to do the conversion in both directions: http://gagravarr.org/code/geo_helper.py
I am unfamiliar with GPS and mapping. The conversion functions require that you define the system your lat, long, and height are in. How do you determine what system this is? For instance, when an iPhone image is geotagged, is the system known or assumed?
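For reference, here is a minimal sketch of the lat/lng/height → Cartesian direction of that conversion, assuming the coordinates are in WGS84 (the datum consumer GPS receivers typically report in). The function name and structure here are my own, not taken from the linked script:

```python
import math

# WGS84 ellipsoid constants (assumed datum; consumer GPS units normally use it)
A = 6378137.0            # semi-major axis, meters
F = 1.0 / 298.257223563  # flattening
E2 = F * (2.0 - F)       # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, height_m):
    """Convert WGS84 lat/lon/height to Earth-centered Cartesian (ECEF) meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - E2) + height_m) * math.sin(lat)
    return x, y, z
```

With camera CPs and GPS fixes both expressed in ECEF meters, estimating the aligning transformation becomes an ordinary point-set registration problem.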
Until your message I had planned to rely exclusively on manual alignment of the reconstructed scene with OSM streets and GPS tracks, using stretching, rotation, and translation (as is done in Microsoft Photosynth).
Let’s see what one can achieve with automatic alignment, provided that the photos are geo-referenced.
Please note that it is possible to geo-reference photos even if the camera itself doesn’t have a GPS chip. One needs an external GPS tracker that writes a time stamp into each entry of the GPS track. Comparing those time stamps with the ones in the photos, one can assign a longitude/latitude pair to each photo in question. That’s how mapping is often done in the OSM project. Further details: http://wiki.openstreetmap.org/wiki/Photo_mapping
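The matching step is just nearest-in-time lookup. A minimal sketch, assuming the photo timestamps (e.g. from EXIF) and the track have already been parsed into `datetime` objects and the two clocks are synchronized or offset-corrected:

```python
from datetime import datetime

def geotag_photos(photo_times, track):
    """Assign each photo the (lat, lon) of the nearest-in-time GPS track point.

    photo_times: dict mapping filename -> datetime of capture
    track: list of (datetime, lat, lon) tuples from the GPS logger
    Assumes camera and logger clocks agree (or have been offset-corrected).
    """
    tagged = {}
    for name, t in photo_times.items():
        # Pick the track point whose timestamp is closest to the photo's
        nearest = min(track, key=lambda p: abs(p[0] - t))
        tagged[name] = (nearest[1], nearest[2])
    return tagged
```

Tools like gpicsync do essentially this, with extras such as interpolating between track points and applying a fixed clock-offset correction.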
If we found that the height was inaccurate, could we at least align to ground level, possibly querying a database for the ground elevation at that lat/long? That might give us an accurate top-down view. It seems to me that even if it isn’t as accurate as we want, it would be a worthwhile first step.
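Once an elevation grid (e.g. an SRTM tile loaded as a 2D array) is available, looking up the ground elevation at a lat/long is just bilinear interpolation. A sketch under assumed conventions; the function and parameter names here are hypothetical, not a real library API:

```python
def dem_elevation(dem, lat, lon, lat0, lon0, cell):
    """Bilinearly interpolate ground elevation from a DEM grid.

    dem:  2D array of elevations, row 0 at latitude lat0, rows going south
    lat0, lon0: coordinates of the grid's north-west corner (degrees)
    cell: grid spacing in degrees (e.g. 3 arc-seconds for SRTM)
    The query point must lie strictly inside the grid.
    """
    r = (lat0 - lat) / cell          # fractional row (southward)
    c = (lon - lon0) / cell          # fractional column (eastward)
    r0, c0 = int(r), int(c)
    fr, fc = r - r0, c - c0
    # Interpolate along the two bounding rows, then between them
    top = dem[r0][c0] * (1 - fc) + dem[r0][c0 + 1] * fc
    bot = dem[r0 + 1][c0] * (1 - fc) + dem[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bot * fr
```

Replacing each camera’s GPS height with this ground elevation (plus roughly eye height) would encode the “photos taken at ground level” assumption directly.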
I found a very interesting read regarding aligning SfM solutions to existing map data. Towards the end of the article they show data aligned using only GPS; they then use that GPS-based estimate to initialize their search.
I wonder if one can’t do better with GPS than they did. They are using Flickr photographs, whose geotag quality can only be determined algorithmically, in this case using RANSAC. I would be curious to see what one gets when the quality of the GPS measurements is ensured up front.
The other thing is that their method requires aerial photography or existing accurate maps... possibly we could use Yahoo!'s aerial imagery.
OK, if we know the up direction (estimated automatically or set manually) and set the height above sea level to zero for all cameras, we reduce the problem from 3D to 2D. As a side result, the algorithm you’ve talked about should yield a scale coefficient (only one, right?). If we apply that scale to the z coordinate of each reconstructed point, we should get its height in meters. Right?
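If that’s right, recovering heights is a one-liner: the single scale factor (meters per model unit) from the 2D alignment applies directly to the z axis, assuming z is already aligned with the up direction. A trivial sketch:

```python
def model_heights_to_meters(points, scale):
    """Convert reconstructed z coordinates to heights in meters.

    points: iterable of (x, y, z) tuples in model units, z along the
            known up direction; scale: meters per model unit (assumed
            to come from the 2D ground-plane alignment).
    """
    return [z * scale for (x, y, z) in points]
```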
Given 4 or more GPS fixes and corresponding camera locations, we can calculate a transformation that will translate, rotate, and uniformly scale our point set. So this means that “up” ought to end up roughly along our up axis, and we can estimate height in meters. If we have very accurate GPS, say around 1 meter of error, I think our alignment will look pretty good with this method, especially if the measurements are nicely spread out. Without accurate height measurements, we have to rely on elevation data and the assumption that the photographs were taken at ground level.
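The standard least-squares solution for this similarity transform (translation + rotation + uniform scale) from point correspondences is Umeyama’s SVD-based method. A self-contained sketch, assuming the camera centers and GPS fixes are already in a common Cartesian frame (e.g. ECEF or a local tangent plane):

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate s, R, t minimizing sum ||s*R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding points (e.g. reconstructed
    camera centers and GPS positions); N >= 4 well-spread points is safe,
    3 non-collinear points are the theoretical minimum in 3D.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    # Cross-covariance of the centered point sets
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Reflection guard: force det(R) = +1
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

Mapping every reconstructed point through `s * R @ p + t` then places the whole scene in the GPS frame, with the recovered `s` giving the model-units-to-meters factor discussed above.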
I did some quick tests today using an iPhone, and I can see that the GPS will give us a rough placement (I haven’t written any code yet; I got hung up trying to get a good reconstruction). Also, I need a better way of geotagging my photographs. Right now I’m using an iPhone app called Trails together with gpicsync; Trails seemed to take infrequent measurements.