Rethinking GIS
A continuation of an earlier post on my GIS “usability” tool. More automation, less thinking.
On Sloth 🦥
If you ask most engineers, particularly anyone in the automation space, whether they consider themselves productive or lazy, they will likely tell you they fall on the lazy end of the productivity spectrum, myself included. But lazy is probably the wrong word; “efficient” feels like a better descriptor. I will happily spend hours automating something that could have taken less time to do by hand if it means my life might be easier in the future, and such is the case with the GIS tool. Tired of manually downloading images, combining them, and then visually comparing two maps, I’ve added a bunch of automation and quality of life features to the tool. There is still one crucial manual step in the workflow that I’m hoping to automate away in the future, but I’m lazy and honestly that automation feels like a lot of work.
Let’s talk about the problems with my original implementation.
Search, Click, Repeat
The source data for the usability map comes from the USGS National Map Downloader. You can manually define an area or upload a KML and then download the elevation tiles for that area. I’ve uploaded the example KML from my repo, and the tool returns 6 tiles. One option is to click the Download button for each one. The other is to download uGet and follow the instructions for bulk download. For 6 tiles the manual option isn’t too bad; it’s annoying but not super time consuming. But what if you need 25 tiles, or 50? The process quickly becomes more trouble than it’s worth. The bulk option is easier but still requires installing a new piece of software and learning how to use it.
Data Prep Sucks
So you’ve got your files downloaded and now you want to create a usability map for the area of interest. Great, how do you go about combining them into a single GeoTIFF for processing? Tediously, that’s how! Without going into the gory details, the process requires manually checking the footprint of each GeoTIFF and figuring out each tile’s position in the grid that makes up the full extent, starting at the top-left corner. The ordering is unique to every dataset you download, and I haven’t decoded the [lat/lon] → filename mapping.
Here is an example of a somewhat generic combination script, with foreknowledge of the grid shape.
from osgeo import gdal

# Map of grid position (row-major, starting at the top-left corner)
# to the tile ID embedded in each filename. This ordering had to be
# worked out by hand for this particular download.
files = {
    1: 10293120,
    2: 10294117,
    3: 10294118,
    4: 10293008,
    5: 10294005,
    6: 10294006,
    7: 10293012,
    8: 10294009,
    9: 10294010,
    10: 10293016,
    11: 10294013,
    12: 10294014
}

tiles = []
for i in range(1, 13):
    f = files[i]
    filename = f'USGS_OPR_NC_Phase5_2018_A18_LA_37_{f}_.tif'
    print(filename)
    tiles.append(gdal.Open(filename))

def get_bounds(tile):
    # Derive the tile's extent from its geotransform:
    # (origin_x, pixel_width, 0, origin_y, 0, pixel_height).
    gt = tile.GetGeoTransform()
    minx = gt[0]
    miny = gt[3] + tile.RasterYSize * gt[5]  # pixel height is negative
    maxx = gt[0] + tile.RasterXSize * gt[1]
    maxy = gt[3]
    return minx, miny, maxx, maxy

# Check coordinate system
for i, tile in enumerate(tiles):
    srs = tile.GetProjection()
    if not srs:
        print(f'Tile {i+1} has no coordinate system defined.')
    else:
        print(f'Tile {i+1} coordinate system: {srs}')
        # Check if it is in WGS84
        if 'WGS 84' in srs or 'EPSG:4326' in srs:
            print(f'Tile {i+1} is in WGS84 coordinate system.')
        else:
            print(f'Tile {i+1} is not in WGS84 coordinate system.')

bounds = [get_bounds(tile) for tile in tiles]
for i, b in enumerate(bounds):
    print(f'Tile {i+1}: {b}')

# Stitch into a single GeoTIFF laid out as a 3x4 grid.
out_tile = gdal.GetDriverByName('GTiff').Create(
    'stitched.tif',
    3 * tiles[0].RasterXSize,
    4 * tiles[0].RasterYSize,
    1,
    gdal.GDT_Float32
)
out_tile.SetGeoTransform((bounds[0][0], tiles[0].GetGeoTransform()[1], 0,
                          bounds[0][3], 0, tiles[0].GetGeoTransform()[5]))
out_tile.SetProjection(tiles[0].GetProjection())  # carry the CRS over to the output
for i, tile in enumerate(tiles):
    x_offset = (i % 3) * tile.RasterXSize
    y_offset = (i // 3) * tile.RasterYSize
    out_tile.GetRasterBand(1).WriteArray(
        tile.GetRasterBand(1).ReadAsArray(), x_offset, y_offset)
out_tile.FlushCache()
out_tile = None
print('Stitched geotiff created as stitched.tif')
The keen observer will note that the files must reference the WGS-84 geoid, an idealized smooth model of the earth as if it were 100% water.
There are plenty of failure modes with this approach, but the most annoying is determining the correct order of tiles. The list that the USGS Map download tool generates seems to provide them in random order, and correcting for that by hand for more than 5 or 6 tiles is extremely time consuming.
Reading (Maps) Is Hard
There was some utility in the first iteration of the tool, but too much effort was spent manually comparing coarse features in the usability map with a Google Earth view to try and align the two. Sure, when a large swath of flat land is visible it provides a grounding point, but how do we know if the resolutions match? The default usability map uses a 30 meter sliding window. What does 30 meters of pixels look like in Google Earth? I have no idea. So how useful is the original tool, really?
These are the same areas, but I don’t know if the Google Earth view is even approaching the same resolution as the map, which started at 1 meter per pixel. Without a georeferenced map we’re left to our own devices to run a meatspace version of SURF.
Automation - Because I’m Lazy
The answer to every one of these problems is automation. Honestly, automation is the answer to just about every minor inconvenience in life. All technology is the result of humans wanting to be just a little bit lazier. Sorry, more efficient.
Automatic Downloads
When passing the --download argument the tool now expects a .txt file with a list of files. The list is populated by the USGS download tool and is meant for their bulk download option, but instead of having to use another application I can simply curl each file in the list; they’re just sitting in an S3 bucket. Obtaining this list is the only remaining piece of the workflow that can’t be handled automatically, but I need to see if the USGS map has an API or another means of programmatically querying their database against a KML.
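The download step doesn’t need to be anything fancy. Here’s a minimal sketch of the idea, assuming the USGS list is one URL per line; download_list.txt and the tiles directory are placeholder names, not the tool’s actual interface.

import urllib.request
from pathlib import Path

# Hypothetical filename: the USGS bulk-download list is assumed
# to be one URL per line.
out_dir = Path('tiles')
out_dir.mkdir(exist_ok=True)

with open('download_list.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    dest = out_dir / url.rsplit('/', 1)[-1]  # keep the original filename
    if dest.exists():
        continue  # skip tiles we already have
    print(f'Fetching {url}')
    urllib.request.urlretrieve(url, dest)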
Grid Construction
In theory the algorithm for determining grid size and tile ordering isn’t too difficult. In practice it probably isn’t either, but I let Claude take the wheel for writing that function and it worked flawlessly on the first try. See the intro to this section as to why I chose that route.
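For intuition, here’s a sketch of one way such a function could work, derived entirely from each tile’s own georeferencing; this is my reconstruction of the idea, not the function Claude actually wrote.

from osgeo import gdal

def grid_order(filenames):
    # Read each tile's top-left corner from its geotransform.
    corners = []
    for name in filenames:
        ds = gdal.Open(name)
        gt = ds.GetGeoTransform()
        corners.append((name, gt[0], gt[3]))  # (file, min x, max y)
        ds = None
    # Unique x origins give the column count; unique y origins the rows.
    cols = len({round(x, 6) for _, x, _ in corners})
    rows = len({round(y, 6) for _, _, y in corners})
    # Row-major order: top row first (largest y), scanning left to right.
    ordered = sorted(corners, key=lambda c: (-c[2], c[1]))
    return [name for name, _, _ in ordered], cols, rows

With the order and shape recovered automatically, the stitching loop from earlier no longer needs a hand-built files dictionary.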
Georeferenced Output
Again, I have to cede responsibility to the robot for writing the bulk of this functionality; it’s just so good at what it does. The basic principle, however, is pretty straightforward. If a user supplies a KML as the bounding box, GDAL is leveraged to crop the image to the bounds of the KML (I did have to correct some broken code the robot produced here, so it’s not entirely foolproof yet). If a bounding box is not supplied, GDAL is still called to obtain the corners of the source imagery to be used on the output map.
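As a rough sketch of both paths (not the tool’s actual code, and with placeholder filenames):

from osgeo import gdal

# With a KML: crop the stitched raster to the KML polygon.
# cropToCutline trims the output extent to the cutline's bounds.
gdal.Warp('cropped.tif', 'stitched.tif',
          cutlineDSName='area.kml', cropToCutline=True)

# Without a KML: recover the corners of the source imagery instead.
ds = gdal.Open('stitched.tif')
gt = ds.GetGeoTransform()
minx, maxy = gt[0], gt[3]
maxx = gt[0] + ds.RasterXSize * gt[1]
miny = gt[3] + ds.RasterYSize * gt[5]
print(f'Corners: ({minx}, {miny}) to ({maxx}, {maxy})')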
In both cases the default mode now outputs a KML with a <GroundOverlay> section that includes the resulting usability map. The result is a KML file that can be loaded into Google Earth or any other GIS visualizer, allowing the user to see a georeferenced version of the map where white boxes indicate that land is likely too steep to build on.
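The GroundOverlay itself is simple enough to show in full. A minimal sketch of writing one, with hypothetical bounds and overlay image name:

# Sketch of the GroundOverlay KML the tool emits; the bounds and the
# overlay image name below are placeholders.
north, south, east, west = 35.61, 35.55, -82.49, -82.56

kml = f'''<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <GroundOverlay>
    <name>Usability Map</name>
    <Icon><href>usability_map.png</href></Icon>
    <LatLonBox>
      <north>{north}</north>
      <south>{south}</south>
      <east>{east}</east>
      <west>{west}</west>
    </LatLonBox>
  </GroundOverlay>
</kml>'''

with open('usability_map.kml', 'w') as f:
    f.write(kml)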
Garbage In, Garbage Out
All these quality of life changes are nice, but how do I actually know that the map the tool is producing is valid? I’m not standing on the hills with a laptop in hand, deciding whether the terrain lines up with my map or whether the white tiles truly are steeper than I would prefer. To put this another way: if the inputs (usability score) are garbage then the output (usability map) will also be garbage. For all intents and porpoises I pulled the usability score out of thin air. I might have convinced someone in the first post that it makes sense using math they haven’t seen in over a decade, but how can I go about verifying whether the process is working as intended?
Thankfully the answer doesn’t require being in the mountains or any other form of in-person verification. All the information exists in the source data if you know what to look for. What we need is a section of land we know with reasonable confidence should be marked as “too steep” right next to land we know is flat. Luckily the DOT has done the hard work and carved out some nice flat sections of the mountains against which I can compare my outputs. Behold: roads.
According to the Missouri DOT (and other standards found online) a standard lane is 12 feet wide. By default the output map bins its output into 30 meter sections, far too coarse to capture a road, but the --tile-size argument lets me specify the size of the sliding window used to compute the usability score. A two-lane road is 24 feet wide, or about 8 meters. Something something Nyquist frequency (you want the sampling window to be at most about half the feature size), so I’ll pass --tile-size 5 to ensure that the higher frequency information survives in the usability map.
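For a sense of what the sliding window is doing, here’s a toy version of a windowed check; the relief threshold and the score itself are stand-ins for the tool’s actual math, and cropped.tif is a placeholder input.

import numpy as np
from osgeo import gdal

ds = gdal.Open('cropped.tif')
elev = ds.GetRasterBand(1).ReadAsArray().astype(float)
pixel_size = ds.GetGeoTransform()[1]  # meters per pixel (1 m source data)

tile_size = 5  # --tile-size 5 => 5 meter windows
win = max(1, round(tile_size / pixel_size))

MAX_RELIEF = 1.0  # hypothetical threshold: 1 m of relief per window
rows, cols = elev.shape[0] // win, elev.shape[1] // win
usable = np.zeros((rows, cols), dtype=bool)
for r in range(rows):
    for c in range(cols):
        block = elev[r * win:(r + 1) * win, c * win:(c + 1) * win]
        usable[r, c] = (block.max() - block.min()) <= MAX_RELIEF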
We see that in areas where a road has been cut through what looks to be a relatively steep hill, the map follows the roads well enough. Sticking with the mantra of “don’t let perfect be the enemy of the good,” I’m happy with the outputs. I’m not using this for TERCOM or anything that needs high accuracy, so it’s good enough for my use case. Also, engineering and design are iterative; maybe there will be a third post with even higher accuracy.