Command Line Carto III – Loops
So far, we’ve just worked explicitly – telling our makefile what our input is, and what the output should be. For all the reasons aforementioned, this can be plenty handy in of itself. However, the real ✨ shine ✨ behind programmatic carto work is being able to run your workflow over any amount of files in one go.
Instead of working on a single file, looping allows us to iterate over any number of em’, all with the same commands.
Some use cases:
- Project, style, and crop multiple raster images
- Work with datasets (such as state data) consistently without the need to merge them together or repeat workflows manually
- Splitting a stupidly large shapefile by attributes and work with the output
…and that’s just focusing on GIS-specific tasks. All of the stuff covered in this tutorial will be applicable for any CLI tool. If you wanted to, say:
- Download fifty files from an FTP site without manually clicking on each link
- Run ImageMagick over a folder of images
- Convert a folder of static images to a video or using ffmpeg or a gif
…you’ll learn all the tools here to do so.
OK - enough salesmanship. Let’s get loopy.
The new kid on the block is going to be GDAL. If you’ve used QGIS, chance is good you’ve used GDAL incidentally. In fact, if you have QGIS installed, you already have it installed as well. To test:
- Open your terminal
gdalwarp --hand hit enter.
- If your terminal responds with
Usage: gdalwarp [--help-general] [--formats]..., you’re good to go! Otherwise, you can install it from here.
Finally, here’s some files you’ll need for the rest of the tutorial. Letsago!
Loops are code that run X times. Where X can be any number greater than one - and hopefully smaller than infinity. Giving a computer a task that goes on indefinitely won’t fry it, but it will stall the poor thing out. We can manually tell our makebot to do a command five times,
manualrun: echo run echo run echo run echo run echo run
or we could create a loop that runs five times.
looprun: for num in 1 2 3 4 5; do \ echo run; \ done
for is the magic command we’ll get to know. The basic setup is:
for X in Y; do \ COMMANDS; \ done
The syntax is very important here. Like with mapshaper, a for loop needs to technically be delivered in one line. The
\’s allow us to break the line for our human eyeball’s sake, but computer’s won’t register them.
; desiginate a break in the code without it being a new line. This lets us write multiple commands that the computer can correctly interpret as:
for X in Y; do \ RUN COMMAND ONE; \ THEN RUN COMMAND TWO; \ THEN RUN COMMAND THREE; \ done
Most importantly is the very first line,
for X in Y. In our first example, this was written as for
1 2 3 4 5. Y can be any set of strings seperated by spaces. The length of this list is how many times our loop will run, once for each item in it.
X is the variable name we can use to call those strings in the looped commands. So long as it isn’t already defined, you can name it whatever you’d like. Inside the loop we can then refer to whichever item from Y we’re on by calling
$$X, which is whatever you declared X as. Try running:
fruitloop: for fruit in banana apple pear; do \ echo $$fruit; \ done
This should spit back the list of fruits. Let’s try a more practical application, using filenames instead of good food.
shpToJson: for file in colorado newyork wisconsin; do \ mapshaper example-one/$$file.shp \ -format geojson \ -o example-one/$$file.json; \ done
Makefiles demand a strict syntax, but thankfully they’re smart enough to deliniate our variable
file from the rest of the words it’s surrounded by. This means instead it will correctly parse our file locations as
As shown above, we can also use that variable as many times as we want within the loop. And it can be part of the path name, instead of the full path. With a minor tweak to the above script, we can create an output folder for the processed json’s to be saved into, then save em’ there. All while keeping the input safe and unchanged.
shpToJson_Cleaner: mkdir -p example-one-output for file in colorado newyork wisconsin; do \ mapshaper example-one/$$file.shp \ -format geojson \ -o example-one-output/$$file.json; \ done
You may be saying “Boy howdy, writing out each filename in the
for line seems annoying.” And you’d be right. Next up, how simplify that.
In part two we learned how to declare variables and use them in later commands. Following suit, we could rewrite our last command with a variable:
FILES=colorado newyork wisconsin listStateData_Variable: for file in $(FILES); do \ echo $$file; \ done
This is mildly better, but still not very viable for folders with oodles of files. Make has some in-built functions that can help.
FILES_WILDCARD=$(wildcard example-one/*) listStateData_Wildcard: for file in $(FILES_WILDCARD); do \ echo $$file; \ done
Much like how we call variables, you can call functions with
$(FUNCTION arguments). In this case
$(wildcard pattern) is a function built into Make that returns a space seperated list of files that match the pattern we give it. Convinent!
That should return a list for files in the folder example-one. You could work with that, but because the variable is the whole path, we won’t be able to save our output to a different file. Not to mention this lists out every file in the folder, so ack, it’s also returning the jsons we made earlier.
This also butts heads with shapefiles, given each one shapefile comes in packs of 4-6 separate files with differing extensions. More functions to the rescue;
Will return just files that end in
example-one/colorado.shp example-one/newyork.shp example-one/wisconsin.shp
FILES=$(notdir $(wildcard example-one/*.shp))
notdir, i.e. ‘Not Directory’, will ensure the output is just the filename:
colorado.shp newyork.shp wisconsin.shp.
FILES=$(basename $(notdir $(wildcard example-one/*.shp)))
basename will drop any extensions, giving us a clean
colorado newyork wisconsin list. We can always write out folders and extensions in the loop, so this way we free ourselves up to find our input and put our output anywhere, and as anything!
Putting it altogether we have:
FILES_BASE=$(basename $(notdir $(wildcard example-one/*.shp))) shpToJsonVariable: mkdir -p example-one-output for file in $(FILES_BASE); do \ mapshaper example-one/$$file.shp \ -format geojson \ -o example-one-output/$$file.json; \ done
Worth noting, if you chain multiple commands together Make won’t check the variable until it’s called in a command. This means you can set this variable to check a folder that doesn’t exisit yet. So long as the folder you’re searching contains files by the time you call it, you’re good.
That works well and good for a list of files. Sometimes we want to loop over a specified list of text instead though. For example, looping over a series of URL’s for downloading files.
The first step for that would be to generate a list of the strings you want as a CSV. There’s a million ways to do that, but most straightforward might be to write them out in Excel or Google Sheets, then download them.
Whichever way you get your file of strings, we can assign it’s contents to a variable like so:
LIST=example-one/list.csv CONTENT=`cat $(LIST)`
cat is short for concatenate. If used on any file, it’ll return the contents of the file. The
` tell Make that it should understand the inside text as a command and not as a string.
Just like with our previous example, this can be used to pass along a list of items to our for loop.
LIST=example-one/list.csv CONTENT=`cat $(LIST)` shpToJson_List: mkdir -p example-one-output for file in $(CONTENT); do \ mapshaper example-one/$$file.shp \ -format geojson \ -o example-one-output/$$file.json; \ done
Now let’s see some loops in action.
Strap in, we’re going to write commands to:
- Download the past 12 hours worth of weather data
- Convert each hour's worth of data into a geotiff
- Style each hour's data according to the temperature
- Merge those images into a single gif
The data we’re going to use is NOAA’s Real-Time Mesoscale Analysis. This file is a wealth of weather data.
Of ways to access it, we’re going to use NOAA’s FTP site where they post it each day. Now you could manually click on each file to download. No one’s got that kinda time tho.
URLS=`cat second-example/links.txt` fetcher: mkdir -p data cd data for url in $(URLS); do \ curl -O $$url; \ done
Hopefully you recognize the bits and pieces there. We’re:
- Parsing a list of links, saved as a TXT file
- Creating a folder for the data,
- Entering the folder
- For each URL in that txt file, downloading it
Worth noting you can do variations of this for batch downloading anything hittable with a web URL. If you wanted to download that Microsoft building dataset for the entire U.S., you could write a txt file with all 50 states in it and loop on that. Lots of possibilities here!
Extracting the Data
So now we have a folder full of mystery .grib2 files. Grib stands for ‘GRIdded Binary’ and is something you’ll encounter with weather datasets fairly often. Think of it as a brief case full of raster images. Each image is given a number, or a band, of which we can pull it out with.
You’re already familiar with bands if you’ve done anything with RGB images. In that color space, every color is three plates of red, green, and blue values smashed together. Here, instead of being smashed together, each band is it’s own layer.
RTMA has a bucket-load of layers, each a different weather phenomena. We only need one layer, so like a jenga block we want to pull just one out and save it as something more manageable; a good ol’ geotiff. It’s worth checking out what the other layers contain as well though! RTMA is a wealth of weather data.
Here’s how we dig just the raster we want from the pile.
FILES=$(basename $(notdir $(wildcard data/*))) exportBand: mkdir -p band for f in $(FILES); do \ gdal_translate -b 9 data/$$f -of Gtiff band/$$f.tiff; \ done
The new command up there is
gdal_translate. Unlike Mapshaper, gdal comes in bits and pieces. Translate is the goto for converting rasters between formats, and resampling/resizing. You can read all about what this and all other gdal commands can do here.
Our command we’re running on each file here reads:
Call gdal_translate, pick band 3, open [INPUT], save it as a Geotiff, and save it to [OUTPUT]
This will work for any grib file, but be warned! There is no consistency between different grib sources. RTMA will be consistent, but if you try this out on a different NOAA dataset, the bands may represent different things. Some may have two bands, others dozens. Windspeed may be located on band 8 in some, and 3 in others. It’s imperative you find an explainer file for the dataset you’re using to ensure you’re using the right image!
For RTMA, we can see what the bands translate on Google Earth Engine here. We’re going to make a temperature map, so we want the third band.
This is also where our pickiness with the file names pays off. By only using the file name and no extension or directory, we can use it for both the input and output. Running the above command, you should now have a data folder that’s sitting pretty and a shiny new band folder with the same amount of files, but in geotiff. Non-destructive repeatable GIS hoorah!
RTMA by default comes down in a variation on Albers U.S. making this step optional. Projections will come up a lot tho! So it’ll probably be worth seeing that quick.
exportBand: mkdir -p proj for f in $(FILES); do \ gdal_warp -s_srs "+" -t_srs "+" band/$$f.tiff proj/$$f.tiff; \ done
gdal_warp is the library that handles taking an image that’s in one projection and warp it into another.
s_srs is the source, or the original files projection.
t_srs is the target, or desired
Color by the Numbers
The final flavor of gdal we’ll encounter today is
gdaldem, which let’s us apply a color gradient to a geotiff based on it’s values. The command name hints this functionality was meant for coloring elevation models, but it works fine for this too!
You will need another external file, a guide for gdaldem to color-by-the-numbers by. I’ve prepped one for you already, but if you’re curious welcome ya
colorize: mkdir -p colorize for f in $(FILES); do \ gdaldem color-relief band/$$f.tiff color.txt colorize/$$f.tif; \ done
With that, we’ve got our frames for the animation. We just need to put it all together. Once you can stop caring about the georeference, you can leave gdal behind and kick open
ImageMagick. This is a library that handles
gif: convert -delay 15 -loop 0 $(DATS2) gif.gif
Overlay geographic data
composite: convert gif.gif \ -coalesce \ null: overlay/overlay.png \ -gravity center \ -layers composite \ result.gif
As I’ve tried to stress throughout these tutorials, Make is an old language. But a lot of the concepts covered here are universal to programming languages. Python and Node are fantastic alternatives to using this antiquated language - but the general workflow and way of thinking about CLI Carto should translate well.
– D.M., live from Bryant Park
April 26th, 2022
If you've read this and notice any parts that still seem super confusing, or just have any other questions feel free to reach out to me on Twitter!
Quick kudos to Matthew Bloch for both creating Mapshaper and sparking a huge interest in it in me during his talk at NACIS in 2017.
And finally another quick kudos to Derek Lieu for inadvertently introducing me to this approach for GIS work.