Jul 16, 2011

Batch merge CanVec shapefiles with GDAL (ogr2ogr)

What the script does
I couldn't find a solution for batch merging CanVec shapefiles, so I wrote my own script to run in the Ubuntu terminal.

The only extra program the script uses is GDAL, so I had to make sure GDAL/ogr2ogr was installed on my Ubuntu machine.
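A quick way to confirm that ogr2ogr is available before running anything is sketched below. The function name require_tool is my own for illustration and is not taken from CanVecSHP_v1.sh.

```shell
# Return 0 if the named command is on the PATH, 1 otherwise.
# require_tool is a hypothetical helper, not part of the original script.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "Missing required tool: $1" >&2
        return 1
    fi
}

# Example: stop early when ogr2ogr is unavailable.
# require_tool ogr2ogr || exit 1
```

On Ubuntu, ogr2ogr is provided by the gdal-bin package (sudo apt-get install gdal-bin).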

This script was actually my first Bash script ever (not the batch downloader). It has a lot of bad practices that I have improved on since then, but in its current form it works!

I will update the script once I get all my stuff settled into my place (moving...)

The script starts by asking you for a workspace, where it creates a folder called 'shp' with three subfolders: pt, ln and ply.
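The folder layout described above could be created along these lines. This is a minimal sketch: the function name make_workspace is hypothetical, and it takes the workspace path as an argument rather than prompting for it as the script does.

```shell
# Create WORKSPACE/shp with the pt, ln and ply subfolders.
# make_workspace is a hypothetical helper, not the original script's code.
make_workspace() {
    local workspace=$1
    mkdir -p "$workspace/shp/pt" "$workspace/shp/ln" "$workspace/shp/ply"
}

# Example (the path is only an illustration):
# make_workspace "$HOME/canvec"
```

mkdir -p creates any missing parent folders and does not complain if the folders already exist, so the function is safe to run twice.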

The zip file contents are then extracted to the shp folder.
Only the required shapefile parts are unzipped: .shp, .shx, .dbf, .prj
The script ignores the metadata and any other files that are unnecessary for this process.
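Extracting only the required parts can be sketched with unzip's file patterns. The function name extract_shapefile_parts and its two arguments are assumptions for illustration, and this is not the code from CanVecSHP_v1.sh.

```shell
# Unzip only the .shp, .shx, .dbf and .prj members of every zip in ZIP_DIR
# into OUT_DIR. extract_shapefile_parts is a hypothetical helper.
extract_shapefile_parts() {
    local zip_dir=$1 out_dir=$2 z
    for z in "$zip_dir"/*.zip; do
        [ -e "$z" ] || continue   # no zip files matched the glob
        # Quoted patterns are matched by unzip itself, not by the shell.
        # -o overwrites without prompting, -q keeps the output quiet.
        unzip -o -q "$z" '*.shp' '*.shx' '*.dbf' '*.prj' -d "$out_dir"
    done
}

# Example (paths are only an illustration):
# extract_shapefile_parts "$HOME/canvec/zips" "$HOME/canvec/shp"
```

Quoting the patterns matters: unquoted, the shell would try to expand them against files in the current folder before unzip ever sees them.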

The shapefiles are then analyzed quickly for some feedback.

A single shapefile from each CanVec category is placed into the matching folder (pt, ln or ply), because ogr2ogr requires a source shapefile that all other similar shapefiles are merged into.

The source shapefiles are then renamed appropriately and the batch merging begins!
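The merge step can be sketched as a loop that appends each shapefile onto the source shapefile. The -update, -append and -nln options are standard ogr2ogr options; the function name merge_into and the paths are assumptions for illustration, not the code from CanVecSHP_v1.sh.

```shell
# Append every shapefile in SRC_DIR onto TARGET, an existing shapefile.
# merge_into is a hypothetical helper.
merge_into() {
    local target=$1 src_dir=$2 layer shp
    # For shapefiles, the layer name is the file name without .shp.
    layer=$(basename "$target" .shp)
    for shp in "$src_dir"/*.shp; do
        [ -e "$shp" ] || continue             # no shapefiles matched the glob
        [ "$shp" = "$target" ] && continue    # never append the target to itself
        ogr2ogr -update -append "$target" "$shp" -nln "$layer"
    done
}

# Example (paths are only an illustration):
# merge_into "$HOME/canvec/shp/ln/lines.shp" "$HOME/canvec/shp"
```

The -nln option names the layer to append to. Without it, ogr2ogr would name the layer after each input file instead of adding to the existing one.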


This script may bail if the shapefile size limit is reached (2-4 GB), so I am considering other storage formats such as a PostgreSQL database or a GRASS database for further geoprocessing.
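One way to get a warning before hitting that limit is to check the size of the merged file between merges. This is only a sketch: the function name exceeds_size_limit, the path and the 2 GB threshold in the example are assumptions, and the original script does not do this.

```shell
# Succeed (exit status 0) when FILE is larger than LIMIT_BYTES.
# exceeds_size_limit is a hypothetical helper.
exceeds_size_limit() {
    local file=$1 limit_bytes=$2 size
    # The arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit_bytes" ]
}

# Example: warn once a merged shapefile passes 2 GB (2147483648 bytes).
# if exceeds_size_limit shp/ln/lines.shp 2147483648; then
#     echo "lines.shp is over 2 GB, further merging may fail" >&2
# fi
```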

Note that merging with ogr2ogr does not create a seamless shapefile. Where two features come together, a seam remains and the features stay as separate segments. This is something I want to get rid of, and I will probably implement that through GRASS in the near future.



Batch Merge CanVec Shapefiles script
As I mentioned, there is a lot of redundancy and bad practice in this script, such as:
- Duplicate variables
- No shell string manipulation (everything is done with external tools like cat, rev, cut, etc.)
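For comparison, the shell's own parameter expansion can split a path without calling cat, rev or cut. The path below is made up for illustration.

```shell
path="shp/ln/example_layer.shp"   # hypothetical path, for illustration only

file=${path##*/}   # strip everything up to the last slash: example_layer.shp
name=${file%.*}    # strip the last dot and what follows:   example_layer
ext=${file##*.}    # strip everything up to the last dot:   shp

echo "$file $name $ext"
```

These expansions run inside the shell itself, so they avoid starting an extra process for every file in a loop.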

Download: CanVecSHP_v1.sh 13.7kb

Be sure to set the permissions (make it executable) prior to running it or it won't execute:
chmod a+x script/CanVecSHP_v1.sh
./script/CanVecSHP_v1.sh
In the above, it is OK to omit the 'a' in 'a+x' so it becomes just '+x'. With a typical umask the result is the same.

The script will provide lots of feedback during its course so scroll up in the terminal to get some interesting information when it completes.



Results
I am currently running this script to batch merge 960 sheets of CanVec data into non-seamless shapefiles.


Screen capture of CanVecSHP_v1.sh in Ubuntu terminal

960 zip files (3 GB) unzipped in under 3 minutes.
Extracted 58,588 files (9 GB)

Merged points (52 MB), lines (7.6 GB) and polygons (1.7 GB) in about 13 minutes.
A 6 GB shapefile took just under 4 minutes to merge, and 250 MB took 20 seconds.

Below are some screenshots of what the merged data looks like from way up.
Note that the CanVec data is shown in its native projection, not projected to BC Albers.


CanVec 50k Toponymy/Place Names.


CanVec 50k Transportation.


CanVec 50k Waterbody.


CanVec 50k Topography. Small black square is inset below.


CanVec 50k Topography. Inset from small black square on above map.

5 comments:

  1. It's good to see someone else is using CanVec, and sharing their code!

    Personally I found it easier to start with the GML files than the .shp. First there are less files to download to begin with as one gml can hold all the points, lines and polygons for one layer type. Second it's easier to apply a look up table to convert the db-name (EN_1180009) to a human friendly one (Pipeline). See http://code.google.com/p/maphew/source/browse/gis/canvec/gml2shp_friendly_layers.bat and http://code.google.com/p/maphew/source/browse/gis/canvec/db-friendly-names.txt for examples.

  2. I looked at your scripts when I was learning how to do this as well.

    I did not understand much of them at the time, so I resorted to Bash because it was simple to learn. Now that I have learned the basics, your BAT files are understandable to a certain degree.

    I never used the GML format before but will now consider it because it seems to offer even smaller file sizes which is always good on my slow internet connection. I am downloading some of the data now to test it out. Thanks.

  3. It's hard for me to make sense of your code as well, though much easier than it would have been a couple of years ago when I first started all this!  I read somewhere that reading other people's code is like reading and trying to understand something in a foreign tongue, even when the language is your own! ...((looks)) ahh, here is one by the same author on the theme though not the exact one I was thinking : http://www.joelonsoftware.com/articles/fog0000000053.html

  4. Thank you very much for this script, I spent a few days trying to combine and rename the CanVec data before stumbling across your site. All my data is sorted away properly now and I'm in your debt!

  5. Thanks Peter,


    Glad these worked for you, I use them all the time, but found a lot of problems in them that I'd like to fix. Hopefully I'll get those script updates done soon, as I've posted it all on github to make things easier.
