FWTools & MySQL leaves ArcGIS geoprocessor in the dust
22 01 2008I wanted to know what block groups where included in each of the Zillow neighborhoods. So I needed to:
- Merge all the Zillow neighborhoods into a single datasource
- Intersect the block group data source with the neighborhoods
I had just done a similar process for the block groups and all US Census “populated places”, so I knew that the spatial intersecting in the ArcGIS geoprocessor was going to take a bit of time. I already had a MySQL data source with the block groups, so I thought I’d do a comparison.
FWTools & the Geoprocessor merged all the shapefiles into a single shapefile in about the same amount of time. In case you don’t know, here is how you do that using the ogr2ogr utility & DOS/Windows BATCH syntax.
FOR %f IN (*.shp) DO ogr2ogr -update -append -f "ESRI Shapefile" zillow_hoods.shp %f -nln zillow_hoods
Now that I had a combined shapefile, I loaded into the MySQL database, using the ogr2ogr command for MySQL. This was executed in less than a few seconds.
Finally, I wrote a SQL query to intersect the 2 datasources and create a new table with the results.
CREATE TABLE remap.bghoods SELECT b.bg_num,z.* FROM remap.bg_data b, remap.zhoods z WHERE INTERSECTS(b.SHAPE,z.SHAPE)
The final result
ArcGIS Geoprocessor : 1hr 2min
MySQL : 5min 29sec

Nice post Matt. This closely mirrors my findings using GDAL, PostGIS, etc. for automated tasks on large datasets. It would be a good idea to verify that the intersects() operator is operating on geometries and not bounding boxes. Up until recently MySQL geometry operations were only performed on so-called MBR, or bounding boxes [1]. I think that they have updated this functionality, but haven’t tried it- I have since moved on to PostGIS.
1. http://dev.mysql.com/doc/refman/5.0/en/functions-that-test-spatial-relationships-between-geometries.html
You are right that this operation is actually on MBR and not the specific geometeries so it is faster and less acurate than using PostGIS or ArcGIS. I didn’t try it on PostGIS, just because I didn’t already have the data in a table like I did for MySQL. However, I am certain than PostGIS, Oracle Spatial, or SQL Server 2008 would all perform this query in less than 1/4 the time it took ArcGIS.
There is a beta or RC revision of the MySQL database which actually test geometries and not just MBR. However, I don’t have it installed yet.
Ok- it would be interesting to see how MySQL compares to the other spatial databases once it has proper geometry operators.
Good luck!
Interesting benchmark, altough it’s broken because MySQL uses bounding boxes instead of the actual geometry. Would be interesting to compare processing times on this with more spatial databases and GIS such as Oracle Spatial, IBM DB2, MS SQL Server 2008 and even maybe Manifold?
Does anyone know if someone has done benchmark tests for basic GIS operations like this somewhere?
Instead of trying to mimic the ArcGIS intersection in MySQL, can you mimic the MySQL intersection in ArcGIS (do the intersection in ArcGIS using the bounding box instead of the actual geometry)? Do you think that would make a difference?
@Tripp
I honestly don’t know how you would go about forcing ArcGIS to use BBOX intersection and not spatial intersection. I’m sure it is possible but probably not through any pre-installed tools or scripts.