Why Spatial Indexes are REALLY Important

 

I am not afraid to say that I failed…I compiled all of the Ordnance Survey OpenMap data into a UK vector coverage, just like I did with the Vectormap, Boundaryline & Strategi data, into a PostGIS database on a [Windows 2012] server. Unlike the other Ordnance Survey products, the OpenMap kept crashing my QGIS [2.8]….not something which is easily done!

There was a great incentive, the British Cartographic Society was holding an OS OpenData award & I had a great idea about recreating the old school 1:25000 scale maps using entirely open source data – Everytime I loaded the data, I would go to bring up the composer to create my work of art, QGIS would just crash….and not a little, this was epic. No matter what I did, reducing the drawing quality, reduce the rendering or even clip the output to the frame….NOTHING would work, it just crashed.

postgis

I’m a little new to PostGIS, I’ve worked with Oracle RDBMS, ESRI Geodatabase & a plethora of vector file formats but PostGIS is relatively new, so I took a few shortcuts in loading the ESRI Shapefile data to the database. Rather than loading the shapefiles using SPiT [in QGIS] I decided to use the OpenGeo Suite add-in (which is the dogs danglies)…..this allows me to not only load up the data but also styling & is GUI enough for me not to feel like I’ve gone back to being a database administrator. Unfortunately, as awesome as the add-in is,

IT FORGOT TO UPLOAD OR CREATE INDEXES FOR THE DATABASE OBJECTS

Before indexing...

Why is a spatial index important?

Indexes are what make using a spatial database for large data sets possible. Without indexing, any search for a feature would require a “sequential scan” of every record in the database. Indexing speeds up searching by organizing the data into a search tree which can be quickly traversed to find a particular record. PostgreSQL supports three kinds of indexes by default: B-Tree indexes, R-Tree indexes, and GiST indexes.

By default PostGIS creates indexes as B-Tree but best results (in my opinion) are found by indexing using GiST option. (GiST (Generalized Search Trees) indexes break up data into “things to one side”, “things which overlap”, “things which are inside” and can be used on a wide range of data-types, including GIS data. PostGIS uses an R-Tree index implemented on top of GiST to index GIS data.

B-tree-aggregated-example

So, I found the problem, what was the solution?

Having looked in detail into the problem, I found that I had 100+ data layers, with over 20GB of data which were un-indexed…luckily I am not the first to have an epic geofail…cue some hasty conversations with Ross Mcdonald [Angus Council QGIS & PostGIS Ge0-God] & Matt Walker [Astun Technology PostGIS Guru & Blogger]

Normally to create indexes, you would write the SQL script:

CREATE INDEX “[Index name]” ON public.”[Tablename]” USING GIST ([geometry fieldname]);

But with over 100 lines to do, it would have be extremely time consuming trying to extract all the names & index names… so Matt provided a small script:

SELECT ‘CREATE INDEX ‘ || quote_ident(f_table_name || ‘_geom_idx’) || ‘ ON ‘ || quote_ident(f_table_schema) || ‘.’ || quote_ident(f_table_name) || ‘ USING GIST (‘ || quote_ident(f_geometry_column) || ‘);’ AS SQL
FROM geometry_columns
WHERE f_table_name IN (f_table_name);

This goes through and writes the “Create index” script for you!!

If there is ever a time that you think that creating indexes is a waste of time, look back to this article….Last week I couldn’t open My data in QGIS without it crashing, today it not only loads but is relatively quick.

I may have lost a day creating indexes but I am going to save DAYS in the long run…..

Nick D

UPDATE!

As well as being niaive enough not to check the indexes when I was struggling to load the data but it would appear that I hadn’t updated the OpenGeo plugin for a while and was using 0.6.4 when the latest version is 0.7.2…..which, since 0.7 has supported creation of indexes by default.

As stated at the top of the article, I failed….but it is these failures which help us to be stronger and better in what we do [supposedly]

Nick D

 

2 thoughts on “Why Spatial Indexes are REALLY Important

  1. Which version of the OpenGeo Suite plugin are you using. The latest one uses ogr2ogr to import into PostGIS (unlike previous ones which used QGIS classes to do it), and ogr2ogr creates indexes by default (see the SPATIAL_INDEX switch in here: http://www.gdal.org/drv_pg.html)

    1. Yes, as I state above, the OpenGeo Suite plugin is awsome. It would appear that the fault here is all mine, I didn’t update my plugin and was using 0.6.4 instead of the 0.7.4…which seems to support the indexing by default!
      As always, thanks for all your support!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s