I’m coding something at the moment and I can honestly say it is the most fun I’ve had at work the whole year (and I generally love what I do). Doing this has reminded how I ended up in computer science and how much I used to love being paid to essentially solve puzzles all day. Anyway, I came across this YQL “bug” (Yahoo Query Language) and I thought I’d share it with you, or whoever googles this topic, and save someone a bit of pain.

My aim is to load a good clean database of South African cities linked to suburbs (I’ll write another post about that later) something a lot of people seem to want but no one seems to be doing very well, and those that have very clean data don’t open it up. I also wanted a relational table with SQL Server Geography datatypes, simplegeo or openstreetmap aren’t close to a perfect dataset. Ideally I wanted to load the geographic polygons that define the cities and the suburbs, I also want to load the bounding boxes, centroids and radius that encapsulate the suburbs.

I know there is some open street map data out there but Shaun Trennery, who I trust implicitly and is the tech brain behind safindit.com pointed me at YQL. Yahoo’s data seemed the cleanest and I decided to use their data to load the suburbs. (By the way, YQL is amazing: you can perform crazy queries like find all sushi restaurants in San Francisco “select * from local.search where query=”sushi” and location=”san francisco, ca” and Rating.AverageRating=”4.5”” awesome hey?) I have a JSON file of WOEIDs (Where on Earth ID) that define South Africa, this can be retrieved with a simple YQL query or download it here: LocationsZA.json. I wrote some code to parse these and load them into a SQL Server table that I’d designed (again – that post to follow).

I was having a problem loading Yahoo WOEID polygons as Microsoft.SqlServer.Types.SqlGeography objects in SqlServer2008. This fails on insert for certain polygons with “24200: The specified input does not represent a valid geography instance” exception which, typical of Microsoft, will take you here, and well, that doesn’t tell you anything you don’t’ already know (a side note, System.Data.SqlClient does not support the Geography type so you have to make a reference to Microsoft.SqlServer.Types).

I wanted to investigate if these polygons were in fact polygons. I also want to visualise the other elements of the WOEID (centroid, bounding box, radius etc) so I wrote the following ugly web form (let me reiterate I hate web forms and NML only build ASP.NET MVC but this was a quick prototype to see what the polygons looked like on google maps mostly taken from this post).

Figure 1 as you can see only slightly adapted from the codeproject. (click to enlarge)

why-I-wouldnt-trust-Yahoo-WOEIDs-sml

The following image illustrates Yahoo’s idea of the Constantia WOEID Polygon.

Building a relational City Suburb database for South Africa

As you can see it’s pretty obvious why the polygons won’t load, they aren’t true polygons. So that’s my warning don’t use yahoo polygons for suburbs rather uses bounding boxes where you can be sure that your geospatial queries will return trustworthy results. I’ve also noticed that in some instances Yahoo doesn’t even store the polygon.

So I had another chat to Shaun Trennery and we are keen to open this data up, and possibly build a geographic wiki app that allows anyone to modify the polygons, the administrators will have the final say. This will allow us to add suburbs yahoo isn’t aware of like those being defined by the Name Your Hood guys (if you know how much I initially disliked the Name Your Hood project you’ll find this ironic) and also define parent suburbs, for example Constantia is the parent suburb Bel Ombre, Nova Constantia, Constania Hills, Klein Constantia , Rust en Vrede and Alphen for example.

I’d be interested to get feedback from the GIS gurus on the best sources of Country/Region/City/Suburb/ information. Please also let me know if you are keen to collaborate on this mini project.

Enhanced by Zemanta

11 Responses to “Building a Relational City Suburb Database for South Africa (and why I wouldn’t trust Yahoo WOEIDs)”

  1. Have a romp around here – Surveyor General – I’ve dealt with them in the past, they have decent data, and are great to work with. One department I have no problem with my tax money going to … http://csg.dla.gov.za/

    StatsSA used to publish all the wards/municipalities (can’t remember if they did suburb/town) as shp files, but I see have pulled back on that now. You might be able to harass them for raw data.

    The big question is how public can you make the data you get from them? You’d have to have a chat with them and read their terms. Maybe you’ll get lucky. I use the Surveyor General data on my streetmaps site with credit, that’s within the scope of the license; I’m not sure if wholesale repackaging and distributing would be too.

  2. How long would this take you to change into a location booking application for the film indsutry, where you create the bound box of a shoot and run a pencil system that holds bookings until confirmed or a release is demanded by the pencil below in the order?

  3. @Greg I appreciate your comment and I know about the Surveyor General (in fact I had a job with them while doing my MSc [I was a lowly data capturer before the 99 consensus]. I really don’t find the surveyor generals data that useful, much like I don’t find the Yahoo data that great (yahoo has the Mallay Quarter vs I would like Bo Kaap for example). So maybe the wiki stores colloquial names also? I also think this would be a popular opinion driven thing. For example to me Constantia doesn’t extend past the M3 [call me a snob but I see that as Medoridge].

    Also I really want to be able to store parent suburbs like Greater Sandton (which as a yahoo polygon is tiny) which will include Morningside, Rivonia etc..

    Also I want to expose WKT and not just KML shapes files, it seems like the Surveyor General’s data is quite obscure in that way – but then I’m and engineer and not a GIS person.

    Love your work BTW.

  4. Anything is doable Joely, anything.

    ps. Damn fine idea.. as a wiki I could imagine this becoming very useful.

  5. Just asked ‘n GIS buddy of mine that’s been in the industry for years. He’s also only aware of the CSG dataset, but suspect it would be available at a municipal level… sounds like headaches.

  6. Firstly, congratulations on being born-again :-)

    Secondly, LocationsZA is only 3MB in size. It’s enough to fit in memory all the time and never have to store in SQL Server.

    Thirdly, it’s small enough in size to live in the browser, cached for a reasonable time.

    Fourthly, if you do everything in JS, your parsing won’t be parsing by plain old JS object manipulation, which, together with some decent list comprehension, your code will shrink to a just a few simple lines.

    Don’t stop having fun …

    – Aslam

  7. Hi Aslam,

    Not totally born again to the JC, I mean JS world ;)

    I still need this data in SQL Server (and ultimately .NET ;). Let’s just say it is an enterprise architecture constraint that I don’t have control over. I’ll post another post today that will make sense of what I/we are trying to do. But more than anything I’m wanting a colloquial database of suburbs rather than a governmental one. So this’ll turn into a basic MVC app (geowiki) that will allow point editing and suburb joining via a parent suburb – as described.

    I have 10 000 data rich points of interest in a SQL Server database (2008), these points are geography objects. My desire is to do queries like “give me all points of type x in STEnvelope y” [well that’s the pseudo code of it anyway]. There are other cool queries you can do in SQL Server geospatial like give me all suburbs polygons that intersect with suburb x. Also SQL Server geospatial queries are an order of magnitude faster than queries where longitude and latitude are defined as decimals or floats and the bounding box is defined by max min x y constraints. I don’t even want to think about doing these queries on the client side, besides a lot of the queries happen through native mobile apps.

    See you for No More Fluff.

    ps. yes.. this should so be a MongoDB problem.

  8. Nicholas Lindenberg says:

    Greetings

    You could make use of the OpenStreetMap.org dataset & API. It has suburbs defined in several ways (looking at the results from “Constantia, Cape Town, South Africa.”)

    Most of the major municipalities make their GIS datasets available to the public, for free/cost of media – you can contact the Cape Town GIS unit and obtain shapefiles of their suburbs, the road center-lines, etc and edit & upload from that base.

    vale,
    Nick

  9. The suburb data in OSM, was imported from SAGNS (South African Geographical Names System) database with a script in 2008.
    What i have read the SAGNS database seems to have clean city/town/suburb data.
    But I have not been able to get hold of the SAGNS database, if anyone has it please let me know.

    Thorsten

  10. Howzit guys,

    Any idea where I can buy a SUBURBS SQL DB for South Africa please?

    I have one from http://www.geopostcodes.com/ but it’s missing so much data and really unreliable for South Africa.

    I really appreciate your help…

    Stev

Leave a Reply