Journoid, data notifications

At the Open Interests hackday in November, a discussion with Martin Stabe from the FT's interactive desk led me to code up a prototype of Journoid. The idea is to monitor changing online datasets for remarkable information, like earthquakes, procurement in a particular industry or a close parliamentary vote. While we'd discussed alerting in the context of OpenSpending before, Martin had a pretty specific list of wishes that neither PANDA nor IFTTT can handle:

  • Search not just for a single keyword or query, but compare the incoming data to a table of matches, such as a list of famous people, well-known companies or any other set of items that you may be interested in.

  • Use Google Docs for configuration. The FT uses Google Apps internally and it's an interface that their reporters already understand - just add a "Config" sheet to your keyword document, and store all relevant settings - like the source URL and recipient email - in there.

The Journoid prototype from the hackday only fulfills the first of those requirements - and I'm still struggling with #2, as it's surprisingly hard to find a good Google Docs client library for Python.

Still, the hack was a nice demo: sift through a data dump of UK departmental spending, check the supplier information against a list of companies of interest and finally send a message if there is a hit.
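
To make the matching step concrete, here is a minimal Python sketch of the idea rather than the actual hackday code. The column names (`supplier`, `amount`), the sample rows and the helper names `normalize` and `find_hits` are all made up for illustration, and a `print` stands in for sending the message.

```python
import csv
import io


def normalize(name):
    """Lower-case and collapse whitespace so trivial spelling differences still match."""
    return " ".join(name.lower().split())


def find_hits(rows, watchlist, field="supplier"):
    """Yield every row whose `field` value appears in the watchlist."""
    wanted = {normalize(name) for name in watchlist}
    for row in rows:
        if normalize(row.get(field) or "") in wanted:
            yield row


# Made-up sample data standing in for a spending dump (hypothetical columns).
SAMPLE = """supplier,amount
Acme Widgets Ltd,1200.50
Example Consulting,98000.00
ACME  widgets ltd,300.00
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
hits = list(find_hits(rows, ["Acme Widgets Ltd"]))

for hit in hits:
    # A real notifier would send an email here; printing keeps the sketch self-contained.
    print("Match: {supplier} ({amount})".format(**hit))
```

Turning the watchlist into a set of normalized names keeps each lookup cheap, which matters once the list of companies or people grows beyond a handful of entries.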

As a further experiment, I was able to use OpenCorporates to check the supplier's company status, answering a simple but interesting question: does the government do business with insolvent (or even dissolved) companies? It's interesting to think about what other matches can be made when the comparison list is actually an API.
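
One way to picture "the comparison list is an API" is to swap the static list for a lookup function. The sketch below is an assumption about how that could be structured, not how the prototype does it: `check_suppliers` and `fake_lookup` are hypothetical names, the status strings are invented, and the stub deliberately avoids guessing at the real OpenCorporates API.

```python
def check_suppliers(rows, lookup, flagged=("insolvent", "dissolved"), field="supplier"):
    """Return (row, status) pairs for suppliers whose looked-up status is flagged."""
    cache = {}  # remember answers so each supplier is only looked up once
    results = []
    for row in rows:
        name = row[field]
        if name not in cache:
            cache[name] = lookup(name)
        status = cache[name]
        if status in flagged:
            results.append((row, status))
    return results


# Stub with made-up data; a real lookup would query a company registry service.
FAKE_REGISTRY = {"Acme Widgets Ltd": "dissolved", "Example Consulting": "active"}


def fake_lookup(name):
    """Return a status string, or None when the company is unknown."""
    return FAKE_REGISTRY.get(name)


payments = [
    {"supplier": "Acme Widgets Ltd", "amount": "1200.50"},
    {"supplier": "Example Consulting", "amount": "98000.00"},
    {"supplier": "Unknown Co", "amount": "5.00"},
]

flagged_payments = check_suppliers(payments, fake_lookup)
for row, status in flagged_payments:
    print("{0} is {1}".format(row["supplier"], status))
```

Because the lookup is just a function argument, the same loop works whether the answers come from a spreadsheet, a local table or a remote service, and the cache avoids repeating a remote call for suppliers that show up many times.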

What's next? It's time to clean up the messy hackday code, finish the GDocs configuration, set up some kind of hosted solution and possibly add a few other input formats.