5 Project Ideas for News Technology

One aspect of the fellowship is to think about the types of tools that news organisations may find useful in their work. For the past few months, I've been keeping a list of ideas that I feel there might be a need for. None of these are really new concepts, but they are areas in which I feel that none of the existing technologies are mature enough to work in a newsroom.

Easy choropleth maps. While the tools for making maps without proper GIS software have gotten much better over the last year (thanks, for example, to the mapping support in D3 and the great CartoDB), I still feel that we're missing an easy tool to generate interactive choropleth maps in a rush. Fusion Tables look really dated now, while MapBox/TileMill are a bit heavy to use for a journalist on a tight deadline. Thankfully, Noah has started working on MapStarter, while the DataWrapper team are also working on an extension of their tool.
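To make the core of such a tool concrete: underneath the map, a choropleth mostly comes down to sorting values into a handful of classes and giving each class a colour. Here is a minimal sketch in Python using quantile breaks. The function names, the sample values and the palette are my own placeholders for illustration, not taken from MapStarter, DataWrapper or any other tool mentioned above.

```python
from bisect import bisect_right


def quantile_breaks(values, classes):
    """Return the classes - 1 thresholds that split values into roughly equal groups."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // classes] for i in range(1, classes)]


def classify(value, breaks):
    """Return the zero-based class index of a value, given sorted thresholds."""
    return bisect_right(breaks, value)


# Hypothetical palette: arbitrary light-to-dark blues, one colour per class.
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#2171b5"]

# Invented sample values, one per region.
values = [1, 2, 3, 4, 5, 6, 7, 8]
breaks = quantile_breaks(values, len(PALETTE))
colours = [PALETTE[classify(v, breaks)] for v in values]
```

Quantiles are only one possible classification scheme; equal intervals or hand-picked thresholds would slot into the same `classify` function.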

Simple data merge tool. This one is quite simple, but would be very useful: a tool to easily join up two datasets across a shared dimension. Of course there's Fusion Tables (pretty heavyweight), and VLOOKUP/HLOOKUP in Excel (urgh), but neither has the charm and simplicity of Mr. Data Converter. A simple web service where you can upload two spreadsheets, specify their relationship and then retrieve a single, joined file would make this technique much more accessible to many reporters.
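The join itself is not much code. As a rough sketch of what such a service might do behind the upload form, here is a left join of two CSV files on a shared column, using only the Python standard library. The column names and values are invented for illustration, and I'm assuming the shared column has the same name in both files.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_tables(left_rows, right_rows, key):
    """Left-join two lists of dicts on a shared column name."""
    # Index the right-hand table by the join key for quick lookups.
    # If a key appears more than once on the right, the last row wins.
    right_index = {row[key]: row for row in right_rows}
    # Columns contributed by the right-hand table (everything but the key).
    extra_columns = [c for c in (right_rows[0] if right_rows else {}) if c != key]
    joined = []
    for row in left_rows:
        match = right_index.get(row[key], {})
        merged = dict(row)
        for column in extra_columns:
            # Leave the cell blank when the right-hand table has no match.
            merged[column] = match.get(column, "")
        joined.append(merged)
    return joined


# Invented sample data: two small tables sharing a "region" column.
population = read_csv("region,population\nA,100\nB,200\n")
rates = read_csv("region,rate\nA,1.5\n")
merged = join_tables(population, rates, "region")
```

The hard part of a real tool would be everything around this: guessing the shared column, coping with slightly different spellings of the same key, and telling the reporter which rows did not match.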

Data issues. Jacob Harris' recent Source post reminded me of a discussion we had at OKFN about building an issue tracker for data. I have little to add to Jacob's excellent post, other than that I think a structured approach to data quality management doesn't require fully revisioning the data, but could just work as a logging application for ETL scripts: a single place to which data wrangling tools could report suspicious events.
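To illustrate what I mean by a logging application, here is a minimal sketch of an ETL step that reports suspicious rows to an issue log instead of silently dropping or fixing them. Everything here is hypothetical: the `DataIssueLog` class, its fields, the `budget.csv` dataset name and the sample rows are assumptions of mine, not an existing tool or format.

```python
import json
from datetime import datetime, timezone


class DataIssueLog:
    """Collects structured reports about suspicious data (hypothetical format)."""

    def __init__(self):
        self.issues = []

    def report(self, dataset, message, row=None, column=None):
        issue = {
            "dataset": dataset,
            "row": row,
            "column": column,
            "message": message,
            "reported_at": datetime.now(timezone.utc).isoformat(),
        }
        self.issues.append(issue)
        return issue

    def to_json_lines(self):
        """Serialise the log as one JSON object per line."""
        return "\n".join(json.dumps(issue) for issue in self.issues)


def clean_amounts(rows, log, dataset="budget.csv"):
    """Convert the 'amount' column to floats, reporting anything odd."""
    cleaned = []
    for number, row in enumerate(rows, start=1):
        try:
            amount = float(row["amount"])
        except (KeyError, ValueError):
            # Unparseable rows are skipped, but the skip is recorded.
            log.report(dataset, "amount is missing or not a number",
                       row=number, column="amount")
            continue
        if amount < 0:
            # Negative amounts are kept, but flagged for a human to check.
            log.report(dataset, "negative amount", row=number, column="amount")
        cleaned.append({**row, "amount": amount})
    return cleaned


log = DataIssueLog()
rows = [{"amount": "12.5"}, {"amount": "n/a"}, {"amount": "-3"}]  # invented rows
cleaned = clean_amounts(rows, log)
```

In a shared setting, `report` would presumably post to a small web service rather than append to a list, so that several scripts and several people can see the same list of open issues.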

Search-as-a-service for news applications. Most of the news apps that I've been working on recently have been driven by static data, often even through flat HTML pages. This generally works great, except for search: results must be generated dynamically and require knowledge of the whole dataset. While there are some in-browser attempts to do search, they don't scale very well. Google site searches, similarly, allow for little customization.

Searching election programmes requires a custom backend, but why?

Of course, there are hosted search solutions like Amazon CloudSearch, bonsai.io and websolr - but none of them support the kind of read-only, cross-origin query access that would be necessary to link them directly into news apps. More integrated offerings like Searchify can do remote read access, but their pricing model isn't very convincing (to be honest, I'd like to see an open source solution to this).

This is why, for a recent application to browse German party platforms, I had to fall back on writing a node.js wrapper around Solr.
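The job of such a wrapper is mostly to decide which parts of a browser's request are allowed to reach the search server. The following is not the node.js code from that application, just a minimal Python sketch of the idea, assuming a Solr core with the standard `/select` handler. The core URL, the parameter whitelist and the row limit are placeholder choices of mine.

```python
from urllib.parse import urlencode

# Only these client-supplied parameters are passed on to Solr.
ALLOWED_PARAMS = {"q", "start", "rows"}
# Cap the page size so a single request cannot dump the whole index.
MAX_ROWS = 50


def build_select_url(solr_core_url, client_params):
    """Turn untrusted query parameters into a read-only Solr select URL."""
    safe = {k: v for k, v in client_params.items() if k in ALLOWED_PARAMS}
    safe.setdefault("q", "*:*")
    try:
        rows = int(safe.get("rows", 10))
    except (TypeError, ValueError):
        rows = 10
    safe["rows"] = max(0, min(rows, MAX_ROWS))
    # The response format is fixed by the wrapper, not by the client.
    safe["wt"] = "json"
    return solr_core_url.rstrip("/") + "/select?" + urlencode(sorted(safe.items()))


# Header the wrapper would add to its responses to allow cross-origin reads.
CORS_HEADERS = {"Access-Control-Allow-Origin": "*"}

# Hypothetical core URL; the "qt" parameter is dropped and "rows" is capped.
url = build_select_url(
    "http://localhost:8983/solr/platforms",
    {"q": "energie", "rows": "500", "qt": "/update"},
)
```

A hosted service offering this kind of whitelisting and cross-origin header as a configuration option would remove the need for the wrapper altogether.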

For more complex news applications, a search service could incorporate additional functionality, such as Google Alert-style notifications based on stored queries. Implementing those from scratch is a lot of work for a single news app, but the benefit to the users would be great.
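The basic mechanism behind stored-query notifications is to turn search around: instead of running a query against stored documents, each new document is run against stored queries. A deliberately naive sketch in Python, where a query matches if all of its words occur in the document, could look like this. The class name, the matching rule and the sample subscribers are assumptions for illustration; a real service would reuse the search engine's own query parser.

```python
import re


def tokenize(text):
    """Lower-case a text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


class StoredQueries:
    """Keeps subscribers' queries and matches incoming documents against them."""

    def __init__(self):
        self._queries = {}  # subscriber -> list of term sets

    def subscribe(self, subscriber, query):
        self._queries.setdefault(subscriber, []).append(tokenize(query))

    def match(self, document_text):
        """Return the subscribers with at least one query fully contained in the text."""
        words = tokenize(document_text)
        return sorted(
            subscriber
            for subscriber, queries in self._queries.items()
            if any(terms and terms <= words for terms in queries)
        )


alerts = StoredQueries()
alerts.subscribe("anna@example.org", "minimum wage")
alerts.subscribe("ben@example.org", "nuclear energy")
to_notify = alerts.match("The party proposes a national minimum wage.")
```

Sending the actual e-mails, rate-limiting them and letting people unsubscribe is where most of the work for a single news app would go, which is exactly why it would be nice to get it from a shared service.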

A non-coder interaction language. This one is a bit more abstract than the others, but it's still an interesting discussion: as our tools for letting non-coders make data visualizations become more and more powerful, there will be a need to develop a language to describe possible interactions around a graphic without actually requiring people to code. While we can just re-invent Flash and Director, there may be more interesting approaches along the lines of Scratch or vvvv that could be tested out.

Kanban for news organisations. While it's not directly related to data-driven journalism, one of the more painful experiences I have each day is receiving a link to a Google Doc with SpOn's news planning. This could really benefit from a bespoke kanban tool that incorporates notions such as a developing story, topic desks and different delivery channels. Luckily, our friends at SourceFabric seem to be working on this already.

This list is fairly random, but what I like about these ideas is that they are fairly limited in scope. Other things I've blogged about before, such as solutions to data management or social network analysis, seem much harder to pin down and align to the needs of news organisations.

Of course, I'd love to hear other people's ideas and pain points: what tools should we build (or adapt) next?