What if journalists had story writing tools as powerful as those used by coders?

03 December 2014

Last weekend saw the first Al Jazeera Canvas hackday in Doha. I had the opportunity to work with an amazing team of journalists, designers and engineers to tackle the challenge set by the organizers: rethink the way in which context is used in the production and dissemination of news stories.

The question we were asking ourselves was this: how can we give reporters more relevant facts, right where they're writing a story? This would include notes from their previous stories (and from stories written by others in their news organization), as well as relevant data from a wide range of global and local data sources.

Our solution to this problem is inspired by the tools that coders use to develop complex software architectures, so-called integrated development environments (IDEs): rather than using a plain text editor, these tools put all the relevant knowledge about the objects in your code and the software libraries relevant to your project right at your fingertips.


Eclipse is a typical IDE for software development, providing the necessary context for coders to write complex software. It's also the namesake of newsclip.se, which tries to do the same for journalism.

Applying this to journalism, we ended up building a simple story editor called newsclip.se, which recognizes the people and companies you are writing about and uses that information to search for related records in a wide array of online data sources. This way, journalists are encouraged to keep their notes and research on the platform, where they are easy to recall, expand and share.
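The post doesn't say how newsclip.se recognizes entities or queries its sources. As a minimal sketch, assuming a newsroom-maintained list of known names (a simple gazetteer rather than a full named-entity recognizer), the recognize-then-look-up loop could look like this. `KNOWN_ENTITIES`, `DATA_SOURCES`, `recognize_entities` and `build_context` are hypothetical names, and the dictionaries only stand in for real company registries and story archives.

```python
import re
from dataclasses import dataclass, field

# Hypothetical gazetteer: names the newsroom already tracks, mapped to an entity type.
KNOWN_ENTITIES = {
    "Acme Holdings": "company",
    "Jane Doe": "person",
}

# Hypothetical stand-ins for external data sources (registries, past stories, ...).
DATA_SOURCES = {
    "company_registry": {"Acme Holdings": ["Registered in 2009", "Director: Jane Doe"]},
    "past_stories": {"Jane Doe": ["2013 profile of Jane Doe"]},
}


@dataclass
class Context:
    name: str
    kind: str
    records: dict = field(default_factory=dict)


def recognize_entities(text):
    """Return (name, kind) for each known entity in the text, in order of first appearance."""
    found = []
    for name, kind in KNOWN_ENTITIES.items():
        # Whole-word match so "Jane Doe" does not fire inside a longer token.
        match = re.search(r"\b" + re.escape(name) + r"\b", text)
        if match:
            found.append((match.start(), name, kind))
    return [(name, kind) for _, name, kind in sorted(found)]


def build_context(text):
    """Attach every record from every source that mentions a recognized entity."""
    contexts = []
    for name, kind in recognize_entities(text):
        ctx = Context(name, kind)
        for source, index in DATA_SOURCES.items():
            if name in index:
                ctx.records[source] = index[name]
        contexts.append(ctx)
    return contexts


if __name__ == "__main__":
    draft = "Jane Doe, a director of Acme Holdings, declined to comment."
    for ctx in build_context(draft):
        print(ctx.name, ctx.kind, ctx.records)
```

A real tool would swap the gazetteer for a statistical entity recognizer and the dictionaries for API calls, but the shape of the loop (find entities in the draft, fan out to sources, collect context per entity) stays the same.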

Obviously, I am quite excited about the resulting prototype, and I intend to continue developing it as part of the grano project. While it's worth integrating it with further data sources and experimenting with the ways in which such a system would capture notes, the most interesting aspect to explore is probably a bit more subtle.


Newsclip.se provides contextual information about companies and people to journalists while they are researching and writing a story.

Most modern programming IDEs hook into a programming language's so-called abstract syntax tree (AST), a structured representation of the elements of a program that is generated by parsing its source code. This lets them suggest next steps to developers and point out potential inconsistencies and errors in the resulting application.
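To make the AST idea concrete: Python's standard-library `ast` module parses source code into exactly such a tree. The toy check below (a hypothetical `undefined_names` function, not taken from any real IDE) walks the tree and reports names that are read but never defined, roughly the kind of hint an IDE underlines as you type. It deliberately ignores scoping rules, so treat it as a sketch rather than a linter.

```python
import ast
import builtins


def undefined_names(source):
    """Return sorted (line, name) pairs for names that are read but never bound.

    Toy analysis: treats the whole module as one flat scope.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))  # print, len, ... count as defined
    used = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)  # assignment target
            elif isinstance(node.ctx, ast.Load):
                used.append((node.lineno, node.id))  # a read
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)  # function parameters
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import os.path" binds "os"; "import x as y" binds "y".
                bound.add((alias.asname or alias.name).split(".")[0])
    return sorted((line, name) for line, name in used if name not in bound)


if __name__ == "__main__":
    code = "def area(r):\n    return pi * r ** 2\n\nprint(area(2))\n"
    print(undefined_names(code))
```

Run on the snippet above, it flags `pi` on line 2: the tree knows `r` is a parameter and `area` a function, so only the genuinely missing name is reported.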

What if journalism IDEs could do something similar? Rather than just managing notes, they could point out to journalists that they don't have enough evidence to make a given point, or that a certain person or company has not been investigated thoroughly enough. Or that a certain point is not relevant to the main thrust of the story.

This would probably require copious amounts of natural language processing, logical reasoning and semantic analysis, all technologies that are hardly mature enough to be used on complex topics. But it seems to me that there should also be plenty of ways of faking it: generating context-aware input based on a partial or statistical understanding of the story being written.
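As one crude illustration of "faking it", assume the editor already knows which entities a draft mentions and which research notes are attached to each of them. A simple counting rule can then flag people or companies the story leans on heavily but has little material about. The function name `thin_evidence_warnings` and the default threshold of two notes are hypothetical choices for this sketch, not something the prototype does.

```python
from collections import Counter


def thin_evidence_warnings(mentions, notes, min_notes=2):
    """Flag entities the draft relies on that have fewer than min_notes notes on file.

    mentions: entity names in the order they occur in the draft (repeats allowed).
    notes: dict mapping entity name -> list of research notes.
    min_notes: arbitrary threshold, an assumption of this sketch.
    """
    counts = Counter(mentions)
    warnings = []
    # Most-mentioned entities first: thin evidence matters more where the story leans hardest.
    for name, times in counts.most_common():
        attached = len(notes.get(name, []))
        if attached < min_notes:
            warnings.append(f"{name}: mentioned {times}x but only {attached} note(s) on file")
    return warnings


if __name__ == "__main__":
    mentions = ["Acme Holdings", "Jane Doe", "Acme Holdings", "Acme Holdings"]
    notes = {
        "Jane Doe": ["interview", "court filing"],
        "Acme Holdings": ["registry extract"],
    }
    for warning in thin_evidence_warnings(mentions, notes):
        print(warning)
```

This knows nothing about what the story argues; it only compares how often something is mentioned with how much research backs it. That is the point: a statistical proxy can produce a useful nudge long before real semantic analysis is available.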

In any case, it seems to me that replacing simple word processors like Microsoft Word with a tool better suited to managing the complexity that journalists face is a very relevant project. The two-day hackday at Al Jazeera has given us a nice prototype with which to start collecting feedback.

Thanks again to Philip, Bruno, Heinze, Kasia and Eva for forming such an awesome team!
