Digital Research Tools for Investigative Reporters

What tools can investigative journalists use to enhance their digital research skills? This page accompanies an ICFJ webinar which aims to answer that question.

While the webinar will only explore a few tools, this page gives more context and links to many resources that I've found useful while working with (investigative) reporters. I've tried to arrange them by the scenario in which you'd use them: solving a specific data problem, or looking for data on a particular topic.

Tools for documents

Most information that investigative journalists handle comes in the form of text documents, like Word files, PDFs or scanned images.

  • Storing and searching sets of documents can be done with DocumentCloud, which is more appropriate than e.g. Dropbox.
  • Getting text and tables out of PDFs is an ugly process, but Tabula, CometDocs ($) and ABBYY FineReader ($) make it possible.
  • Exploring large sets of documents is done with tools like Overview, Jigsaw and Nuix ($).
  • Crowd-sourcing the analysis of documents has worked for some topics, with tools such as CrowData and transcribable.

Tools for data in tables

Tools for data on the internet

  • Scraping data from the web means extracting data from web sites. The easiest ways are Google Spreadsheets (tutorial) and browser plugins like Scraper and TableTools2.
  • Advanced scraping for more complex web pages is possible using Kimono and OutWit Hub ($).
  • Sharing files on the web can be done with SpiderOak and tarsnap. Avoid Dropbox and iCloud for security reasons.
  • Whenever you work on the web, consider your digital security. Study Security in a Box to learn about tools that can help to protect your identity and data.
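
For reporters who want to see what scraping looks like underneath the tools listed above, here is a minimal sketch in Python that pulls the rows out of an HTML table. It is not how any of the tools named on this page work internally; it only illustrates the general idea, using Python's standard html.parser module. The names TableExtractor, extract_rows and SAMPLE_HTML are made up for this illustration, and the company names in the sample are invented. The sketch assumes a simple, well-formed table without nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # the row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample data standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td>Example Holdings Ltd</td><td>Cyprus</td></tr>
  <tr><td>Sample Trading SA</td><td>Panama</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

To try this on a live page, the HTML would first have to be downloaded, for instance with urllib.request.urlopen from the standard library. Real pages are often messier than this sample, which is where the dedicated tools above earn their keep.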

Investigating people and companies

Information about specific topics

There are many public listings of datasets, such as Awesome Public Datasets and the DataHub. Much of this data requires specialized processing, though, so investigative reporters will often have to join forces with a technologist.

Connect with others

  • School of Data is an online community for learning about using data for journalism and advocacy.
  • NICAR-L is the mailing list of data journalists in the US, which carries lots of useful advice.
  • The European Journalism Centre and Open Knowledge offer a data-driven journalism mailing list for journalists across the globe.

More resources