World-wide privatizations collected by the World Bank

18 December 2011

The World Bank [has collected](http://rru.worldbank.org/Privatization/) a database of major privatizations (those worth at least one million USD) in developing countries from 1988 to 2008. The dataset, which is classified by country, sector and privatization type, offers a fascinating view into the development of countries such as China and Russia and of many states in South America.

I've [imported this dataset into OpenSpending](http://openspending.org/wb-privatizations) to be able to generate custom aggregates. Aggregating is of course risky, as the list is unlikely to be comprehensive and does not include smaller privatizations on a local level, which are likely to make up the bulk of the financial volume. Some interesting views include the breakdown [by sector](http://openspending.org/wb-privatizations/sector) and [by nation](http://openspending.org/wb-privatizations/from).

The import into OpenSpending is problematic in other ways, too: the dataset does not contain information on who bought the privatized entities and therefore isn't a generic spending dataset. I've therefore decided to model it in reverse. The individual entries do not represent financial transactions but the transferred assets: the source is the privatizing country, while the recipient is the newly formed company. This may be an interesting preview of the problems OpenSpending will face as it begins to include balance sheet information.
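To make the reverse modelling concrete, here is a minimal sketch in Python. It is not the actual import script: the row fields (`country`, `company`, `sector`, `year`, `amount_usd_m`), the entry layout and the sample figures are all assumptions made up for illustration. Each row becomes an entry that flows from the country to the newly formed company, and totals are then computed per sector and per nation.

```python
from collections import defaultdict

# Hypothetical sample rows. Field names and amounts are invented for this
# sketch and are not taken from the World Bank database.
rows = [
    {"country": "Country A", "company": "Telecom A", "sector": "Telecommunications",
     "year": 1995, "amount_usd_m": 120.0},
    {"country": "Country A", "company": "Power A", "sector": "Energy",
     "year": 1998, "amount_usd_m": 30.0},
    {"country": "Country B", "company": "Power B", "sector": "Energy",
     "year": 2003, "amount_usd_m": 45.5},
]


def to_entry(row):
    """Model a privatization as a transferred asset rather than a payment."""
    return {
        "from": {"name": row["country"]},  # the state giving up the asset
        "to": {"name": row["company"]},    # the newly formed company
        "amount": row["amount_usd_m"],     # value of the asset, million USD
        "sector": row["sector"],
        "time": row["year"],
    }


def aggregate(entries, key):
    """Sum entry amounts grouped by a dimension such as 'sector' or 'from'."""
    totals = defaultdict(float)
    for entry in entries:
        value = entry[key]
        # Entity dimensions are dicts with a name, plain dimensions are strings.
        label = value["name"] if isinstance(value, dict) else value
        totals[label] += entry["amount"]
    return dict(totals)


entries = [to_entry(row) for row in rows]
by_sector = aggregate(entries, "sector")
by_nation = aggregate(entries, "from")
```

Because the buyer is unknown, the `to` side can only name the privatized company itself, which is why these totals describe asset values rather than money changing hands.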

A little tour of aleph, a data search tool for reporters
Over the past six months, I've been working for OCCRP to productise Aleph, a powerful search tool for investigative reporters. This is a little tour of its key features, and a little view into the future development agenda.
A Poor Journalist's Text Mining Toolkit
How can journalists search and analyze collections of documents on their own computers with simple tools? At last weekend's DataHarvest, we ran a workshop trying to answer that question. This write-up covers using Apache Tika for content extraction and regular expressions in Sublime Text as an advanced search tool.
Against Decentralization
In the free software/open web community, the notion that the web should be decentralized is more than a shared ideal: it is a piece of dogma. But are we really promoting a progressive vision of the web, or fighting a losing battle to avoid political engagement?
Keeping stock: investigative data warehouses
Data warehouses are used in industry to manage the many datasets accrued inside a company that might be relevant to reporting and analysis. I want to propose a similar pattern for investigative journalism.
SpenDB, a data analysis tool for government finance, looking for testers!
The first beta version of SpenDB features a small set of well-designed features for data import and analysis. Now the platform is ready to be adopted by anyone interested in exploring financial data, from budgets to procurement.
On Hacks/Hackers, Google and community building
A few weeks ago, the US team of Hacks/Hackers announced their plans to turn the network of journalism innovators into a collaboration with Google News Labs, starting with an event in Berlin. I tweeted about this, and Phillip Smith wrote a thoughtful reaction. Given this invitation to debate, I wanted to outline my criticism in more detail.
SpenDB, a light-weight tool for government financial data
Over the past few months, I have spent my weekends simplifying and modernizing the OpenSpending codebase to create SpenDB - a prototype-stage, light-weight data loading tool and analytical API for government financial data.
Who’s got dirt? - What if robots could do cross-border investigations?
If we want to make open data relevant to investigative journalism, we have to simplify the way people access it. We must create a way for our data tools to talk to each other and trade information about the companies and people we are researching.
