Friedrich Lindenberg

pudo.org

Open Data Search: finding useful datasets, worldwide

Recently, hardly a week has gone by without the announcement of a new local, regional or national open data initiative, often accompanied by ever more extensive catalogues of the data being opened up (CKAN alone now runs in 20 or more places). While this is great news for those of us interested in re-using the data, it also makes it increasingly hard to keep track of what kinds of data are available for which places. To get a better overview, we've now started a meta search engine for open data: opendatasearch.org.

opendatasearch.org is a global version of the prototype publicdata.eu site we announced in January. It is an aggregator for datasets that provides a simple, unified search interface to all of the catalogues it covers. At the moment, these include all known instances of the CKAN software, the Sunlight Foundation's National Data Catalog (and with it a large number of US-based data sources), the World Bank data catalogue, Sweden's DCat-enabled OpenGov.se and Nexedi's Data Publica portal. We've also put up search.ckan.net, which provides access to the combined index of the CKAN instances only.

Behind the scenes, opendatasearch.org is a web spider with a twist: all collected data is converted to DCat, DERI/W3C's RDF-based ontology for dataset descriptions. While DCat is still in early development, it's interesting to see how well different kinds of catalogues can already be expressed in it (the harvested data can be found here). By harvesting a growing set of existing dataset descriptions, we hope to build a comprehensive picture of which dataset properties are widely used and should therefore be represented in a common format. Our goal is to establish some degree of interoperability between different data catalogues, leading to a federated catalogue architecture for Europe and perhaps beyond.

These standardization concerns aside, we want to make opendatasearch.org useful in its own right. In the immediate future this means adding more filter options, including licenses (and their compliance with open data principles), the languages used in the metadata and in the data itself, and the geographic scope of the collected information. This is, of course, an open source development effort, and we'd be glad to welcome anyone interested in contributing comments, catalogue data or functionality on the ckan-discuss mailing list!
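
To make the DCat conversion step more concrete, here is a minimal sketch in Python (standard library only) of how one harvested record might be mapped to DCat and written out as Turtle. It is not the opendatasearch.org harvester. It assumes a CKAN-style package dictionary with `name`, `title`, `notes`, `license_id` and `resources` fields, and it uses the current W3C DCAT terms (`dcat:Dataset`, `dcat:Distribution`, `dcat:accessURL`) together with Dublin Core terms, which may differ from the early DCat draft. The function names and the `base_uri` parameter are hypothetical.

```python
"""Hypothetical sketch: map a CKAN-style package record to DCAT, serialized as Turtle."""

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def package_to_dcat(package: dict, base_uri: str) -> str:
    """Return a Turtle document describing one dataset and its distributions.

    Assumes the package name and resource URLs are already valid IRI characters
    (no spaces or angle brackets); a real harvester would have to check this.
    """
    # Predicate/object pairs for the dataset node; the title falls back to the slug.
    props = [
        ("a", "dcat:Dataset"),
        ("dct:title", turtle_literal(package.get("title") or package["name"])),
    ]
    if package.get("notes"):
        props.append(("dct:description", turtle_literal(package["notes"])))
    if package.get("license_id"):
        # Kept as a plain literal; a fuller mapping would resolve it to a license URI.
        props.append(("dct:license", turtle_literal(package["license_id"])))
    for res in package.get("resources", []):
        # Each downloadable resource becomes an anonymous dcat:Distribution node.
        parts = ["a dcat:Distribution", f"dcat:accessURL <{res['url']}>"]
        if res.get("format"):
            parts.append(f"dct:format {turtle_literal(res['format'])}")
        props.append(("dcat:distribution", "[ " + " ; ".join(parts) + " ]"))

    # Join the pairs with ";" and terminate the subject's statement with ".".
    body = " ;\n    ".join(f"{p} {o}" for p, o in props)
    return (
        f"@prefix dcat: <{DCAT}> .\n"
        f"@prefix dct: <{DCT}> .\n\n"
        f"<{base_uri}{package['name']}> {body} .\n"
    )


if __name__ == "__main__":
    # Hypothetical example record, not taken from any real catalogue.
    example = {
        "name": "road-accidents",
        "title": "Road accidents 2010",
        "license_id": "odc-pddl",
        "resources": [{"url": "http://example.org/accidents.csv", "format": "CSV"}],
    }
    print(package_to_dcat(example, "http://example.org/dataset/"))
```

Keeping the license as a bare identifier is a simplification. Mapping identifiers to license URIs would be one way to support the license-compliance filter mentioned above.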