Friedrich Lindenberg

Notes on Data Catalogue Federation

Over the past six months, I've been involved in the LOD2 project's use case for open government data, an effort to prototype a data catalogue federation platform for data from within the European Union. On May 3/4, OKF will be running a workshop on the same topic in Edinburgh. As I won't be able to attend, here are some notes on requirements and technical alternatives (perhaps as a "scene setter") for the meeting.

Purpose, Objectives

The interest in exchanging catalogue metadata can be explained through various use cases, some of which include:

These use cases motivate the exchange of metadata, in order to allow widespread re-use of metadata, make the specific capacities of different catalogues available to each other and guarantee up-to-date information in data catalogues. The following is a somewhat random list of issues that need addressing for such exchange and federation to yield useful results.

Scope and basic concepts of catalogues

As with almost any other technology, various people expect and implement data catalogues to do many different and often mutually exclusive things. Any kind of exchange mechanism will have to bridge at least some of these gaps:

Metadata Formats

JSON, HTML, XML/OKFN, XML/GMC, XML/DC, RDF/DC, RDF/DCat, MARC

Exchange and Harvesting Mechanisms

Push or pull?
OAI-PMH, CSW et al., RSS/Atom, RDFa, SDMX, DVCS (Git, Mercurial, Bazaar)
SPARQL or specific (RESTish or RPC-type/SOAP) interfaces

The best choice at the moment is possibly the Atom Publishing Protocol, as it is widely understood, implemented and tested.

Distribution of Changes

Given both an exchange format and a mechanism for harvesting or pushing metadata, the possibility to merge divergent metadata must be created. I also think it's helpful to tackle the question of metadata provenance in this context, rather than as an isolated and theoretical concept.

This involves:

Alignment of Metadata

Once a basic architecture for federation is available, more effort can be invested into the creation of common metadata contents. Challenges here include:
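
Returning to the merging of divergent metadata raised under Distribution of Changes: one way to make that concrete is a field-level three-way merge in which every value carries a note on which catalogue asserted it. The following is only a minimal sketch under my own assumptions. The `FieldValue` type, the `merge_records` function, the flat record layout and the catalogue identifiers are all hypothetical and not taken from any existing catalogue software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldValue:
    """A metadata value plus the catalogue that asserted it (provenance)."""
    value: str
    source: str  # hypothetical catalogue identifier, e.g. "catalogue-a"


def merge_records(base, local, remote):
    """Three-way merge of flat metadata records (dicts of FieldValue).

    base   -- the last version both catalogues agreed on
    local  -- our current copy
    remote -- the copy harvested from the other catalogue

    Returns (merged, conflicts). A field changed on only one side is taken
    from that side. A field changed differently on both sides is listed in
    conflicts, and the local value is kept until someone resolves it.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        b_val = b.value if b else None
        l_val = l.value if l else None
        r_val = r.value if r else None
        if l_val == r_val:
            chosen = l            # both sides agree
        elif l_val == b_val:
            chosen = r            # only the remote side changed
        elif r_val == b_val:
            chosen = l            # only the local side changed
        else:
            conflicts.append(key)  # both changed, in different ways
            chosen = l
        if chosen is not None:     # None means the field was deleted
            merged[key] = chosen
    return merged, conflicts


if __name__ == "__main__":
    base = {"title": FieldValue("Budget 2010", "catalogue-a"),
            "license": FieldValue("unknown", "catalogue-a")}
    local = {"title": FieldValue("Budget 2010 (final)", "catalogue-a"),
             "license": FieldValue("unknown", "catalogue-a")}
    remote = {"title": FieldValue("Budget 2010", "catalogue-a"),
              "license": FieldValue("CC-BY", "catalogue-b")}
    merged, conflicts = merge_records(base, local, remote)
    for key, field in merged.items():
        print(key, "=", field.value, "(from", field.source + ")")
    print("conflicts:", conflicts)
```

Keeping the source next to each value means a merged record can still answer "who said this?" per field, which is the practical form of provenance I have in mind. A real federation would also need timestamps and nested fields, both of which this sketch leaves out.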