Linked Data Horizon Scan

Table of Contents

Executive Summary
Introduction
The Semantic Web
Linked Data
Tim Berners-Lee’s Linked Data Principles
SWEO Linking Open Data Community Project
‘Linked’ Data and ‘Open’ Data
Examples of Success
Consumption and Contribution
The Higher Education Experience
Recommendations for Future Work
Web Identifiers
Data Publishing
Supporting Measures
Acknowledgements


This Linked Data Horizon Scan was commissioned from Paul Miller of the Cloud of Data by the Joint Information Systems Committee (JISC). The work was intended to provide an overview of current developments with respect to Linked Data, and to make a series of recommendations to JISC and the wider community.

The final report has been made available via JISCPress in order to facilitate comment and discussion around the various topics covered. The report is also available for download and printing as a PDF.

This work was commissioned, overseen and funded by the Joint Information Systems Committee (JISC).

Declaration of Interest

The UK software company Talis offers products to the education market that utilise many of the techniques and approaches discussed in this report.

The author is a shareholder and former employee of the company. Every effort has been made to avoid bias and conflict of interest in the preparation of this report.

This work is licensed under the Creative Commons Attribution 2.0 UK: England & Wales Licence. Any reuse should attribute both the work’s author (Paul Miller, Cloud of Data) and funder (Joint Information Systems Committee), with a link to the JISC website.

To view a copy of this licence, please visit the Creative Commons website or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA.

A wealth of valuable data is collected and stored in the systems of complex organisations such as our universities, frequently underutilised for a multitude of reasons from institutional inertia to technological complexity. As budgets contract and competitive pressures increase, the timely and effective exploitation of data is becoming an increasingly important characteristic of the successful organisation; and universities are no exception. From efficiently transparent reporting to data-driven […]

The UK Government’s 7 December publication of Putting the frontline first,[1] and the 21 January unveiling of the site[2], mark the latest in a series of significant endorsements for the concept of Linked Data, to which the Prime Minister looks in ‘radically opening up publicly held data to promote transparency;’ “we will aim for the majority of government-published information to be reusable, linked data by June 2011; and we will establish a common licence to reuse data whi […]

As Matthews notes in his 2005 report, the broad vision of the Semantic Web was essentially laid out for public consumption in a seminal article for Scientific American in 2001[1]. Since then, development of the pieces comprising the Semantic Web ‘layered architecture[2]’ has continued apace.

The Semantic Web Layer Cake
Core components such as RDF[4] were formalised and released as W3C Recommendations […]

The concept of Linked Data has been embraced by a particular set of the Semantic Web’s enthusiasts and by a growing cohort of potential beneficiaries, predominantly those active in research, media or government. From modest beginnings, Richard Cyganiak’s Linking Open Data Cloud diagram[1] now represents over 13 billion[2] RDF statements from across a growing network of participating sites. This diagram only scratches the surface, in all likelihood missing a number of poorly publicised re […]

As web inventor and W3C Director Sir Tim Berners-Lee notes in his Design Issues for Linked Data[1], “The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” (Linked Data – Design Issues) This straightforward realisation is expounded in a set of four deceptively simple ‘rules’ or (as Berners-Lee prefers) […]
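The idea that holding some linked data lets you find other, related data can be illustrated with a minimal sketch. It assumes a toy in-memory dictionary standing in for the web, rather than real HTTP requests. Every URI under example.org and example.net, the vocabulary terms and the `explore` function are hypothetical and chosen only for illustration.

```python
from collections import deque

# Toy stand-in for the web of data (an assumption for illustration only).
# Each hypothetical HTTP URI maps to the statements that dereferencing it
# might return, written as (predicate, object) pairs.
WEB_OF_DATA = {
    "http://example.org/id/university": [
        ("http://www.w3.org/2000/01/rdf-schema#label", "Example University"),
        ("http://example.org/vocab/offers", "http://example.org/id/course/1"),
    ],
    "http://example.org/id/course/1": [
        ("http://www.w3.org/2000/01/rdf-schema#label", "Example Course"),
        ("http://example.org/vocab/subject", "http://example.net/id/subject/a"),
    ],
    "http://example.net/id/subject/a": [
        ("http://www.w3.org/2000/01/rdf-schema#label", "Example Subject"),
    ],
}


def explore(start):
    """Walk breadth-first from one URI, following each object that is itself a known URI."""
    seen = [start]
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for _predicate, obj in WEB_OF_DATA.get(uri, []):
            # Literal values such as labels are not keys in the dictionary,
            # so only links to other resources are followed.
            if obj in WEB_OF_DATA and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


if __name__ == "__main__":
    for uri in explore("http://example.org/id/university"):
        print(uri)
```

Starting from a single identifier, the walk reaches a resource published under a different domain purely by following links, which is the behaviour the quotation describes. A real client would dereference each URI over HTTP and parse the RDF returned.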

The Semantic Web Education and Outreach[1] (SWEO) Interest Group of the World Wide Web Consortium (W3C) was formed in 2006 to “develop strategies and materials to increase awareness among the Web community of the need and benefit for the Semantic Web, and educate the Web community regarding related solutions and technologies.” (SWEO Charter[2]) Concluded in 2008, the Interest Group was responsible for a range of activities including the development of a business case paper[3], the creatio […]

There is some confusion evident in the way that the terms ‘Linked Data,’ ‘Open Data,’ and ‘Linked Open Data’ are used, often almost interchangeably. SWEO’s ‘Linking Open Data’ project did much to exacerbate this trend, as it grew beyond its original scope to embrace data that were not technically ‘Open.’ For clarity, ‘Linked Data’ should normally be presumed to respect Berners-Lee’s four rules[1]. ‘Open Data’ is harder to pin down with precision, but could usef […]

The early examples of publishing Linked Data tended to be undertaken as experiments, or as part of the work of academics researching the Semantic Web. This work was valuable, and taught the community much about the issues that would need to be overcome. More recently, large organisations have recognised the potential value of Linked Data, and they have begun to publish their own content in this way.

BBC

Linked Data may be consumed from elsewhere to enrich an application, or contributed to the pool for use by others. The norm is, of course, to both consume data provided by others and to contribute your own back to the Commons, but this is certainly not required. A number of commercial organisations consume Linked Data from others using tools such as Open Calais, without giving anything back. It seems likely that the balance will shift as trust increases, but publication of some compelling cas […]

In consulting with the Higher Education community, it is clear that understanding of Linked Data and its implications is not currently widespread. It is worth noting, though, that the techniques and experiences described in this report may well prove to underpin the most cost-effective and sustainable responses to external trends toward transparency and data sharing, such as those implied by Higher Ambitions[1]. If the sector is to deliver more robust information about opportunities, outcomes an […]

The growth of Linked Data has been rapid, and early adopters such as the BBC and Thomson Reuters are convinced of the benefits they are seeing. As the scope and scale of these mission-critical applications begin to outstrip the research prototypes that make up much of the core Linked Data community, additional issues such as scalability, provenance, rights and trust become increasingly important. Progress is being made in each area, with commercialisation of increasingly scalable RDF databases[1 […]

Tim Berners-Lee’s Linked Data rules call explicitly for the use of HTTP URIs in naming resources. Although good at creating various schemes of identifiers (such as the JACS codes used to identify courses), the Higher Education sector appears less good at making those identifiers available for effective use over the web. By exposing existing schemes of identification for institutions, subjects, courses, resources, people and more, myriad opportunities are created for identifying related cont […]
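As a minimal sketch of what exposing an existing identifier scheme might involve, the function below turns a subject code into an HTTP URI. The base address `http://id.example.ac.uk/jacs/`, the function name `jacs_uri` and the normalisation rules are all assumptions made for illustration. The report does not prescribe any particular URI pattern.

```python
from urllib.parse import quote

# Hypothetical base URI, used only for illustration.
BASE_URI = "http://id.example.ac.uk/jacs/"


def jacs_uri(code):
    """Return an HTTP URI for a subject code such as 'G400'.

    The code is trimmed and upper-cased so that one code always yields the
    same URI, then percent-encoded so the result is a valid URI path segment.
    """
    normalised = code.strip().upper()
    if not normalised:
        raise ValueError("empty subject code")
    return BASE_URI + quote(normalised, safe="")


if __name__ == "__main__":
    print(jacs_uri(" g400 "))
```

The point of the sketch is stability. If the same code always produces the same URI, other systems can link to it with confidence.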

As well as making commonly used identifiers available for easy use and re-use by means of HTTP URIs, there is clearly value in following the examples of both the Linking Open Data Community Project and the UK Government in identifying commonly used data sets that the community might benefit from seeing made directly available for use and reuse as RDF. Ready access to rich data from beyond the institution should serve to reduce costs when implementing new systems that require these data, minimise […]
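To give a sense of how little is needed to make a simple data set available as RDF, the following sketch writes records out as N-Triples, one of the plain-text RDF serialisations. The record layout (`id` and `name` keys), the base address `http://data.example.ac.uk/` and the function names are hypothetical. The only real vocabulary term used is `rdfs:label`. This is an illustration under those assumptions, not a recommended publishing pipeline.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(value):
    """Escape backslashes, double quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(records, base_uri):
    """Serialise records with 'id' and 'name' keys as one rdfs:label triple each."""
    lines = []
    for record in records:
        # Assumption: base_uri + id forms a valid absolute URI.
        subject = base_uri + record["id"]
        label = escape_literal(record["name"])
        lines.append(f'<{subject}> <{RDFS_LABEL}> "{label}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    departments = [
        {"id": "dept/1", "name": "School of Example Studies"},
        {"id": "dept/2", "name": "Department of Illustration"},
    ]
    print(to_ntriples(departments, "http://data.example.ac.uk/"))
```

In practice an established RDF library would normally handle serialisation and validation. The sketch only shows that each row of an existing data set maps naturally onto statements about an HTTP-identified resource.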

Web-scale data services such as DBpedia, Freebase, Open Calais and others have much to offer in terms of solutions to constructing and scaling core pieces of data infrastructure. These services have also established a strong lead in assigning and maintaining persistent web URIs that the community might usefully seek to reuse, instead of inventing new ones. Equally, universities might take more control of the way in which they are represented by services outside the sector, contributing identifie […]

I spoke with a number of individuals during the preparation of this report, including those listed below. Many thanks to all those who willingly contributed time to share their perspectives. Any errors, omissions or misrepresentations are, of course, my own.

Phil Barker, Heriot-Watt University
Mark Birbeck, webBackplane
Rachel Bruce, JISC
Lorna Campbell, CETIS
Les Carr, University of Southampton
Ken Chad, Ken Chad Consulting
Chris Clarke, Talis
Keith Cole, Mimas
Ad […]