Mapping digitised collections: how … and why

Kevin Gosling, Chief Executive of Collections Trust, introduces a recently published feasibility study into mapping and connecting digitised cultural collections, with a view to making them searchable across organisations and disciplines.

Earlier this year, DCMS commissioned Collections Trust and Knowledge Integration to consider, from a technical point of view, how the 2016 Culture White Paper ambition to ‘access particular collections in depth as well as search across all collections’ might be realised. The resulting study has just been published: Mapping digitised collections in England (though its conclusions could apply equally to Scotland, Wales and Northern Ireland).

We will explore how we might ‘search across all collections’ in future blogs. (Spoiler alert: we recommend a national ‘aggregator’, a data-sharing tool that brings any kind of collections data together, enhances it using AI tools, then makes it available for unlimited end uses.)

But first, let’s start with why we might want to do that. We believe the potential benefits would include:

  • Enabling content curation to reach new audiences. Shared collections data would be the curatorial resource behind limitless end-use scenarios that presented curated content to specific audiences.
  • Supporting dynamic collections management. Addressing this priority of the Mendoza Review would be a lot easier if those working collaboratively across the sector could routinely search across the collections data that is currently siloed within individual museums.
  • Strategic partnerships with the higher education sector, starting with online access for researchers to the records in collections databases.
  • Being part of international research and development, by providing the pipeline needed to connect UK collections data to European and global networks.
  • A strategic, cross-sector approach to gathering audience data, such as applying digital fingerprinting at aggregator level to allow the onward journeys of downloaded or shared assets to be tracked with greater precision than is currently attempted outside the commercial sector.

The report considers these benefits in more detail. Its discussion is presented below in slightly abridged form.

Enabling content curation to reach new audiences

DCMS’ 2017 Culture is Digital report noted that:

‘Differences in data standards and openness mean that it is difficult to curate across collections and create new online exhibitions or content for audiences, and limiting the educational value of the digitised asset. Unless images are tagged in a certain way, content aggregators will not be able to gather the image when searched. This has implications for modern audiences who expect digital content to be easy to navigate and to be open for them to enjoy, contribute, participate and share.’

The data-sharing tool demonstrated in the feasibility study would be able to bring together the raw material for curated content no matter what data standards had been used to create it. It would also use various tools to enhance the source data, mitigating its tagging inconsistencies to improve discoverability.
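To make the idea concrete, here is a minimal sketch in Python of what 'bringing together' and 'enhancing' might look like in miniature. It is an illustration only, not the tool demonstrated in the study: the two source formats, the field names, the sample records and the small synonym table are all assumptions invented for the example. Records from two hypothetical museums are mapped onto one common shape, and their inconsistent subject tags are normalised so that a single search finds both.

```python
from dataclasses import dataclass, field

# Hypothetical synonym table: variant tags mapped to one preferred term.
TAG_SYNONYMS = {
    "oil paintings": "painting",
    "paintings": "painting",
    "ceramics": "ceramic",
    "pottery": "ceramic",
}


@dataclass
class CommonRecord:
    """The shared shape every source record is mapped onto (assumed fields)."""
    source: str
    identifier: str
    title: str
    tags: list = field(default_factory=list)


def normalise_tags(raw_tags):
    """Trim, lower-case and de-duplicate tags, applying the synonym table."""
    seen = []
    for tag in raw_tags:
        cleaned = tag.strip().lower()
        if not cleaned:
            continue
        preferred = TAG_SYNONYMS.get(cleaned, cleaned)
        if preferred not in seen:
            seen.append(preferred)
    return seen


def from_museum_a(row):
    """Museum A (hypothetical): flat rows with semicolon-separated keywords."""
    return CommonRecord("museum-a", row["accession_no"], row["object_name"],
                        normalise_tags(row["keywords"].split(";")))


def from_museum_b(doc):
    """Museum B (hypothetical): nested documents with a list of subjects."""
    return CommonRecord("museum-b", doc["id"], doc["titles"][0],
                        normalise_tags(doc["subjects"]))


def search(records, tag):
    """Return the records carrying the given tag, after normalising it."""
    wanted = normalise_tags([tag])
    return [r for r in records if wanted and wanted[0] in r.tags]


# Invented sample data from the two sources.
records = [
    from_museum_a({"accession_no": "A.1", "object_name": "Harbour at dusk",
                   "keywords": "Oil paintings; Seascape"}),
    from_museum_b({"id": "b-7", "titles": ["Portrait of a weaver"],
                   "subjects": ["PAINTINGS", "portrait"]}),
]

matches = [r.identifier for r in search(records, "Paintings")]
```

In this toy case a search for 'Paintings' returns both records, even though one source tagged its item 'Oil paintings' and the other 'PAINTINGS'. A real aggregator would need far richer mappings and vocabularies than a hand-written lookup table.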

It is important to stress that most of the public benefits likely to flow from such data-sharing would be indirect rather than direct. The data-sharing tool would not itself be a destination site for the wider public, but rather the resource behind limitless end-use scenarios that presented curated content to specific audiences.

Moreover, the proposed tool could streamline the current, labour-intensive workflows for publishing curated content, which are often the digital equivalent of hand-crafting illuminated manuscripts. For example, Culture is Digital rightly counts Google Arts and Culture among the success stories of digitised collections. But until recently the only way an institution could contribute content to it was by uploading a spreadsheet or entering information item by item. Over a seven-year period the well-resourced Metropolitan Museum of Art only managed to contribute 757 artworks to the platform this way. Then in 2018 the Met launched a new API and collaborated with Google Arts and Culture to increase that number to 205,000 artworks. The tool proposed in the feasibility study would have the critical mass to work with Google Arts and Culture in a similar way, providing a less laborious route for any content provider who wanted to scale up their presence on this, and similar, platforms.

Supporting dynamic collections management

One of the priorities identified by the 2017 Mendoza Review of museums in England was ‘dynamic collection curation and management’:

‘Dynamic collections curation and management are the fundamental point of museums – to protect and take care of the collections they hold, and to make them accessible to the public, not just physically, but meaningfully as well. This is not without its challenges … [such as] less available curatorial time and expertise, and the ongoing need for a sensible approach to both growing and rationalising collections. There are good examples of where sharing skills and infrastructure can help to overcome these issues …’

Addressing this priority would be a lot easier if those working collaboratively across the sector could routinely search across the collections data that is currently siloed within individual museums. For example, a ‘sensible approach to both growing and rationalising collections’ might allow curators to find out what else was in the country’s 1,700 other museums before deciding to acquire or dispose of an item.

The proposed aggregator could also provide a long-term home for the results of important, but short-lived, projects such as regional or subject-specific collections reviews. As well as national-level work to describe museum collections (such as Cornucopia), the past few decades have seen many such projects conducted at regional level (eg in the South West) or by Subject Specialist Networks. A retrospective ‘review of collections reviews’ that aimed to rescue and repurpose valuable data currently languishing in spreadsheets across the country would be very worthwhile.

Strategic partnerships with the higher education sector

It is more than ten years since the Research Information Network concluded:

‘What researchers need above all is online access to the records in museum and collection databases to be provided as quickly as possible, whatever the perceived imperfections or gaps in the records.’

The potential for cultural heritage institutions to work in partnership with the higher education sector is obvious for those that are themselves part of universities or, like some of the nationals, are recognised as independent research organisations. But it is also true of smaller institutions: between 2016 and 2018 the ACE-funded Museum-University Partnership Project ‘demonstrated how the higher education sector can be opened up to smaller and medium sized museums whose unique collections and engagement expertise are often an underutilised resource,’ and also published useful research into the potential of data aggregation within models of digital networking between museums and universities.

Being part of international research and development

Larger and more technically sophisticated cultural heritage institutions are able to participate in advanced research and development projects through their own efforts. The Natural History Museum, for example, is aggregating its collections data through the Global Biodiversity Information Facility (GBIF) and has introduced tools to allow the scientific community to cite its use of the museum’s dynamic datasets more accurately. And the British Library is collaborating with the Alan Turing Institute on the £9.2 million Living with Machines project, which will digitise millions of pages from newspapers published during and immediately following the industrial revolution, combine this data with other sources (such as geospatial data and census records) and develop new AI tools to unearth patterns in the data.

As the national library, the British Library also collaborates directly with Europeana, the continent’s main aggregator of digital cultural heritage and its most extensive research and development ecosystem in this field. The default model for smaller institutions is that Europeana harvests their data from a network of national and subject-specific aggregators. Most UK museums wanting to join the Europeana ecosystem will need to do so through a national aggregator. However, we don’t currently have one. The legacy Culture Grid served as the UK pipeline to Europeana in recent years, but it stopped receiving new content in 2015 after its funding was discontinued.

Europeana is a founding partner of the ambitious European Time Machine FET project, a cutting-edge AI collaboration that is currently in the planning stage. While it is open to individual institutions to become members of the new Time Machine organisation, data aggregated through Europeana (for free) will be available to the project without the contributing institutions having to do anything else.

A strategic, cross-sector approach to gathering audience data

While individual institutions may have analytics systems in place to monitor the traffic to and within their online collections, and even to track citations of individual digital assets, going beyond that will not be easy. The Audience Agency suggests that the question of who uses digitised collections, and how, might usefully be included within the remit of the ambitious cross-sector Culture Finder framework it is currently scoping. That would allow the use of aggregated digital collections to be tracked in a GDPR-compliant way on behalf of all participating institutions, and the results interpreted within the overall context of users’ online interactions with all forms of culture. Digital fingerprinting technologies could be applied at the aggregator level to allow the onward journeys of downloaded or shared assets to be tracked with greater precision than is currently attempted outside the commercial sector.
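The report does not prescribe a particular fingerprinting technique. Purely as an illustration of the general principle, the Python sketch below uses a very simple 'average hash': an image is reduced to a string of bits recording which pixels are brighter than the image's mean, so that a lightly altered copy still yields the same, or nearly the same, fingerprint. The function names, the tiny 4 x 4 grids of brightness values and the matching threshold are all assumptions made for the example; production systems use considerably more robust methods.

```python
def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must be the same length")
    return sum(x != y for x, y in zip(a, b))


def same_asset(pixels_a, pixels_b, max_distance=2):
    """Treat two images as the same asset if their fingerprints nearly match.

    The threshold of 2 differing bits is an arbitrary value for this sketch.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Invented greyscale grids standing in for downscaled images.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
# The same image, uniformly brightened (as might happen when it is re-shared).
brightened = [[value + 10 for value in row] for row in original]
# A different image: bright on the left, dark on the right.
unrelated = [list(reversed(row)) for row in original]
```

Here `same_asset(original, brightened)` is true, because brightening every pixel shifts the mean by the same amount, while `same_asset(original, unrelated)` is false. An aggregator holding fingerprints of the assets it serves could, in principle, recognise copies encountered elsewhere in this way.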

In future blogs, we will summarise some of the key ideas, proposals and results of the Mapping digitised collections report, but you can read the whole thing (along with a scoping report) by following this link.