Bigging up AI

AI technologies have great potential to help us discover meaning in museum collections, says Kevin Gosling, but they need to be trained using big data at a scale that’s currently hard for most institutions to deliver.

Artificial intelligence (AI) is a broad term that, strictly speaking, describes systems and technologies that can essentially ‘self-learn and correct’ without human intervention. In practice, however, the definition is often broadened to include technologies that can be algorithmically ‘trained’ to recognise patterns in data with some human help.

In both cases, the key requirement is that the AI system has access to a representative ‘training corpus’ of material (such as words and/or images) for its initial programming. For both text- and image-based techniques, this works best when the training corpus contains large numbers of similar things. As a pre-digital analogy: if you were trying to learn about a topic from reference books, you would do better at the British Library than by rifling through your own bookshelves at home.

Speaking of the BL, it has embarked on a cutting-edge AI research project, Living with Machines. For this £9.2 million, five-year collaboration with the Alan Turing Institute, the Library will digitise millions of pages from newspapers, including regional titles, published during and immediately after the industrial revolution. A multidisciplinary team of scientists will combine this data with other sources (such as geospatial data and census records) to develop new tools, including machine-learning algorithms, capable of unearthing patterns in the data to yield new insights into that period.

Smaller institutions will need to pool their digitised collections before they and the wider research community can apply AI technologies to them at anything like this scale. A typical art collection, for example, might have one or two oil paintings that include a particular historic fashion item. But if you bring together images of almost every oil painting in public ownership, as Art UK has done, you have the raw material to pick out a training corpus large enough to teach an AI tool to recognise that fashion item in any painting. (Indeed, Art UK has already successfully collaborated with Oxford University’s Visual Geometry Group to train image-recognition software to complement the work of human ‘taggers’.)
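To make the training-corpus idea concrete, here is a deliberately minimal sketch. It stands in for the real thing: instead of a deep neural network working on images (as the Visual Geometry Group’s software does), it uses an invented nearest-centroid classifier over made-up feature vectors, where each ‘painting’ is a hypothetical pre-computed pair of numbers. Everything here — the `ruff` label, the vectors, the functions — is illustrative, not Art UK’s actual system.

```python
# Illustrative sketch only: a nearest-centroid classifier standing in for
# a trained image-recognition model. Each "painting" is a hypothetical
# pre-computed feature vector; real systems learn such features from the
# images themselves.
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(corpus):
    """corpus: {label: [feature_vector, ...]} -> {label: centroid}."""
    return {label: centroid(vecs) for label, vecs in corpus.items()}

def classify(model, vector):
    """Return the label whose centroid lies nearest to the vector."""
    return min(model, key=lambda label: math.dist(model[label], vector))

# A toy 'training corpus': vectors for paintings that do or do not show
# the fashion item. The larger and more varied this corpus, the better
# the model generalises -- which is why pooled collections matter.
corpus = {
    "ruff": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    "no_ruff": [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]],
}
model = train(corpus)
print(classify(model, [0.88, 0.12]))  # a new painting's features → ruff
```

The point of the sketch is the scale argument: with only one or two labelled examples per class, the centroids are unreliable; pooling many collections is what makes the corpus representative.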

But teaching an AI system to play ‘snap’ using a training corpus is just the start. To pursue the fashion example further, if the AI tool had not only been trained to recognise the fashion item in any digitised painting, but could also access data about when and where the artwork was painted, it could track the fashion across time and place. Similarly, art historians in the Netherlands have been deep-mapping the production and consumption of painting in Rembrandt’s time and neighbourhood. Information extracted from archival documents is layered on top of historical maps in the Amsterdam Time Machine.
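The ‘track the fashion across time and place’ step can also be sketched. Assuming the AI tagger has already flagged which paintings show the item, combining those flags with each artwork’s date and place metadata reduces to a simple aggregation. The records below are entirely invented for illustration:

```python
# Hypothetical sketch: count sightings of a fashion item per decade and
# place, given (invented) tagged paintings with date/place metadata.
from collections import defaultdict

paintings = [
    {"year": 1590, "place": "Antwerp", "has_item": True},
    {"year": 1595, "place": "Antwerp", "has_item": True},
    {"year": 1602, "place": "Amsterdam", "has_item": True},
    {"year": 1605, "place": "Amsterdam", "has_item": False},
    {"year": 1610, "place": "Amsterdam", "has_item": True},
]

def track(paintings):
    """Count paintings showing the item, keyed by (decade, place)."""
    counts = defaultdict(int)
    for p in paintings:
        if p["has_item"]:
            decade = (p["year"] // 10) * 10
            counts[(decade, p["place"])] += 1
    return dict(counts)

print(track(paintings))
# {(1590, 'Antwerp'): 2, (1600, 'Amsterdam'): 1, (1610, 'Amsterdam'): 1}
```

Layered over historical maps, counts like these are exactly the kind of data a project such as the Amsterdam Time Machine can visualise.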

The potential of AI to help us enhance and make sense of our digitised collections is huge. But first, we need to apply some human intelligence to the problem of turning the ‘small data’ siloed in our 1,700 museums into ‘big data’.