Kevin Bacon, Digital Development Officer for the Royal Pavilion & Museums, Brighton & Hove, takes an in-depth look at metadata in the context of the upgrade of their digital asset management system. The work is taking place as the museums consider moving to a charitable trust and in response to changes in the way they do business.
When reviewing the Royal Pavilion & Museums’ (RPM) digital asset management system last year, one option I considered was to not use a DAM at all. This was not merely a case of questioning my assumptions. Digital asset management systems are expensive: even after the initial outlay for the set up and licence, ongoing support and hosting fees cost us several thousand pounds per year. Why should we not simply use a shared drive?
It may present a clunkier solution that relies on third-party tools, but it is possible to manage digital assets with a shared drive. A friendly IT department could apply a sophisticated permission structure so that assets cannot be deleted or edited by accident. The image sizing and conversion processes offered by our DAM can be replicated by using free image editing software like Paint.NET or GIMP. With good file naming conventions, collection images can be made searchable through Windows: at RPM, our collection images all have the accession number embedded in the filename, so stray images can easily be matched up to object records. A shared drive could even offer a more robust means of preservation than relying on a commercial agreement with a third-party company that could go insolvent at any moment.
But there is one thing that a shared drive cannot handle: user defined metadata. For me, this is the single biggest reason to invest in a DAM, even if it’s not the easiest thing to explain to anyone else.
I’ve produced many a vacant stare and pained look over the years by talking about metadata to my colleagues. This isn’t surprising, as it’s a word that sounds almost wilfully obscure. Even the commonplace definition of metadata – that it’s ‘data about data’ – seems unhelpfully cryptic.
I find it most helpful to describe metadata as a sophisticated form of labelling. Labelling is an everyday activity most of us understand. It’s also easy to demonstrate why it is useful. If you made some pasta sauce and decided to freeze it, a label would help confirm what it is, and how long it will remain edible. Without this, you may find the container in three months’ time and have no recollection of what it is, or whether it would still be good to eat. Metadata does much the same thing: it describes a data set so that a person or piece of software knows what it is and how it can be used.
The importance of labelling is not simply that it describes what an object is; it adds value to the object. The frozen pasta sauce is more likely to be eaten if someone knows what it is and that it’s still good to eat. The label helps ensure that the pasta sauce is eaten and not thrown away – in other words, it enables the value of the sauce to be realised. Similarly, the importance of metadata is that it adds value to data. When it comes to digital assets, we don’t want the equivalent of an industrial sized freezer full of pasta sauce that no one will risk eating.
If we think of metadata as something that makes our digital assets more valuable, then it becomes much easier to build a case for using a digital asset management system.
Deciding what metadata adds value to our digital assets is trickier. To go back to the pasta sauce, a person living on their own would probably label the container with two things: what it is, and when it was made. If they were living with their family, or as part of a shared house, they may label it with other information: the ingredients it contains, who it belongs to, perhaps whether it was made for someone specific, like a young child.
The crucial point here is that the labelling is determined by the purpose of the object and the context of its use. A museum is a much more complex organisation than a shared house or a family, and while there are only a limited number of ways in which you can eat pasta sauce, digital assets can often be used for a variety of purposes. At RPM, images of our collections currently make up the bulk of our stored digital assets, but photographs and other digital assets are now regularly being created by our marketing, learning and conservation teams, among many others. These assets are frequently created with very different purposes and audiences, yet they can often support areas of work beyond their initial purpose: for example, the lead image we used to use for promoting our Ancient Egypt galleries was originally taken for conservation purposes.
Metadata and value chains
Before looking at the metadata structure we are using at RPM, I need to be clear about what sort of value we’re talking about. When we first developed our DAM, one of our key motivations was to sell images. Now that we have moved away from that model, we can no longer assume these assets have a directly monetisable value. As the Collections Trust’s excellent Striking the Balance report demonstrates, providing open access to digitised collection data and images still brings considerable benefits to a museum, and although it’s harder to measure and express that value when it doesn’t have a price point attached, it is an important part of our public offer. But our digital assets are often valuable even when they are not directly used for public engagement; a photo of conservation work on an oil painting may never be seen outside of RPM, but it is still a valuable part of our work in collection care.
I think that the clearest way of expressing the value of digital assets is to think of them as being part of a ‘value chain’. This is an idea I’ve crudely cribbed from business theory, which looks at how all the individual processes of a business create value which is eventually sold on to consumers. For me, this is a really helpful concept for making the case for using a DAM and ensuring it is widely used across RPM. As digital assets are increasingly used across the organisation, any system that can add value to those various processes should become a key part of our infrastructure.
So how does this ambition translate into a metadata schema that adds value to our work? Having spoken to colleagues, and spent some time working through the affordances of our present DAM, I have come to the conclusion that our metadata needs to be able to record three broadly different sets of information. That may seem like an unnecessary complication, but essentially we need metadata that describes what the asset is; what it can mean; and who can access it. These elements are quite different to each other in terms of the needs they address, and where the data comes from, so I find it helpful to use some form of structure to define them. Since I’m so keen on labelling at the moment, I have termed these attributes, contexts and users.
- Attributes: this is information which describes the technical state of the asset, the circumstances and purpose of its production, and a brief summary of what the asset represents. These attributes are unlikely to change over time, and often map to recognised standards.
- Contexts: this is information which makes the asset meaningful by linking it to wider intellectual contexts. These contexts may relate to public access by linking to popular concepts (eg. ‘fine art’, ‘Victorians’) but they may also correspond to current staff activity, such as work projects. These contexts will inevitably change over time.
- Users: this is information about the groups of people who can access the assets, whether they can view or download them, and whether they can edit the information. These groups may include staff, privileged external groups (eg. press, community collaborators) or the public.
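As a rough illustration only, the three sets of metadata could be sketched as a simple data structure. The class and field names below are hypothetical and are not taken from Asset Bank or from RPM's actual schema; the point is simply that attributes stay fixed while contexts and users can be revised.

```python
from dataclasses import dataclass, field


@dataclass
class Attributes:
    """Stable facts about the asset (hypothetical field names)."""
    caption: str
    file_format: str
    credit_line: str


@dataclass
class AssetRecord:
    """One asset described by the three sets of metadata."""
    attributes: Attributes
    # Contexts are expected to change over time.
    contexts: set[str] = field(default_factory=set)
    # Maps a user group to the actions it is allowed to perform.
    users: dict[str, set[str]] = field(default_factory=dict)


# Illustrative values only, not a real RPM record.
record = AssetRecord(
    attributes=Attributes(
        caption="Conservation work on an oil painting",
        file_format="TIFF",
        credit_line="Royal Pavilion & Museums, Brighton & Hove",
    ),
    contexts={"fine art", "conservation project"},
    users={"staff": {"view", "download", "edit"}, "public": {"view"}},
)

# Contexts can be revised without touching the attributes.
record.contexts.discard("conservation project")
record.contexts.add("Victorians")
```

Keeping the attributes in their own class is one way of making the separation visible: a change of project or audience touches only the contexts and users.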
In some respects, attributes are the easiest set of metadata to work with, as there is much more supporting advice, and existing standards that can be adopted. Spectrum DAM has proven enormously useful, particularly in terms of thinking about how our DAM can be integrated into collection management processes. In our new implementation each asset can now be linked to the 21 collection management processes identified by Spectrum.
I have also used the IPTC specification to ensure that some of the descriptive information can be embedded into the file in a recognisable way. IPTC is a popular international standard for image metadata, and although it was originally developed for news media use, its metadata schema has expanded to cover broader fields of activity. The rights information, ownership and caption are now embedded into each asset on download. For collection items, this also includes cataloguing elements that map to both Spectrum units of information and IPTC Extension Artwork (although our installation of Asset Bank, our DAM, does not yet permit this as embedded metadata).
Indeed, the use of embedded metadata has been one of the most immediate benefits of this new implementation. Although some web services, particularly some social media platforms, strip out this data, many re-use it. Flickr is an obvious one, but it is particularly useful in our WordPress based website which populates its media library metadata from embedded file data.
I won’t go into a detailed breakdown of the schema here, but broadly speaking, the attribute data we collect falls into the following areas:
- Content information eg. caption, description
- File information eg. filename, pixel size, bitrate, orientation, file format
- Rights info eg. credit line, licence type, additional usage terms
- Object or Artwork info eg. creator, creation date, source
- Contact information eg. source, website URL
- Administrative info eg. asset creation date, which staff team created it, purpose of creation, expiry date of the asset
- Usage info eg. last download date, last modification date, audit trail of use
This may seem like an impractically large amount of metadata to apply to an asset, but in practice some of this is automatically generated by our DAM or collected from the file itself; other elements can be imported from our collection management system; and others can be created as part of bulk updates.
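To show why this is less laborious than it sounds, here is a minimal sketch of attribute metadata being assembled from the three kinds of source mentioned above. The function name, field names and values are all hypothetical, and the rule that later sources override earlier ones is an assumption made for the example, not a description of how our DAM behaves.

```python
def build_attributes(from_file, from_cms, from_bulk_update):
    """Combine attribute metadata from three sources into one record.

    Assumption: if the same field appears twice, the later source wins.
    """
    merged = {}
    for source in (from_file, from_cms, from_bulk_update):
        merged.update(source)
    return merged


# Illustrative values only.
from_file = {
    "filename": "example.jpg",
    "file_format": "JPEG",
    "orientation": "landscape",
}
from_cms = {"creator": "Unknown artist", "creation_date": "c. 1850"}
from_bulk_update = {"licence_type": "CC BY-SA", "team": "Conservation"}

attributes = build_attributes(from_file, from_cms, from_bulk_update)
```

In this sketch nobody types the seven fields by hand: three are read from the file, two come from the collection record, and two are applied to a whole batch at once.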
The attributes I’ve outlined above are relatively stable: we generally hope that the creator of one of our works is unlikely to change. The contextual information is much more slippery as it naturally changes over time, and needs to be rooted in concepts and language that people can understand.
For me, this contextual information is what will really unlock the value of these digital assets. If we can use our DAM to understand how our digital assets are being used, and how they can be practically re-used, they will become valuable to not only RPM, but also our collaborative partners, and the public.
This is very much an area we are still feeling our way through, but for now this breaks down into two loose areas.
- Business use eg. current and past projects, staff teams.
- Intellectual contexts eg. National Curriculum subjects, local information.
The need to incorporate the business use of our assets actually predates work on revising our DAM, as I was often asked if I could set up a ‘folder’ where various teams could store and share large digital media files. We are still in the early stages of training RPM staff in the new implementation of our DAM, but the initial response has been positive.
The use of wider intellectual contexts is much more complex, because there is a whole world of controlled vocabularies and other standards that can be adopted. As a regional museum service with a wide range of collections, National Curriculum subjects are an ideal place to start, particularly those subjects where we know local schools are interested in using our resources. We experimented with this approach last year with our online collections. Much more work needs to be done here, but enabling a time-pressed teacher to search for Victorians, rather than run an advanced search for everything from our collections that dates between 1837 and 1901, feels like a step forward.
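The Victorians example can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of our online collections: the mapping, the function name and the sample records are all hypothetical, and each record is assumed to carry a single numeric year.

```python
# Hypothetical mapping from a curriculum-friendly term to a date range.
CONTEXT_DATE_RANGES = {
    "Victorians": (1837, 1901),
}


def find_by_context(records, term):
    """Return records whose year falls inside the range for the given term."""
    start, end = CONTEXT_DATE_RANGES[term]
    return [r for r in records if start <= r["year"] <= end]


# Illustrative records only.
records = [
    {"title": "Object A", "year": 1820},
    {"title": "Object B", "year": 1851},
    {"title": "Object C", "year": 1901},
    {"title": "Object D", "year": 1923},
]

victorian = find_by_context(records, "Victorians")
```

The teacher supplies one familiar word, and the date arithmetic is handled once, by whoever maintains the mapping.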
In a sense, people can’t really be part of a metadata structure at all, as they are obviously quite separate from digital assets. But there is a need to ensure that we can control who can access these assets, and what they can do with them. Within our current DAM this is achieved by grouping the assets into Access Levels, which establish a series of permissions relating to Groups of users. At present, this broadly breaks down into the following groups:
- Public access to both view and download specific assets
- Staff access to download
- Staff access to download and upload
- Press access to designated assets
While these four sets of permissions cover most of our needs, this is an area that is becoming more complex. We are looking at using this more with external contractors, such as freelance designers, and collaborators outside of RPM. Whether we will be successful in this is yet to be seen. The key challenge, I suspect, is that tools like Dropbox have become so familiar that our DAM may feel like an unnecessarily alien system to learn. It works reasonably well with the supply of images to registered press users, but it does require us to package the assets so they meet an immediate need (eg. views of the Royal Pavilion). While the Users set of metadata may provide the appropriate permission structure, it’s the Contexts information that will probably determine whether this can become successful.
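The idea of grouping assets into Access Levels, with permissions granted to Groups of users, could be modelled along the following lines. This is only a sketch: the group names, access level names and the exact permissions are assumptions made for the example, and it does not reflect how Asset Bank stores this information.

```python
# Hypothetical structure: user group -> access level -> permitted actions.
PERMISSIONS = {
    "public": {
        "public": {"view", "download"},
    },
    "press": {
        "public": {"view", "download"},
        "press": {"view", "download"},
    },
    "staff_download": {
        "public": {"view", "download"},
        "press": {"view", "download"},
        "staff": {"view", "download"},
    },
    "staff_upload": {
        "public": {"view", "download"},
        "press": {"view", "download"},
        "staff": {"view", "download", "upload"},
    },
}


def can(group, action, access_level):
    """Return True if the group may perform the action on assets at that level."""
    return action in PERMISSIONS.get(group, {}).get(access_level, set())


press_can_download = can("press", "download", "press")
public_can_see_staff_assets = can("public", "view", "staff")
```

Adding a new group, such as freelance designers, would then mean adding one more entry to the structure rather than changing the assets themselves.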
What does this all look like?
Rather than explain our metadata schema in detail, I have provided a PDF snapshot of the DAM record for a digital photograph of one of our paintings, which can be downloaded by clicking the link on this page.
Are we getting this right?
We are still in the early days of working through this new implementation, and although I have based this metadata schema on known standards where possible, some of this is admittedly bespoke to RPM and ad hoc. These thoughts on metadata are very much my own, and I’d appreciate any corrections to my understanding. I also sincerely hope this has not put you off freezing pasta sauce.