What are Derivatives? Why do I care?

Me? I like the Federal Agencies Digitization Guidelines Initiative's definition of derivatives:

Often called service, access, delivery, viewing, or output files, derivative files are by their nature secondary items, generally not considered to be permanent parts of an archival collection. To produce derivative files, organizations use the archival master file or the production master file as a data source and produce one or more derivatives, each optimized for a particular use. Typical uses (each of which may require a different optimization) include the provision of end-user access; high quality reproduction; or the creation of textual representations via OCR or voice recognition. In many cases, the derivatives intended to serve end-user access employ lossy compression, e.g., JPEG-formatted images, MP3-formatted sound recordings, or RealMedia-formatted video streams. The formats selected for derivative files may become obsolete in a relatively short time.


Or, to be a little shorter and less precise, derivatives are copies of files in different formats that serve both archival and display purposes. In Islandora they are usually created when the asset is ingested into the repository, and they can serve several purposes. Derivatives are a pretty standard approach when creating archival packages of content and disseminating them to the public (for example, derivatives are usually a part of AIP, SIP, and DIP packages).*

In Islandora, as in other archival systems, we make derivatives for a few reasons:

  1. We need smaller versions of a file, or versions that are easier to deliver in a browser (or multiple browsers), so that people don’t spend all day waiting for a gigantic A/V or image file to load.
  2. We need smaller versions of a file because access to the original asset is restricted by licensing or some other policy.
  3. We need a file that runs more cleanly through an optical character recognition (OCR) process. This one is big for digital humanists, and the derived file is often discarded once it's been milked for the best text possible.
  4. We want information to persist longer, and we want to do a good job of it.

This one might require some explanation. The digitization community is far from unified when it comes to the best formats for archival files, and nobody has a crystal ball to determine which types of files will persist through time. Generating several derivatives can be a way of ensuring the longevity and integrity of the information stored by a digital object. You may want to store an ancient computer file, but unless you work with a repository that is committed to developing and updating emulation software, it might be hard to find a version of software that will read your old file and tell you what it contains. In this case, you might want to store a potentially hardier .txt or .pdf version of that file so its contents remain legible. In the end, the uncertainty of software formats, combined with the certain desire to archive over time, means that digital assets need to be revisited repeatedly, and new derivatives may need to be generated to keep them accessible.

All Islandora solution packs generate one or more derivative datastreams, but content models can be written to accommodate a number of scenarios, depending on your institution's policies and needs.

* Don't know about AIPs, SIPs, and DIPs? I didn't five years ago. They are part of the Open Archival Information System (OAIS) digital preservation standard. If you want to learn more about OAIS (also adorably known as ISO 14721:2012), go ahead and download the updated text version of the standard.