Flint handaxe
So, it being the Easter school holidays, we all went for a family outing to the Ashmolean Museum in Oxford. And within about two minutes (because I am a geek) I started spotting identifiers and thinking about how the physical objects in the museum are analogous to datasets.
Take for example the flint handaxe pictured above. It's obviously a thing in its own right, well defined and with clear boundaries. But in a cabinet full of other artifacts (even some other handaxes) how can you uniquely identify it? Well, you can stick a label next to it (the number 1) and then connect that local identifier to some metadata on display in the case:
Metadata for the flint handaxe (1.)
That works, but it means that the positions of the artifacts are fixed in the case, so reorganising things risks disconnecting the object from its metadata. The number 1 is only a local identifier too - there were plenty of other cases in the gallery which all had something in there with the number 1 attached to it - so as a unique identifier it's not much good. And in this case, there were actually 2 handaxes identified with the number 1.
If you look closely at the surface of the handaxe, you'll see a number written on it in black ink: 1955.439a. This number (which I'm guessing is an accession number, with the first part being the year the artifact was first put into the museum) is also repeated in small print at the end of the metadata blurb.
So, the moral from this example is that local identifiers are useful, but objects really do need unique identifiers which are present in both the dataset/artifact itself, and its corresponding metadata.
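To make that moral a bit more concrete, here's a minimal Python sketch. It assumes each display case is just a lookup from local label to accession number, with the metadata keyed by the accession number. Only 1955.439a comes from the handaxe itself; the second case, its number and the function name `describe` are made up for illustration.

```python
# Each case maps its own local labels to unique accession numbers.
# Only 1955.439a is real; the other number is an invented placeholder.
case_a = {1: "1955.439a"}
case_b = {1: "0000.001"}

# Metadata is keyed by the unique identifier, not by position in a case.
metadata = {
    "1955.439a": {"description": "Flint handaxe"},
    "0000.001": {"description": "Placeholder object in another case"},
}


def describe(case, local_label):
    """Resolve a case-local label to metadata via the unique identifier."""
    accession = case[local_label]
    return accession, metadata[accession]["description"]


# The same local label points at different objects in different cases.
print(describe(case_a, 1))
print(describe(case_b, 1))
```

Rearranging a case only means updating that case's small lookup table; the link between the accession number and the metadata stays intact.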
Sobek
Sobek's identifier
Sobek's metadata
And his identifier is also connected with his metadata.
A collection from an A-Group burial
In this case we have a dataset that's a collection of other self-contained datasets. Each dataset/pot has its own individual value, but has greater value as part of the larger collection. These particular datasets were all found in the same location at the same time, so have a very definite connection - they were all grave goods excavated from one grave in Faras, Sudan.
Close up of some of the grave goods
And of course, each of the artifacts has its own id (sort of - the group of 7 semi-precious stones only has one id between them) as well as a local identifier to link it to its metadata.
Collection metadata and individual item metadata
The collection itself has its own metadata too, which puts the individual items' metadata into context.
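Here's one way that nesting might look in Python, purely as a sketch: a collection carries its own metadata and holds items that each have an id of their own. The class names, field names and identifiers are all hypothetical, not anything the museum uses. The `pieces` field is there to cover cases like the 7 stones that share a single id.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    identifier: str      # made-up id, not a real accession number
    description: str
    pieces: int = 1      # one id can cover several physical pieces


@dataclass
class Collection:
    identifier: str
    context: str         # collection-level metadata
    items: list = field(default_factory=list)

    def describe(self, item_id):
        """Return an item's description with the collection context attached."""
        for item in self.items:
            if item.identifier == item_id:
                return f"{item.description}, from {self.context}"
        raise KeyError(item_id)


burial = Collection(
    identifier="example-collection-1",
    context="an A-Group burial",
    items=[
        Item("example-item-1", "Pot"),
        Item("example-item-2", "Group of semi-precious stones", pieces=7),
    ],
)

print(burial.describe("example-item-2"))
```

The item's own metadata stays short, and the collection supplies the context that makes sense of it.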
Non textual metadata
Faience Shabtis
Here we have a data collection that is joined by theme rather than by geographic location. These statues are all shabtis, but came from different places and were ingested into the museum at different times.
Faience shabti metadata (15.)
With digital data we've got it easier in one way, in that the same dataset/shabti can be in multiple collections at the same time and displayed in lots of different ways in different places. The downside is that it can be hard to know exactly which dataset is being displayed where, and which collections it is part of. That's why permanent, unique ids are so vital for keeping track of things.
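A small Python sketch of that idea, assuming collections hold only identifiers rather than copies of the data. The id `shabti-15` and the collection names are invented for illustration.

```python
# One record per dataset, keyed by a permanent unique id (hypothetical here).
datasets = {"shabti-15": "Faience shabti"}

# Collections refer to datasets by id, so one dataset can sit in several.
collections = {
    "by-theme/shabtis": ["shabti-15"],
    "by-site/example-site": ["shabti-15"],
}


def collections_containing(dataset_id):
    """List every collection that references the given dataset id."""
    return sorted(
        name for name, members in collections.items() if dataset_id in members
    )


print(collections_containing("shabti-15"))
```

Because every collection points at the same id, asking "where is this dataset displayed?" is a simple lookup rather than guesswork.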
Granularity issue! Mosaic tiles
Metadata for the mosaic tiles (49.)
Because the dataset is in lots of pieces (files), none of which is uniquely identified, there is always the risk that a piece may become detached from its collection and lost/misidentified. Moving this particular dataset around the place could be quite problematic - but on the other hand, there are so many pieces that losing one or two in transit might not be too much of a problem. On issues of granularity, data repository managers, like museum curators, need to decide for themselves how they're going to deal with their datasets/artifacts.
Silver ring, temporarily removed
I think we worry about data a lot, because it's so hard to draw distinct lines around what is and what isn't a dataset. But honestly, there's such a wide variety of stuff in museums, all of it with identifiers and methods of curation, that I really do think we need to worry less about how to turn a dataset into a standardised book, and think of datasets more as artifacts/things that come in all sorts of shapes and sizes.
Oh, and if you're in Oxford, do go check out the Ashmolean Museum. It's great, and has lots more stuff than just the pieces I took photos of!