Going meta on metadata

In this week’s Organizing and Access to Information class I fear I steered the conversation into a detour on the subject of metadata. The prof’s introduction to the concept began straightforwardly enough — metadata is data about data, e.g. author, title, publication date, etc. But then as his examples got more complex he began to call things metadata that I would have considered part of the data itself. I don’t have his slides in front of me but I think it started when he put chapter 1, page 1 of Pride and Prejudice on the screen and said that its structure — chapters, paragraphs, etc. — was also metadata. That surprised me, and we spent probably far too much class time working on why. (I feel guilty but not too guilty — I kept offering to drop it, but other people had questions and comments, too.)

It boiled down to this: in my naive interpretation, metadata is information that applies to the “information object” as a whole, or is extrinsic to it in some way. I’d call anything integrated with the meat of the object “data”, not “metadata”. That includes structure and layout information: whatever is represented by typography and layout on a printed page, or by ordinary inline markup on the web. Of course we can abstract structure away from presentation, but that doesn’t mean that the structure is no longer part of the work. Where Jane Austen chose to put her paragraph breaks is as much a part of the novel as the words she chose to put inside them. There were some interesting examples presented in class, such as whether the abbreviation and typography conventions used to identify the parts of speech in a dictionary are metadata. I argued that whether represented through an italic n. or an XML <part-of-speech> element, the part of speech is an integral part of the content. Or in multimedia terms, the string of bits representing an audio or video stream is the data, and it doesn’t make sense to speak of the volume level or amount of cowbell as being a separate thing called metadata.
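The dictionary example above can be made concrete with a minimal sketch in Python, using the standard library's xml.etree.ElementTree. Everything in the sample is an assumption made up for illustration: the element names (entry, meta, body, headword, part-of-speech, definition), the header fields, and the entry text come from neither the class nor any real dictionary schema. The sketch only shows the distinction being argued: the header can be read off separately without touching the entry, while the part of speech sits inside the flow.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary entry; all element names and values are illustrative.
ENTRY_XML = """
<entry>
  <meta>
    <source>Example Dictionary</source>
    <last-revised>2008</last-revised>
  </meta>
  <body><headword>cowbell</headword> <part-of-speech>n.</part-of-speech> <definition>a bell hung around a cow's neck</definition></body>
</entry>
"""


def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def split_entry(xml_text):
    """Return (extrinsic, flow): header fields as a dict, body text as a string."""
    root = ET.fromstring(xml_text)
    extrinsic = {child.tag: child.text for child in root.find("meta")}
    flow = normalize("".join(root.find("body").itertext()))
    return extrinsic, flow


def flow_without(xml_text, tag):
    """Return the body text after removing every direct child named `tag`."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    for elem in body.findall(tag):
        body.remove(elem)  # also drops the whitespace trailing the element
    return normalize("".join(body.itertext()))


extrinsic, flow = split_entry(ENTRY_XML)
print(extrinsic)
print(flow)
print(flow_without(ENTRY_XML, "part-of-speech"))
```

Under these assumptions, the header fields come out as a separate dictionary and the flow is unchanged by reading them, whereas removing the part-of-speech element changes what the entry itself says. Whether that makes the inline markup "data" or "metadata" is still a matter of which convention is being used.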

The prof worked hard to explain his model and I was probably just being thickheaded not to get it. On reflection I see that the concepts of “data” and “metadata” are conventions and where to draw the line is a matter of utility; if it is helpful in a certain setting to call internal markup (or its pre-digital equivalents of layout and typography) “metadata” then so be it. Also, there are many times when it’s good for information to live in both places: you can hear the cowbell in the audio stream, and you may also want an access point in the catalog of your music library which says “Cowbell: track 6890691, timepoint 1:37”.

But I don’t think I’m alone in my naive model. The very next day’s reading assignment was chapter 3 of Erik Ray’s Learning XML, where he defines the term: “Metadata is information about the document that is not part of the flow.” What I’m wondering now is whether these diverging definitions have any consequences beyond occasional confusion.

3 thoughts on “Going meta on metadata”

  1. It’s a difficult concept to get agreement on. There’s such a thing as implicit metadata–for example, a document title–that blurs these lines. There are also many definitions of metadata, some that pertain to structure rather than semantics; DTDs are often considered metadata, for example. I don’t think you’re naive in your interpretation; it really is confusing and unclear.

  2. In Media Studies (and other domains like Rhetoric) people sometimes draw a distinction between form and content. I suppose the consensus view is that it’s impossible to draw a fine line between form and content in any given text, but it can be useful to talk about formal aspects of a text – like punctuation and line breaks in written texts or menus and HUDs in video games – in order to understand how a text is structured. The formal aspects indicate to the reader how to consume a text, and, in that sense, they’re data about the data. I would contend, tho, that trying to collect and organize this kind of metadata in isolation from the text would be a difficult, if not quixotic, task.

  3. I agree with you. I’d make a distinction between metadata and formatting. But it’s a fine line. They both give more meaning to the content itself without actually “being” the content.
