Monday, December 13, 2010

Metadata vs. ontology

It's been rather a long hiatus in blogging terms, and an even longer one from the issues that I professed to form the core of this blog. So I'm returning again to DAM, and I wanted to come back in with a topic that's been cropping up again and again throughout my course: information integration and interoperability. This becomes an issue when you want to be able to search across domains (or even institutions within the same domain), since each has its own metadata structures and terminological systems.

Why is this important? Essentially, research patterns these days are increasingly digital, remote (i.e. web-based) and cross-disciplinary. Online content is predominantly public-facing and we find now that 'engagement' and 'discovery' are as important as traditional focused research goals when it comes to making information accessible. It's important that information can be found via different locations and pathways, so it needs to be linked. If anyone's used WorldCat, then they've experienced some of the potential of this simple idea.

So how to do it? Something like WorldCat is pretty straightforward - it's lots and lots of metadata, harvested and converted to WorldCat (MARC) format. Metadata for libraries has a long history (MARC goes back to the 1960s) and works well for describing the traditional book format, but it does have difficulty accommodating new media and other types of objects. The metadata solution to this problem has been Dublin Core, a 'lowest common denominator' set of terms that can theoretically describe anything in only 15 base text fields, prioritising interoperability above domain-specific detail.
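To make the 'lowest common denominator' idea concrete, here is a minimal Python sketch of a crosswalk from a MARC-style record into the 15 Dublin Core elements. The tag-to-element mapping is a simplified illustration rather than an official crosswalk, the record is a toy dictionary rather than real MARC, and `to_dublin_core` is a made-up helper name.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)

# Simplified, illustrative mapping from a few MARC tags to Dublin Core.
# (An assumption for this sketch, not an authoritative crosswalk.)
MARC_TO_DC = {
    "020": "identifier",   # ISBN
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "260b": "publisher",   # publication: name of publisher
    "260c": "date",        # publication: date
    "650": "subject",      # subject heading
}


def to_dublin_core(marc_record):
    """Flatten a toy MARC-like dict into repeatable Dublin Core text fields.

    Returns the Dublin Core record plus the list of source tags that had
    no home in the mapping and were therefore lost.
    """
    dc = {}
    dropped = []
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            dropped.append(tag)
            continue
        dc.setdefault(element, []).extend(values)
    return dc, sorted(dropped)


record = {
    "100": ["Austen, Jane"],
    "245": ["Pride and prejudice"],
    "260b": ["T. Egerton"],
    "260c": ["1813"],
    "590": ["Local note: bound with the library's own annotations"],
    "650": ["Courtship -- Fiction", "England -- Fiction"],
}

dc_record, dropped_tags = to_dublin_core(record)
print(dc_record)
print("Lost in conversion:", dropped_tags)
```

Everything that maps across becomes a plain text value under a shared element name, which is what makes it searchable alongside records from elsewhere. The locally defined note has nowhere to go, and that is the price of interoperability in miniature.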

The problems with this solution have been several. Fundamentally, trading away more complex metadata descriptions for interoperability isn't always acceptable to specific communities. As such, efforts have been made to add additional fields to Dublin Core for greater detail, but this just returns us again to the problem of interoperability - it's another data set that doesn't work outside of your domain. A metadata system, which is based on terminology, also doesn't allow for interrelations between objects, beyond the applied terms they have in common.

Enter, then, the ontology. Briefly, for those who might not know what an ontology is, or perhaps think it sounds vaguely Kantian (as I did until fairly recently), it can probably most easily be described as a formal logic (artificial intelligence, if you like) for a computer system, which can only 'know' what you tell it to know and how to know it.

Unlike metadata, which uses simplified terminological structures written by humans for human consumption, an ontology provides a formal system for data integration that can be far more complex. At its best, an ontology can recover the context and concepts behind the simplifications of terminological systems, focusing instead on an object itself and how it relates to other objects. Its further advantage is that objects represented through formal relationships are freed from the constraints of domain-specific metadata, allowing a top-level ontology to search across multiple domains.
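As a rough illustration of objects being described by their relationships rather than by shared terms, the Python sketch below stores a handful of subject-relationship-object statements and follows them across three pretend collections. It is only a toy graph, not a real ontology (there is no class hierarchy or reasoning), and every identifier and relationship name in it is hypothetical.

```python
from collections import defaultdict

# Toy statements of the form (subject, relationship, object).
# The "library:", "museum:" and "archive:" prefixes stand in for
# three separate domains; all names are invented for illustration.
TRIPLES = [
    ("library:book42", "created_by", "person:jane_austen"),
    ("museum:portrait7", "depicts", "person:jane_austen"),
    ("archive:letter19", "written_by", "person:jane_austen"),
    ("archive:letter19", "mentions", "library:book42"),
]


def build_index(triples):
    """Index every statement in both directions so links can be followed
    from either end."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[subject].add((relation, obj))
        index[obj].add(("inverse of " + relation, subject))
    return index


def related(index, start, max_hops=2):
    """Breadth-first walk: everything reachable from start within max_hops."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for _relation, neighbour in index[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    seen.discard(start)
    return sorted(seen)


index = build_index(TRIPLES)
print(related(index, "museum:portrait7"))
# → ['archive:letter19', 'library:book42', 'person:jane_austen']
```

The museum's portrait shares no descriptive term with the library's book or the archive's letter, yet all three are found together because each is formally related to the same person.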

It would seem that such a system might be able to better serve the 'engagement' and 'discovery' side of a user's needs, never mind providing a powerful research tool. But it's not as simple as my title makes out. Can a digital object truly 'exist' without metadata? It couldn't be found without it, so metadata remains the prime building block in this search for interoperability and information integration - everything else is built on top of that. A core metadata structure may not be the solution to these challenges on its own, but it needs to retain a strong presence. Add to that the reluctance of many in the cultural sector to commit to the time and effort required to master the ways of ontologies and it would seem that the jury is still out.
