Monday, December 20, 2010

"Culturomics"

Various research disciplines have been trending towards 'big data' in recent years. This has, of course, been most evident in the sciences, but I've enjoyed reading a recent publication by a team at Harvard including, amongst others, the psycholinguist Steven Pinker: someone has finally gone wild with Google Books. A summary of their findings can be found here.

The project has taken about 5 million of the digitised volumes and reduced their text to UTF-8 encoded 'n-grams'. A 1-gram is a string of characters uninterrupted by a space (usually a word), and an n-gram is a sequence of n 1-grams, up to 5 in this instance. This mechanism actually came about to avoid copyright concerns - no one can actually 'read' the books; they're dealing with data, not information per se - and the simplicity of the resulting database made me wonder about the level of metadata in this mix.
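To make the n-gram idea concrete, here is a minimal Python sketch. It assumes naive whitespace tokenisation, treating any run of characters between spaces as a 1-gram; the real corpus construction handled punctuation and OCR errors in ways not shown here, and the function names are illustrative only.

```python
from collections import Counter

def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Split text on whitespace into 1-grams, then slide a window of n over them."""
    tokens = text.split()  # a 1-gram here is any run of characters without a space
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(text: str, max_n: int = 5) -> Counter:
    """Count every n-gram from 1 up to max_n (5 in the culturomics corpus)."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(text, n))
    return counts

sample = "the quick brown fox jumps over the lazy dog"
counts = count_ngrams(sample)
print(counts[("the",)])          # 2
print(counts[("the", "quick")])  # 1
```

Note that nothing in the counts says when or where the text was published; that has to come from elsewhere.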

One of the key questions we've been tasked with evaluating at the end of our first semester of digital asset management is whether, broadly speaking, detailed metadata is really necessary, or if fully-searchable text will do - essentially whether Google fully takes care of our searching needs, making all other methods redundant. This 'culturomics' project raises two issues: the possible advent of a new phase in the digital humanities, and its value as a case study for a Google-style research model.

Staying with the metadata question for the time being, the apparent simplicity of their n-gram UTF-8 stream is necessarily backed up by metadata, and quite a lot of it if you look at all the metadata originally harvested by the Google Books project that ultimately forms the basis for this work. Fully-searchable text can never work by itself, because data requires context to become information, and that context can only be provided by metadata. In the case of Google Books, the n-gram data derived from it is meaningless without an accurate date and place of publication.
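The point that n-gram counts only become information once tied to publication dates can be sketched in a few lines. This is a toy illustration, not the Harvard team's pipeline: the records and their `year` field are hypothetical, and tokenisation is plain whitespace splitting. Relative frequency here means a word's count in a given year divided by all 1-grams from that year, a normalisation that only the year metadata makes possible.

```python
from collections import defaultdict

# Toy records pairing text with publication-year metadata (hypothetical data).
books = [
    {"year": 1900, "text": "the telegraph arrived"},
    {"year": 1900, "text": "letters and the telegraph office"},
    {"year": 1950, "text": "the telephone rang"},
]

def relative_frequency(books, word):
    """Share of all 1-grams in each year's books that equal `word`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for book in books:
        tokens = book["text"].split()
        totals[book["year"]] += len(tokens)       # grouping is impossible without the year
        hits[book["year"]] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}

print(relative_frequency(books, "telegraph"))  # {1900: 0.25, 1950: 0.0}
```

Strip out the `year` field and the same text yields a single undifferentiated count, which says nothing about change over time.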

The Harvard team selected the 5 million books with the highest quality metadata (and OCR), then carefully filtered them through various algorithms for enhanced accuracy (even then, they were left with a few percentage points of error). It would seem that the larger the data set, the more straightforward your metadata needs to be, so my current perspective is that the Google (re)search model, like culturomics, is complementary to the type of traditional focused research that requires detailed metadata. Sometimes you just need to drill down within a specific collection or domain with more expert metadata than Google can provide.

As for the question of a shift in research patterns within the humanities, Jon Orwant, director of digital humanities initiatives at Google, has described the culturomics method as something that can "complement" traditional research models, rather than being an end in itself. This is worth bearing in mind given that some have labelled the technique as crude - it seems like a useful tool, with a lot of potential, but not a replacement for traditional research models. Processing this sort of data in isolation has the potential for all kinds of problems, and whether this new work will spark a field of stand-alone "culturomics" is another question entirely.

Thursday, December 16, 2010

The future of digital memory

Here's a sobering bellwether for the state of our immersion in the digital realm, recently pointed out to me by a colleague who's been engaged with the issue for some time: what do we want to happen to our digital legacy after we die? Almost all of us have one, given the extent to which we rely, in one way or another, on digital and online content, so should we care about providing our nearest and dearest with what amounts to a 'digital will'? For some of the questions this raises, you can see Dave Thompson's fascinating FAQ on digital wills here. Undoubtedly, this generation represents a watershed for future memory and history. Given the pace of things, we may well be around long enough to experience its impact.

A recent article by the BBC alludes to the fact that many of us probably wouldn't want our digital lives to be shared out - we might well be more inclined to erase our data than to leave a digital will. Despite the very real consequences of an individual's online actions (think about legal action relating to Twitter feeds, amongst other things), the medium retains the suggestion of anonymity, informality and, consequently, greater personal freedom, but is it something that we would wish to represent our memory?

Of course, in the UK, society itself is online (e-government, e-science, e-learning &c.) so where does this leave the future of our collective memory and the history of our times? The question is so large that I'm going to keep it rhetorical, but I'm particularly interested in how this also feeds into issues surrounding digital preservation in the cultural and public sectors. Some of these have been raised within a recent Discover Magazine blog article discussing a publication examining the long-term storage of e-services in instances where there is a legal obligation to retain the information for a very long time (100 years, in the case of some e-government data). It presents the thesis that there is too much data, and most of it is not safe in the long-term - why not use analogue storage methods in conjunction with digital media and get the best of both worlds?

Perhaps the solution in this context is to save digital data onto lots of microfilm as the authors suggest, but it highlights the fact that the digital drive has never been coupled with sustainability. In memory institutions, like libraries and museums, the move towards digital is mostly extremely expensive, and represents a major investment, despite the risks. Digital content represents one of the best leverage tools available to cultural heritage institutions at present by prioritising user access, which equates to institutional relevance and compatibility with current information demands. Sustainability is really a bonus that everyone wants to achieve, planning as best they can, but long-term storage is indeed a mighty complex problem and a conversation that will endure.

Monday, December 13, 2010

Metadata vs. ontology

It's been rather a long hiatus in blogging terms, and an even longer one from the issues that I professed to form the core of this blog. So I'm returning again to DAM, and I wanted to come back in with a topic that's been cropping up again and again throughout my course: information integration and interoperability. This becomes an issue when you want to be able to search across domains (or even institutions within the same domain), since each has its own metadata structures and terminological systems.

Why is this important? Essentially, research patterns these days are increasingly digital, remote (i.e. web-based) and cross-disciplinary. Online content is predominantly public-facing and we find now that 'engagement' and 'discovery' are as important as traditional focused research goals when it comes to making information accessible. It's important that information can be found via different locations and pathways, so it needs to be linked. If anyone's used WorldCat, then they've experienced some of the potential of this simple idea.

So how to do it? Something like WorldCat is pretty straightforward - it's lots and lots of metadata, harvested and converted to WorldCat (MARC) format. Metadata for libraries has a long history (MARC goes back to the 1960s) and works well for describing the traditional book format, but it does have difficulty accommodating new media and other types of objects. The metadata solution to this problem has been Dublin Core, a 'lowest common denominator' set of terms that can theoretically describe anything in only 15 base text fields, prioritising interoperability above domain-specific detail.

The problems with this solution have been several. Fundamentally, trading more complex metadata descriptions for interoperability isn't always acceptable to specific communities. As such, efforts have been made to add fields to Dublin Core for greater detail, but this just returns us again to the problem of interoperability - it's another data set that doesn't work outside of your domain. A metadata system, which is based on terminology, also doesn't allow for interrelations between objects, beyond the applied terms they have in common.
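The interoperability trade-off can be sketched as a crosswalk: a mapping from a local schema onto the 15 Dublin Core elements. The element names below are those of the Dublin Core Metadata Element Set 1.1; the museum record, its field names and the crosswalk itself are hypothetical. Anything the crosswalk cannot place is dropped (or would have to be squeezed into a generic element such as `description`), which is exactly the detail specialist communities are reluctant to lose.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical crosswalk from a museum's local schema to Dublin Core.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "production_date": "date",
    "accession_number": "identifier",
    "material": "format",
}

def to_dublin_core(record):
    """Map a local record to Dublin Core; return the DC record and the unmapped fields."""
    dc, dropped = {}, []
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element in DC_ELEMENTS:
            dc.setdefault(element, []).append(value)  # DC elements are repeatable
        else:
            dropped.append(field)  # domain-specific detail with no agreed DC home
    return dc, dropped

local = {
    "object_name": "Astrolabe",
    "maker": "Unknown",
    "production_date": "c. 1600",
    "accession_number": "1999.42",
    "condition_grade": "Fair",  # hypothetical specialist field
}
dc, dropped = to_dublin_core(local)
print(dropped)  # ['condition_grade']
```

Adding a local `condition_grade` element to the target schema would keep the detail, but only systems that know about the extension could use it.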

Enter, then, the ontology. Briefly, for those who might not know what an ontology is, or perhaps think it sounds vaguely Kantian (as I did until fairly recently), it can probably most easily be described as a formal logic (artificial intelligence, if you like) for a computer system, which can only 'know' what you tell it to know and how to know it.

Unlike metadata, which uses simplified terminological structures written by humans for human consumption, an ontology provides a formal system for data integration that can be far more complex. At its best, an ontology can recover the context and concepts behind the simplifications of terminological systems, focusing instead on an object itself and how it relates to other objects. Its further advantage is that, by representing objects through formal relationships, it frees them from the constraints of domain-specific metadata, allowing a top-level ontology to search across multiple domains.
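A rough sketch of the relational approach: facts are stored as subject-predicate-object triples, loosely in the style of RDF, and a small class hierarchy lets a single query span a library record and a museum record at once. Plain Python sets stand in for a real ontology language and triple store here, and every identifier is hypothetical.

```python
# Facts as (subject, predicate, object) triples, loosely in the style of RDF.
# Every identifier below is hypothetical.
triples = {
    ("library:book42", "is_a", "Book"),
    ("museum:obj7", "is_a", "Painting"),
    ("Book", "subclass_of", "ManMadeObject"),
    ("Painting", "subclass_of", "ManMadeObject"),
    ("library:book42", "created_by", "person:smith"),
    ("museum:obj7", "created_by", "person:smith"),
    ("museum:obj7", "depicts", "place:thames"),
}

def subjects(predicate, obj):
    """Every subject s for which (s, predicate, obj) is a stated fact."""
    return {s for s, p, o in triples if p == predicate and o == obj}

def instances_of(cls):
    """Instances of cls, including instances of its direct or indirect subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        for sub in subjects("subclass_of", frontier.pop()):
            if sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return {s for c in classes for s in subjects("is_a", c)}

# One question across two domains: every man-made object created by the same person.
works = instances_of("ManMadeObject") & subjects("created_by", "person:smith")
print(sorted(works))  # ['library:book42', 'museum:obj7']
```

Neither record needs to share a metadata schema with the other; the link comes from the stated relationships and the shared top-level class.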

It would seem that such a system might be able to better serve the 'engagement' and 'discovery' side of a user's needs, never mind providing a powerful research tool. But it's not as simple as my title makes out. Can a digital object truly 'exist' without metadata? It couldn't be found without it, so metadata remains the prime building block in this search for interoperability and information integration - everything else is built on top of that. A core metadata structure may not be the solution to these challenges on its own, but it needs to retain a strong presence. Add to that the reluctance of many in the cultural sector to commit to the time and effort required to master the ways of ontologies, and it would seem that the jury is still out.

Sunday, November 28, 2010

DIY Broadband

The website for the Broadband Cumbria campaign was launched over the weekend. As I described a few weeks ago, this project essentially represents the pilot for investment in universal broadband for the UK, and leads Europe as an initiative for connecting rural communities. The most striking thing about it is, of course, that communities in the Eden Valley, Cumbria, will be connecting their broadband themselves, and that will involve some digging. As such, the project has massive resonance during this austere time in the UK.

Systems in every sector of UK society have been set up with the expectation of good broadband access, so it's important to be sure that no one is left behind. If there are entire communities who cannot get online, then this seems to me as important a problem as taking care of digital information; access, in one way or another, justifies its existence in the first place.

Please consider getting involved with the campaign - wherever you are, whatever you do, you are probably using the Internet. Imagine not having it.

Tuesday, November 23, 2010

The fate of Iraq's seized documents

Many will be familiar with the story of the millions of records removed from the Iraq National Library and Archive (INLA) by the US military after the invasion in 2003. Contrary to an article published in the Brunei Times earlier this year, the INLA's archives are still retained by the US Department of Defense, the Central Intelligence Agency and the Hoover Institution at Stanford University. A special Iraqi delegation to the US in April formally requested their return, an event that was covered at the time by Reuters.

I mentioned previously that digitisation cannot be relied upon as a preservation tool, but in some instances it can seem to hold great potential indeed, often revolving around issues of access. When I first heard about the problem of the INLA's seized records, it seemed that a straightforward solution would be to digitise the collections, thus allowing for digital copies to either be retained in the US and the originals returned or, at the very least, provide digital copies to the INLA. This was rather naive.

I've recently been in touch with Jeff Spurr of the Sabre Foundation and chair of the Middle East Librarians Association (MELA) Committee on Iraq Libraries, who updated me on the situation and kindly shared his report on the Committee's San Diego meeting of November 17. As it happens, the Iraqi records were digitised some time ago, but there is as yet no timetable for their return to the INLA. It's unclear why this is the case - the US Department of State has been supportive of Iraqi cultural initiatives in several other regards - but the conversation for their return has at least begun. Dr. Saad Eskander, Director of the INLA, has detailed the problem in an excellent article here.

The post-Saddam INLA under Dr. Eskander has seen the largest reader numbers since its foundation in 1961. His vision is one of open scholarship: the MELA report details extensive expansion in the development of digital collections at the INLA and the desire to make the US-held records fully available from within Iraq is clearly seen as essential for the process of reconciliation there. As detailed in Dr. Eskander's article above, the records are currently only being made available to select US higher education institutions for research under the aegis of the Pentagon's Minerva project.

It would seem that, whatever the potential for open research (or even reconciliation) that would come from the return of Iraq's records, it will be politics that decides the outcome. Similarly, while digital media has great potential to facilitate communication, collaboration and, in this case, Dr. Eskander's vision for the INLA, it will ultimately be people and politics determining the fruition of that technological potential.

Friday, November 19, 2010

Data for transparent government

According to Francis Maude, the UK minister for the Cabinet Office writing in The Guardian today, the current UK Government "will be the most transparent and accountable government in the world". This reasoning follows from the publication of the 'Whitehall accounts' for the first time ever, which detail Government expenditure since the coalition came to power. This data is presented online at Data.gov.uk.

I'm not particularly interested in the finer points of the actual spending, but rather the nature of the move towards transparency itself, how this has been presented and the possible outcomes. It's clear from Francis Maude's claim that this is a work in progress, and an intriguing experiment it certainly is. By the Government's own admission, this is not complete data, but an unprecedented starting point. From a DAM perspective, the Data.gov.uk website hosting the data itself feels a bit like a work in progress - the information is probably there, but the website doesn't do the user any favours. Yet this raises a key point: to be transparent, the Government needs to present raw data, rather than digested information - it must strike a balance between accessibility and avoiding leading the user towards any particular conclusion. There may be virtue in its spare design.

In any case, people are downloading and interpreting the data, and a number of independent developers have already released their own analytical tools. This is where the real nature of the experiment lies: will the Government be held to account and real savings found, or will it be put in a straitjacket, possibly resulting in further spending just to analyse all the claims made against it and avoid more in the future? Everyone will have different views on spending priorities, and their own way of interpreting the data; the findings may well make for good press. One hopes that this won't result in the kind of knee-jerk reactions that have hit US politics, for which new media must be partly to blame, since they seem able to create a drone loud enough to stifle constructive debate.

Thursday, November 18, 2010

Daniyal Noorani's Find Heaven Project


A friend of mine, Daniyal Noorani from Lahore, Pakistan, has put together a very interesting project: seizing on some of the more transcendent media of music and animation, Daniyal is using these as tools in an effort to bridge the growing gap in relations between Pakistan and the US, his current home, reflecting the wider issues that continue to place the West and the Muslim world at odds.

It started with a short film, Find Heaven, which made the Official Selection at the 2010 London Independent Film Festival and screened at Harvard University as part of the Muslim Film Festival earlier this year. This has grown into what are effectively four related projects of music and animation, including a studio album and what Daniyal describes as Pakistan's first anime series, relying on expertise in Pakistan itself. He's launched it as a Kickstarter campaign.


The Find Heaven video


If I had $10,000 in my pocket, I know where I would put it - I'd like to see where this goes. But whether or not this particular project succeeds in this instance, I like Daniyal's work for several reasons: it has a strong appeal for young people, which is what is going to make a difference; it's also multi-disciplinary and evidence suggests that it works well at both ends, in Pakistan and with a Western audience. With this kind of subject matter, one can be tempted to cry from frustration, but this stuff laughs resoundingly, yet retains poignancy.

Finally, in cold hard political terms, Pakistan is important. I'm not a foreign policy expert, but here is one explanation for why Pakistan matters, from a member of the Senate Committee on Foreign Relations in response to Obama's troop surge in Afghanistan at the start of 2010. The message has been repeated often. Right now, anything that can offer some informed popularisation of the myriad issues connected to these intercultural problems, like the Find Heaven Project, is hugely welcome.

Tuesday, November 16, 2010

JPEG2000: the solution to mass digitisation?

Simon Tanner lays it out

A number of libraries have begun major digitisation projects using JPEG2000 images. Why is this of interest? It's a question that Simon Tanner, from King's College, tackled at the opening of today's JPEG2000 seminar at the Wellcome Library. His short answer was, bluntly, that cultural heritage institutions are seeing the JPEG2000 format as increasingly attractive for mass digitisation because they can afford it; essentially, it's the only reason the format is considered.

This was borne out in subsequent presentations - that you can do more with less money - but there are other reasons that JPEG2000 can seem enticing. Basically, the cost savings are based on smaller file sizes: the Wellcome Library reported an 89% reduction in file size after compression, compared with TIFF, before any loss of 'visual' resolution for their needs. A smaller file size also means faster processing. High-definition, zoomable images of large maps or entire newspapers, visually indistinguishable from uncompressed TIFFs, are displayed in real time and integrate rapidly with other image software; two notable projects were the National Library of Norway's digitisation programme powered by Geneza, and the Old Maps Online and Georeferencer projects. Each library that has chosen to commit to JPEG2000 is therefore principally concerned with access: new online collections where more material is delivered faster.
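To see why the file-size saving dominates the budget, here is the arithmetic as a small Python function. The 89% reduction is the Wellcome Library's reported figure for their needs; the collection size (one million images) and the 50 MB average TIFF are hypothetical values chosen purely for illustration.

```python
def storage_tb(images: int, tiff_mb: float, reduction_pct: float = 0) -> float:
    """Storage in decimal terabytes for `images` files averaging `tiff_mb` MB,
    after shrinking every file by `reduction_pct` per cent."""
    total_mb = images * tiff_mb * (100 - reduction_pct) / 100
    return total_mb / 1_000_000  # 1 TB = 1,000,000 MB in decimal units

# Hypothetical collection: one million page images averaging 50 MB as TIFF.
as_tiff = storage_tb(1_000_000, 50)
as_jp2 = storage_tb(1_000_000, 50, reduction_pct=89)
print(as_tiff, as_jp2)  # 50.0 5.5
```

Because storage scales linearly, an 89% reduction leaves 11% of the TIFF footprint whatever the size of the collection.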

Another notable thread in the discussion was the relative ambiguity of the format's preservation credentials. Richard Clark, the UK head of delegation to JPEG, queried why the digital preservation community hadn't been more involved in feeding back to the developers (compared to the commercial sector, notably the film and security industries). I suspect that communication will be on the rise, as it was noted today that future migrations of JPEG2000 images may result in the loss of ICC colour profiles. This, coupled with concerns about failing image resolution down the road, makes for some pretty fundamental preservation problems.

Implementing JPEG2000: Bedrich Vychodil of the National Library of the Czech Republic, Christy Henshaw of the Wellcome Library and Sean Martin of the British Library

As observed by Johan van der Knijff of the Koninklijke Bibliotheek, just because JPEG2000 has been taken up by major institutions doesn't mean that the format is tried and tested in the long-term context of digitisation. If anyone still thinks that digitisation is going to save the world's collections, this is some of the best evidence I've seen so far that this is a fantasy - a piece of paper, properly stored, will always outlast a digital file. So why bother? There is a balance between long-term preservation and access here, and while you make every effort to keep the stuff as long as possible (sparing you, at least, the need to re-scan), these are programmes driven by access. While a secondary outcome may be the relief of pressure on your physical collections, digitisation projects largely reflect the new order of information retrieval.

Saturday, November 13, 2010

"Does poetry need paper?"

This is a quote from author Don DeLillo in this month's Prospect Magazine, responding to the rise of electronic publishing. The comment actually referred to language more generally, but poetry itself is a nice place to start in discussing some of the issues raised in the article.

I'm not sure that poetry has ever held the position of being a money-maker. Oxford University Press, for example, axed its poetry list back in 1999, openly admitting that the decision was made on financial grounds. If poetry deserved to be published anywhere, one would think that a university press would be the place to do it. Nevertheless, university departments are still required to make money, particularly these days, and poetry tends not to - there are just a few publishing houses able to maintain a significant output of poetry, usually subsidised in one way or another from sales in other departments or by a funding body such as Arts Council England. An article in The Observer a few years ago placed poetry sales nationwide at 890,220, compared to fiction at 45,772,541.

With poetry already under threat within the traditional publishing model, the Internet and e-publishing may be exactly what is needed, given their proven ability to market titles peripheral to the mainstream in a variety of media (a phenomenon popularised by Chris Anderson in 2004 as the 'Long Tail'). As an example from my own experience, I've only ever read the poems of Federico García Lorca in electronic form, via websites. In retrospect there were two advantages to this (albeit highly subjective ones). First, since each poem was isolated on the screen rather than bound within a collection of Lorca's other works, I would spend more time with a single poem than I usually would with bound volumes of poetry. Second, because the original poetry is in Spanish, getting hold of the original and the translation (or multiple translations, for that matter) is much easier than hunting down a printed edition with parallel text. Having said that, I would probably favour the experience of a book, particularly if it were presented to the poet's original specifications, but I learnt a great deal about Lorca without ever holding one.

My MP, Chuka Umunna, described society on BBC Radio 4's Any Questions? this weekend as increasingly 'bespoke' - how do politicians communicate with an electorate that may be seeing the world increasingly in black and white as their wants are more and more frequently tailored to suit them? This is probably a direct consequence of the new media - the speed of the information we receive, and the choice. Unfortunately, it could be that this choice itself can only serve to narrow our views, as we seek out the information most agreeable to us. As the Prospect article points out, the new novel, for example, may simply be customised by the individual reader, a more interactive experience that some would argue might give you more of what you want, but less of what you need.

Friday, November 12, 2010

Why have digital asset management?

I have a philosophy that you should always ask yourself why you're doing something when you do it (and preferably before), and keep on asking that question. For that reason, I wanted to backtrack and apply that to my new discipline, and to this blog. It will probably form a core stream of my writing here, tackling the 'what', 'why' and 'how' of DAM. A number of groups have described the key components in the operation of a DAM system, but I want to start by looking at the reasons for its being in the first place.

Briefly: the 'what'. It is probably the 'asset' within DAM that is most ambiguous, yet defines the discipline itself. My overriding sense of the term is that here assets are leverage. They are kept as a means to some conceptual end (as digital data at the logical level is never an end in itself) - looking at it in its broadest sense, that end could simply be that, in one way or another, someone is willing to invest their time in these assets, which is the key commodity in a digital economy. Any digital file needs to be worthy of an investment of time (that is, also, attention), otherwise it is not an asset.

In one example, digital records held by a business for legal reasons may never be seen, but have the potential for leverage in the eyes of the law. In another example, a university library's forward-thinking digital mission may capture a wider audience's attention and justify its very existence to university administration, the leverage here serving to attract funding to ensure its survival as a relevant institution. It could also be said that a business might potentially survive or collapse on the basis of its legal records.

All of this goes some way towards answering the 'why' of DAM, but this can clearly depend to a large extent on the sector within which one is operating a DAM system, as in the two examples above. At this stage I would like to look for some aspects of DAM that serve as the lowest common denominator in justifying its purpose. I believe that these are the access, efficiency and preservation of digital assets. These form the roots that then branch out into various manifestations of detail in different sectors.
  1. Access: DAM allows for the dissemination of digital information to those who require it. There is no asset without access, and this involves the understanding and application of metadata and ontologies, which form a bridge between access and efficiency.

  2. Efficiency: As well as optimising access, metadata and ontologies enhance the suitability and reliability of information. Access needs to be rapid and fit for use to make digital information viable. This efficiency is the essence of the 'e' in e-learning, e-science &c., which might as well stand for 'enhanced' as for 'electronic' - properly managed, digital information can undoubtedly enhance knowledge transfer.

  3. Preservation: Data needs to remain compatible and often interoperable between systems for effective use and optimised access. In most cases, it also needs to meet these criteria of use and access for extended periods of time. DAM systems can deliver this and avoid the need for costly digital archaeology, or total data loss, in the future.
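The preservation point can be made concrete with one routine safeguard, the fixity check: record a checksum when an asset is ingested, then recompute and compare it periodically to detect silent corruption. A minimal sketch, assuming SHA-256 and in-memory bytes standing in for stored files; the asset record structure is hypothetical.

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest of an asset's bytes, recorded at ingest."""
    return hashlib.sha256(data).hexdigest()

def fixity_ok(data: bytes, recorded: str) -> bool:
    """Recompute the digest and compare it with the value stored in the asset's metadata."""
    return checksum(data) == recorded

# Hypothetical asset record: an identifier plus the checksum captured at ingest.
original = b"scan of page 1"
record = {"id": "asset-001", "sha256": checksum(original)}

print(fixity_ok(original, record["sha256"]))           # True
print(fixity_ok(b"scan of page 1!", record["sha256"]))  # False
```

A failed check does not repair anything by itself, but it flags the problem while a good copy may still exist elsewhere, before it becomes a job for digital archaeology.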

Tuesday, November 9, 2010

The public domain

I'm probably the last person to blog on this, but what the hell, it's an interesting story. This is the case of copyright infringement by Cooks Source Magazine against Monica Gaudio, which I read about on the Techno Llama blog yesterday and find interesting for a couple of reasons.

First, it's provided me, and probably many others, with some clarification of Internet copyright. This case raised the important question of what the public domain actually is. Stuart Karle at the Columbia School of Journalism put it very simply in a Techland news article: just like any published book, original material is within copyright throughout the lifetime of the author plus 70 years - you don't give up copyright just because you put something up on the Internet. As we've seen in many different contexts, you can believe that the digital world is as open as you like, but copyright will usually be there somewhere (this is also, incidentally, the exact same copyright specification that libraries face in digitising their collections).
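The 'life plus 70 years' rule reduces to simple date arithmetic. In both the UK and the US (for works created from 1978 onwards), the term runs to the end of the calendar year, so a work enters the public domain on 1 January of the 71st year after the author's death. This is a simplified sketch that deliberately ignores joint authorship, anonymous and corporate works, and other exceptions; the function names are illustrative.

```python
from datetime import date

def public_domain_from(author_death_year: int, term_years: int = 70) -> date:
    """First day a work is out of copyright under a simple 'life plus N years' rule,
    where the term runs to the end of the calendar year."""
    return date(author_death_year + term_years + 1, 1, 1)

def in_copyright(author_death_year: int, on: date) -> bool:
    """True if the work is still protected on the given date."""
    return on < public_domain_from(author_death_year)

print(public_domain_from(1940))                # 2011-01-01
print(in_copyright(1940, date(2010, 11, 9)))  # True
```

Nothing in the calculation depends on where the work was published, online or in print, which is the point Karle makes.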

Second, it's another good example of idea transfer via the Internet. Taking the Techno Llama blog title - Why sue when you can use social media? - it would seem that on balance the resulting Internet storm will be more damaging (at least psychologically) than a quiet settlement. It's difficult to quantify, but Cooks Source Magazine would probably have preferred straightforward litigation at this point. In any case, it seems that Gaudio's outcome was unintentional - it sounded as though she was looking for advice, albeit publicly, and the injustice was confirmed by mass consensus. Just as many people spend a lot of effort in trying to make their online content 'viral', it can also happen by accident. The plus side here is that probably everyone has a much better understanding of Internet copyright law than ever before, which might reduce the chances of further events of this nature, and make future arguments over copyright more clear-cut.

Sunday, October 31, 2010

UK Government digital policy

Iain Lobban, the head of GCHQ, recently joined a spate of intelligence leaders publicly discussing hot potatoes. In his case it was the cyber threat to the UK, which subsequently made its presence felt in the Strategic Defence and Security Review (SDSR) as concerns over cyber terror and threats to digital infrastructure build.

I preface with this because it’s an indicator of the extent to which we really are in a digital economy, and while digital hasn’t quite torn down the ‘old’ economic models in the way that commentators during the 90s predicted, it’s raised a lot of new possibilities and clearly opened up new vulnerabilities. Security issues aside, I'm interested in feeling out what domestic policy currently is on digital information and infrastructure.

The last Government introduced the Digital Britain programme in 2009 through the Department for Business, Innovation and Skills. This promised a three year plan to boost digital participation, universal access to broadband by 2012 and a fund to invest in enhancing its capacity, amongst other initiatives. The scheme's website can now be found safely archived on the National Archives' servers, which says a lot about what has become of the programme.

Whether for political or economic reasons, or both, the whole thing seems to have been 're-scoped' and will probably be rebranded (the delay to universal broadband is the last Government's fault &c.; we may yet forget it was ever their initiative). The only concrete thing to come out of the programme so far is the Digital Economy Act, shoring up business interests by reinforcing copyright law. Even before the election, many Digital Britain measures had been abandoned due to Tory opposition and the distraction of the election itself.

Meanwhile, without legislative support, it seems that the digital divide in the UK will continue to increase, which essentially equates to an urban-rural split. It's notable that in the constituency of Penrith and The Border, one of the most rural constituencies in the UK, MP Rory Stewart has made broadband a central component of his agenda, at one time presenting the possibility of communities there connecting the final miles of cable themselves. With funding now secured for a pilot project, which could lead the way towards the 'universal' 2Mbps connection (currently scheduled for 2015), one more Digital Britain measure may yet be achieved. With most in Government and business agreeing that high-speed broadband is the most critical element required to enhance the digital economy, this pilot is one to watch.

Friday, October 22, 2010

Digital development: INASP

Yesterday I had the opportunity to meet Peter Burnett from INASP (International Network for the Availability of Scientific Publications), an Oxford-based development organisation supported by multilateral funders and INASP partner countries.

As is fondly said of many of the best British institutions, they seem able to punch above their weight. INASP's name itself certainly provides only a limited indication of the real scope of their work. Their basic mandate - providing access to electronic journals in the developing world - is not as simple as it sounds and requires five main areas of activity, very briefly:
  1. ICT training, including bandwidth management and optimisation, and associated online IT skills.

  2. Information delivery, negotiations with external publishers and the development of library consortia in-country.

  3. Library development, particularly surrounding digital collections, with long-term preservation and access.

  4. Open access, providing a platform for developing countries to publish their journals online and AuthorAID to provide support to individual researchers.

  5. Publishing support, facilitating websites that can host electronic publications created in developing countries.
As far as I'm aware, INASP is the only group looking to establish a digital knowledge infrastructure in developing or emerging countries in this kind of holistic manner. The follow-through on these initiatives is also quite something: the whole digital foundation is taken one step further by trying to connect national policy makers in partner countries with the research information delivered via the means described above, promoting evidence-informed policy making.

Monday, October 18, 2010

Ontology in digital media

Frankly, a blog on digital issues doesn't lend itself particularly well to imagery, though I look forward to being corrected on this. Since I'm dipping into the philosophical here, I've taken the opportunity to preface this piece with a view from outside our seminar room, which I snapped last week.

View from outside the Anatomy Theatre, King's College London
Ploughing through readings on DAM, one word seems to come up rather frequently, clearly packing some weight: ontology. It's come up so often that, rather than dismissing these philosophical trappings as superfluous to practical implementation, it seems worth taking the bull by the horns and exploring the potential of an ontological enquiry in this context. I did, after all, begin this blog as a means of self-education.

The term 'digital' is probably taken for granted by most people, and registers largely at the conceptual level. It's a fluid medium that is hard to pin down: is it merely our conceptual perception of information presented on a screen, for example, or the process by which hardware and software interact to interpret digital data - or even the physical medium to which any of these can ultimately be traced?

These notions were put forward by Kenneth Thibodeau at the US National Archives and Records Administration some years ago and I find them very useful, not least because they highlight some of the major differences between digital and traditional preservation concerns. Essentially, it seems that in most cases the conceptual delivery of digital materials is of prime importance (that is, the point at which digital data takes on meaning for humans), to the point that the physical source, and even the logical processes that deliver digital data in conceptual form, can be altered to suit that end. Compare that with the object-orientated world of traditional preservation, which seeks to alter physical objects as little as possible: applied to digital media, a traditional approach would probably favour the preservation of obsolete hardware instead.

I might also add that, by nature, DAM is a more active process, because leaving a digital object in a box (whatever form that object may take) for 50 years is going to ensure its loss and destruction, rather than its preservation, as would be the case with paper or other physical objects. But I don't believe that DAM can be truly effective without ontology, since its effective delivery hinges on a proper understanding of the different facets of a digital object and how they relate to one another to achieve preservation, function, accessibility and so on. Whether or not you are even aware that you are applying an ontological investigation to DAM processes, it's always there. As such, it does seem like a practical tool, even if the etymology of the word is Greek.

Wednesday, October 13, 2010

A National Digital Library for the US

In the current issue of the New York Review of Books there's an article by Robert Darnton, Director of Harvard University Library, from a recent talk that opened a conference on the possibility of creating a National Digital Library for the US. His ideas on the library in the 'new age' are fleshed out rather more extensively in an earlier article from 2008, featured in the same publication.

In his writings Darnton gives a laudable defence of the traditional book and library institution. This perhaps appeals more to a minority (even a small minority) in the current research climate, but it's an important argument that often gets drowned out. While I agree entirely with the ideas of access and traditional preservation that he describes, it's a little concerning to see what has been left out of this discussion surrounding a National Digital Library.

To begin, it isn't just the 'modern' and 'postmodern' student who performs most of their research digitally - all the signs show that, within the sciences in particular, we are being inundated with born-digital material. We will find that even the most bookish scholar - should we decide to value his output sufficiently to archive it - will at least have left behind an email correspondence. Indeed, the first hits for 'born-digital data' via Google find an explanation of why the Crafts Study Centre at the Surrey Institute of Art & Design chose born-digital storage for the 'reusability of the resource', and an article in the New York Times praising Emory University's acquisition of Salman Rushdie's digital files. It should go without saying that, meanwhile, the scientific community have long since entered the age of the petabyte.

While many books have indeed lasted many hundreds of years, they, like digital data, also get lost and destroyed - any advantage they have displayed in longevity doesn't seem to compensate for their limitations in time and space as research tools. With a focus on the printed book, dismissing born-digital as an 'endangered species', we are throwing out the majority of modern scholarship. It therefore seems that this approach will create exactly what Darnton claims to want to avoid: the library as museum. It's a museum of past research at the expense of the future, dictating the centrality of the traditional library when in fact the modern researcher expects resources to come to them, and not vice-versa.

Just digitising books is really only a part of the digital puzzle when it comes to libraries and, for the reasons mentioned above, doesn't reflect the current and future trends in scholarship. Nor is it a progressive response to the question of a National Digital Library: the first digital library started in 1971 with Project Gutenberg; the first ISBN issued to an e-book was in 1998; Google Books was launched in 2004. The push for digitisation presented here sticks to a rigid hierarchy built around the supremacy of the book and simply doesn't accommodate born-digital (or even archival) content. Copying every book in existence is not going to address the most pressing concerns for a National Digital Library and will never further the scope of scholarship.

In a recent survey of 275 US institutions (with a 70% return), the OCLC identified that special collections in the US were primarily concerned with issues of space, followed by born-digital content, and then digitisation. Only 50% of institutions had assigned responsibility for born-digital collections. Ignoring born-digital collections and focusing on books does not take care of the problem, and while we'll probably have our Folger First Folios to consult for years to come, much of modern research will be left uncollected and unpreserved, and the real potential for new avenues in digital scholarship lost. It may well be that the scale of the problem does necessitate the creation of a new, exclusively digital institution, but the realities of digital scholarship are far more dynamic than they're given credit for here.

Saturday, October 9, 2010

Collecting UK online publications

It's not difficult to understand why copyright concerns aren't always tackled in scoping documents concerning digital futures (as mentioned in the previous post) - the topic is big enough that it probably merits a report entirely to itself. One angle on the copyright issue is the question of legal deposit for UK online materials. The UK Department for Culture, Media and Sport recently ran a consultation on the Collection and Preservation of UK Online Publications, to which there were a number of responses from various institutions.

The general recommendation from the Legal Deposit Advisory Panel (LDAP) to the Secretary of State for Culture (currently Jeremy Hunt) was regulation-based harvesting and archiving: the libraries have a legal entitlement to UK domain sites, though by the nature of the material in question, the libraries will need to collect (or harvest) these materials directly themselves. Incidentally, 'agents' are mentioned as harvesting material on behalf of libraries, which continues the theme of third-party involvement in matters concerning digital management.

One of the more interesting sections of the report is that covering policies for deposit, access and use (p. 31). Despite defining online content as "available free of charge and without access restrictions" throughout the report, the LDAP recommends that "access must be confined to readers (and staff) using terminals, screens or devices that are controlled by the Libraries, and whilst they are on the Libraries’ premises". This takes the Legal Deposit Libraries Act 2003 (which aimed to encompass digital publications but not websites - the 2003 Act calls them 'non-print publications') and applies it in a literal fashion to all online content, resulting in an apparent contradiction.

Having said that, while I'm not sure how threatened libraries really are by a transition to digital, this level of restriction could empower them as gatekeepers to the most complete collection of archived web content available - after all, the websites would not have continued to exist without their intervention, the live web is not the same as a depository and there would be multiple access points to this content throughout the UK. However, the idea of taking something that was once "available free of charge and without access restrictions" and making access restrictive is probably too much of a leap.

So far, web archiving in the UK has been permissions-based, rather than regulation-based. While that permissions-based precedent is far more restrictive at the point of collection, it can easily allow free access to all. It will be interesting to see whether re-writing the legislation to accommodate this (if indeed it is re-written; I believe there is going to be a second round of consultation, which is a positive sign) requires a compromise between ease of harvesting (ideally, regulation-based) and ease of access (free for everyone, anywhere in the UK).

Wednesday, October 6, 2010

A fear of the digital?

We've spent the last week examining various documents concerning the future of digital collections and their associated technologies in cultural institutions over the next couple of decades. Specifically, two of the UK's national libraries have recently attempted to tackle the subject in separate scoping documents: the British Library's 2020 Vision and the National Library of Scotland's report on the library in 2030.

I've noted recently that, based on my brief experience so far, many cultural institutions don't seem fully prepared for digital (whether culturally, strategically or technologically), but the idea that institutions might actually be afraid of embracing digital is a slightly different angle that emerged from some recent discussion surrounding these documents.

There are certainly a few reasons to be fearful of launching major projects concerning digital collections and infrastructure. The one that springs most readily to mind is the existing perception of the impending obsolescence of analogue media. "Throw it in the Charles [River]" was one scientist's recent response to the collection storage problem at Harvard College Libraries. All of us access and use different information in different ways, so as a blanket solution the notion would be a bit absurd, but the idea of obsolescence is a pretty powerful one, not least because those who hold it often appear to have more influence on outcomes than the institutions in question.

Libraries, in particular, are stuck with the problem of at once appearing to be 'vital' by embracing digital, yet not letting go of core cultural elements within their institutions that in many instances stretch back hundreds of years (like storing books). While the scientific community requires an increase in the quantity of born-digital material to continue pursuing cutting-edge experiments, the cultural sector doesn't need digital in the same way, but rather appears to see it as complementary to their original mission of preserving analogue cultural collections (mostly, these are digitised items, so a direct link remains to the analogue). Digital can certainly promote learning and access, but in some cases it may be a necessary evil driven by economic and political factors.

Perhaps fear is also reflected in a reluctance to handle the really big questions associated with digital collections within these documents. It's slightly frightening for some to think that we can't save everything - indeed, that we can't save most things - so who tackles the problem of what to save? Then, with the Digital Economy Act passed in the UK in April, copyright will continue to be a significant hurdle, but this isn't usually explored in much depth, if it's broached at all. Finally, who will manage these projects, and with what technology? Certainly, it's a big unknown, but perhaps it's better to shoot first and ask questions later, particularly when you're under attack.

Sunday, October 3, 2010

Research on the Silk Road

I've been fortunate to have had involvement in some pretty varied projects lately, so I thought I would post a link to a bit of work I put together with a colleague at the Harvard Art Museums. The project (ongoing) involves an examination of paintings on textile recovered from the Mogao Caves near Dunhuang, Western China. The poster here focuses on such paintings in US collections.

I've been drawn to Dunhuang, and the Silk Road in general, ever since seeing the fantastic exhibition The Silk Road: Trade, Travel, War and Faith at the British Library in 2004. The Mogao Caves are particularly important, as they yielded a sealed cache of documents, paintings and other artefacts untouched for a period of around nine hundred years. The materials from the caves have been of wide interest to many disciplines, revealing economic, social and religious aspects of Silk Road cultures, Asiatic languages and, in the case of the paintings on wall and textile, artistic development.

Indeed, so extensive are the materials recovered from the site that they have spawned an international digitisation project, aimed at connecting researchers around the world with these unique artefacts: the International Dunhuang Project. Almost as interesting as the artefacts themselves are the stories of the (mostly) European explorers who raced into the Chinese deserts to procure them - one of them is thought to have inspired in part the character of Indiana Jones.

Wednesday, September 29, 2010

Data about data...

Our first real class conversation was nicely framed by a comment that Simon Tanner had made the day before during his introduction to the course: in digital systems it's not so much about what you do, but why you do it - digital systems are receptive to logic, but we don't do things because they're logical, we do things because we're human.

Welcome, then, to the metadata 'universe'. For a visual representation to prove that 'universe' is really no exaggeration, Jenn Riley from the Indiana University Digital Library Program has created this wonder here. The information professionals commenting on metadata issues online are probably the first I've known to plump for statements that amount to 'people are terrible at communication', but perhaps they have good reason.

Since metadata is describing data, we have the human ingredient. It becomes apparent early on that a discussion of metadata - something of practical use - has the potential to descend all too easily into a discussion of semantics, which, while interesting from a philosophical perspective, elevates the existing challenges of getting to grips with metadata towards something Sisyphean.

This concern, however, is balanced with the genuine value of an intellectual (i.e. human) hierarchy applied to a collection with digital metadata by someone familiar with the collection and, hopefully, its users. Perhaps inevitably, Google also entered the conversation - doesn't Google actually take care of all our searching needs anyway?

This question does help to highlight the fact that the intellectual work of various information communities on the myriad metadata out there isn't necessarily so confounding after all, from a collection-by-collection perspective. Perhaps Google is more intuitive, perhaps it tackles more data - it's a great tool with a lot of potential, but it also has limits on what it can do.

UC Berkeley's Geoffrey Nunberg provides a rather good deconstruction of some of the recent problems with Google's metadata (in the case of Google Books, specifically, where one would think metadata might be a priority) here. A quick check of the problems that he describes shows that a number still remain. It's at this point that you might feel that the complexity and compartmentalisation of the metadata universe aren't such a bad thing after all.

Saturday, September 25, 2010

Digit Ass Man

I've now been registered and inducted onto the new graduate course in digital asset management at King's College, also referred to as MADAM, or even, wonderfully, DIGIT ASS MAN, as it appears on my ID card. One hopes that the quality of the course and its outcomes will be inversely proportional to the rather unfortunate manner in which its title may be abbreviated.

Having met the faculty responsible for the programme, I'm very confident that this will prove to be the case. I believe that this is the first formalised training on this subject anywhere in the world, and the process is necessarily going to be organic - I haven't experienced this form of openness amongst faculty, or freedom for self-direction as a student, in higher education before. I also calculate that we have a 1:1 ratio between faculty and students, which is going to make this a rich experience.

The course website describing its content can be found here, and it kicks off on Tuesday with the core component introducing DAM. I also expect to open with an optional module on metadata theory and practice. While currently there is a heavy emphasis on the digital, the course intends to touch on traditional archives theory, which seems important for reasons I've touched on previously (i.e. to be prepared to help institutions expand to digital). Finally, the King's environment is an exciting one, with the Centre for Computing in the Humanities churning out a large number of interesting and varied projects, and I'm intrigued to see that the Department of War Studies, amongst other policy departments, is just around the corner...

Sunday, September 19, 2010

The (perceived) need for digital preservation

I'm interested in the implications for digital infrastructure in developing countries, but what about right here in the UK? Facing two years of training in DAM, an obvious question has been: how prepared are the public, commercial and cultural sectors in the UK for managing and preserving digital material? If I'm honest, I'm curious to know what my job prospects are, and what sort of job I might be doing. I was therefore interested to find a document on the Planets website from last year that aimed to answer, in part, that very question: assessing whether organisations are prepared for digital preservation.

In brief, the findings identify that awareness of, and action on, digital collection concerns in various institutions seem to have increased in a manner corresponding to the growing prevalence of digital information in the world in general. That seems positive, but most of the survey respondents in this article are cultural institutions with a known digital profile, and many of them are national institutions in EU countries. Even then, the article shows that museums, for example, made up only 3% of respondents - we don't know the percentages for the various types of institution targeted or rate of return, so it's difficult to know whether museums deem digital collection management unimportant and thus have not responded to the questionnaire, or simply that very few museums were targeted by the survey. In either case, the 3% suggests that museums have a relatively low digital profile at the moment.

The question I'm looking to answer here is a difficult one, and was not the target of the article, but a few things can be inferred. The institutions surveyed were supposed to have an 'interest' in digital preservation, but it's unclear whether that interest has been declared by the institution itself or inferred by the authors of the article. Of these institutions, the article shows that around 25% of those who probably ought to be practising digital preservation in one form or another are currently making no attempt to do so. Institutions that lack the resources for, or even awareness of, digital preservation may need assistance from a third party - almost half of the respondents already use one, so this seems to be an important growing market.

Before I signed up for the DAM course at King's, I discussed the training with HSBC Global Archives and the European Central Bank to see how applicable DAM would be to their operations (the commercial sector being outside of my professional experience so far). Their response was that, while they were looking to explore DAM, they would need a manager who had an understanding of traditional archives and library theory. Perhaps the most encouraging thing for the new graduate is the widespread interest in digital preservation in all quarters, but it doesn't seem that everyone is ready yet - while jobs in pure DAM certainly exist, many will need expertise that embraces both analogue and digital preservation to bring them up to date.

Sunday, September 12, 2010

Preservation and (business) management

I recently had the opportunity to meet and talk with Barclay Ogden, Head of the Preservation Department at the University of California Berkeley Library. His activities in the field of preservation have included the development of the CALIPR collection needs assessment tool, a free piece of web-based preservation survey software hosted on UC Berkeley's servers (its availability and ease of use would make it suitable for implementation in almost any collection around the world). Barclay has been in the preservation business for forty years and, following the emergence of random sampling to quantify preservation problems in the 1970s, and of needs assessment based on collection use and value principles in the 1990s, he has turned his attention to risk management.

His approach to risk management is a novel one in the field, as he has managed to collaborate with the second largest risk management consultancy in the US after approaching the University of California Office of Risk Management with his preservation concerns. This has resulted in positive attention from the UC administration, and that translates to funding for the UC Berkeley Library. In this economic climate, it's a great working model, but anyone who's worked in a library or cultural institution will probably have encountered some cultural resistance to business models (and anyone who's studied business will know that culture eats strategy for lunch). Barclay has circumvented cultural opposition and the problems of emotive, experiential modes of risk management that can often be encountered in the cultural heritage community, going straight to the experts in rational, analytical risk management - arguably the group that can really influence outcomes at the UC Berkeley Library.

I can understand the resistance to business models. One of the problems with accreditation/certification in conservation, for example, and standards in general, is that cultural heritage professionals could potentially lose their individual operational flexibility in a sector where outcomes are often subjective and difficult to quantify. Under the current circumstances, though, business management models represent an area that requires serious and considered engagement in order to achieve a mutual understanding; I don't believe that learning this language has to result in losing one's culture. Business models, language and communication, and the adoption of standards, have the potential to give the cultural heritage sector a real leg to stand on.

Thursday, September 9, 2010

The digital insurgency

I've mentioned a few of the benefits of the web and the digital world in this blog, but here's a flip-side. A pastor in Florida has decided to burn copies of the Quran on September 11. By now most people reading the news will probably have heard about it, and it has prompted direct discussions between Hamid Karzai, the Afghan president, and General David Petraeus, the US and NATO commander in Afghanistan. Their concerns are covered in an Al Jazeera article here: they are worried that the Taliban will use images of the burning to incite opposition to the NATO mission in Afghanistan and fuel the insurgency there.

It's obvious how all of this is possible. The use of digital media and the web has allowed a small minority to promote their message, which will potentially have grave consequences far beyond Florida. The chain running from the initial publicity surrounding the event, through the constitutional uproar in the US, to the prospect of the Taliban printing off the pictures for the Afghan public, raises the question of whether 'digital' counter-insurgency might be a viable project. The potential for real cultural awareness and development via digital means seems to be there, but this necessarily raises some big questions regarding access (the 'digital divide') and education. For their part, extremist groups the world over have long been well established on the web.

Monday, September 6, 2010

Dynamics of the repository

'Going digital' doesn't seem to be a progressive movement in cultural institutions anymore, but rather a necessity. When even very focused collections, such as the Folger Shakespeare Library, are now holding digital collections, the question isn't whether an institution should go digital or not, but how they should do it.

Digital assets, whether born-digital or created through the digitisation process, require management, and the logical place to start would seem to be a repository system, perhaps Tessella's SDB (Safety Deposit Box), developed with the UK National Archives. You have, after all, just doubled your collection by creating digital versions of analogue, hard-copy materials; by acquiring any digital material at all, you have at the very least added a significant additional conceptual dimension to an analogue archive.

Given the complexities surrounding digital collections, it's interesting to find that there is as yet no certification regarding their actual implementation and management. In 2002, a joint report by the Online Computer Library Center and Research Libraries Group concluded that there was a need "to develop a framework and process to support the certification of digital depositories". While a conceptual framework governing digital archives was soon standardised (ISO 14721:2003, the OAIS reference model), and repository systems like SDB now comply with that standard, there is essentially no agreed best practice for the management of digital collections.

This situation persists perhaps because of the complexities of the practical implementation of digital asset management protocols. A good example is the question of agreed formats (succinctly described by Dave Thompson in his article here), and tackling this problem forms the foundation for a viable digital acquisition policy. The fact is that appraisal of formats prior to acquisition is very much up to the individual institution: most libraries and digital asset managers prioritise intellectual content, while galleries and conservators would concern themselves with preserving hardware. A system like SDB cannot begin to answer such complex questions, which can border on the philosophical: is there loss of artistic integrity in migrating 16mm film to a digital medium? Who cares about the format as long as the information is there? What's the significance of converting Office documents to PDF? It would seem that, even at the input stage, there are limitations to what a repository can do.

Wednesday, September 1, 2010

Web archiving at the Wellcome Library

This morning, Dave Thompson, Digital Curator at the Wellcome Library, gave a talk on web archiving, following on from a presentation he made at the Future of Medical History Conference at UCL in July. Dave introduced web archiving at Wellcome in 2004, and the number of users engaging with digital collections there has been rising ever since. Below is a brief overview of his observations on the topic.

While use of the web is increasing, it is perhaps the most transient digital medium: it remains largely unregulated and facilitates the rapid publishing of material by anyone, yet similarly permits the instant alteration and loss of that material. The UK Web Archive was started by JISC and the British Library, with the Wellcome Library, in order to preserve websites that 'reflect the diversity of lives, interests and activities throughout the UK'. One of several such preservation efforts taking place around the world, the site allows access to retired sites (for example, those set up for major projects or world events, such as the Asian Tsunami) or old versions of existing sites.

So why bother? Taken within the context of the 'future of medical history' at the Wellcome, collections growth without digital assets would be limited. As an example, it may be noted that scientists are increasingly beginning to elucidate their work via open digital means such as Twitter, Flickr and so on; certainly within the domain of the arts and foreign policy, it is not difficult to find, say, film directors and diplomats, respectively, lending their expertise to current issues online. In particular, the Wellcome Library has focused on preserving online material from smaller organisations who simply do not have the resources to preserve their own websites, and one such project gave rise to the Personal Experiences of Illness collection in 2005. The existence of this immediate, personal and unmediated material is undoubtedly one of the great strengths of the web.

The legislation in the UK regarding web archiving has been well summarised here, and it remains a grey area. While bureaucracy may arbitrarily preserve some material, institutional publishing cycles and mediation by other stakeholders can cause content to be lost just as quickly. There are also, of course, costs involved, both in preservation and simply in designing and maintaining a website. At the Wellcome Library, web content is preserved using WARC (Web ARChive), a format that stores harvested data in large aggregate files in which individual data objects can be identified and extracted without a companion index file; this means that if one data component becomes obsolete, it can be isolated and upgraded for compatibility. Given the legal, ethical and technical complexity of this topic, I'll doubtless be returning to it in the near future.
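To make the 'no companion index' point concrete, here is a minimal Python sketch of walking an aggregate WARC file record by record. It assumes an uncompressed file and builds simplified records carrying only a few headers (a real WARC/1.0 record also requires fields such as WARC-Record-ID and WARC-Date); the helpers `make_record` and `iter_warc_records` are hypothetical, not part of any archiving tool. Because every record declares its own Content-Length, a reader can step from one record to the next without consulting a separate index.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def make_record(record_type: str, uri: str, payload: bytes) -> bytes:
    """Build a simplified WARC/1.0 record (illustrative headers only)."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {record_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # Each record ends with two CRLFs after its content block.
    return head + payload + b"\r\n\r\n"


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, content block) pairs from an uncompressed WARC stream."""
    while True:
        first = stream.readline()
        if not first:
            return  # end of the aggregate file
        if not first.strip():
            continue  # tolerate stray blank lines between records
        if not first.startswith(b"WARC/"):
            raise ValueError(f"unexpected record start: {first!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line.strip() == b"":
                break  # a blank line (or EOF) ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The declared length says exactly how many bytes to read,
        # so no external index is needed to find the next record.
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the trailing CRLF CRLF
        yield headers, block


archive = io.BytesIO(
    make_record("response", "http://example.org/", b"<html>old page</html>")
    + make_record("metadata", "http://example.org/", b"crawled-by: sketch")
)
for headers, block in iter_warc_records(archive):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

Real archives often compress each record individually and carry many more header fields, which dedicated WARC libraries handle; the sketch only shows why self-describing, sequential records let individual objects be recovered on their own.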

Friday, August 27, 2010

International scope

I mentioned 'market demands' for digital asset management (I'm going to bite the bullet and start calling it DAM from now on), as well as my interest in international cultural heritage and development, so it seemed worthwhile to get rolling with an examination of these related topics.

At a 2002 conference in Washington, DC, The State of Digital Preservation: An International Perspective, 'international' seemed to mean, beyond the United States, little more than the EU and Australia. This probably reflected which institutions were employing best practice in digital preservation at the time, but digital preservation has since expanded rapidly; digitisation projects seem to be taking place almost everywhere, from the Tibetan Digital Library Project, which is creating copies of endangered Tibetan manuscripts in India, Nepal and Bhutan, to the Afghanistan Digital Libraries (see below) and the World Digital Library. At the same time, internet usage has become freer and more widespread than ever, presenting the digital preservation community with one of its greatest challenges: preserving the world's web content.

The demand for digital assets in the developed world seems clear enough, and the digital influence on education is especially noteworthy, with even US higher education institutions seen to be falling behind in digital provision for students (though I can't pass up the opportunity to highlight an example from the opposite end of the spectrum: the Boston prep school that removed all of its books in favour of digital learning). But what use is DAM elsewhere, such as in countries with a developing digital infrastructure?

As a starting point, a lot can be done right here to enhance cultural relations, and one particularly exciting project I saw recently is the Islamic Heritage Project, managed by Stephen Chapman of Harvard College Libraries with the support of Prince Alwaleed Bin Talal of Saudi Arabia. It provides free access to Islamic manuscripts, maps and other material from Harvard's museums and libraries, and presented its own challenges as a project (creating metadata for Arabic, a right-to-left script, for example). But in a developing country, how much digital infrastructure is required for DAM to be useful?
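The metadata challenge posed by Arabic, a right-to-left ('reverse') script, can be illustrated with a small Python sketch that guesses the base direction of a metadata field from its first strongly directional character, using the standard library's `unicodedata.bidirectional`. This is a simplification of the Unicode Bidirectional Algorithm and is not how the Islamic Heritage Project necessarily handled it; the `base_direction` helper and the sample record are hypothetical.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters: "R" (e.g. Hebrew)
# and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strong character in text."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no strong characters (e.g. only digits): fall back to LTR


# Hypothetical catalogue record mixing Arabic and a Latin transliteration.
record = {
    "title": "كتاب الحيوان",
    "title_transliterated": "Kitāb al-Ḥayawān",
    "shelfmark": "1234",
}
for field, value in record.items():
    print(field, base_direction(value))
```

Storing a direction flag alongside each field would let a display layer (an HTML `dir` attribute, for instance) render mixed-script records correctly.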

Some regions get more attention than others, certainly, but one example of establishing digital infrastructure is the Afghanistan Digital Libraries, a collaborative project between USAID, the University of Arizona and Afghan higher education institutions to digitise and provide access to what they term 'unique Afghan records'. Although I previously drew some distinctions between preserving hard-copy and digital preservation, projects such as this (and the Tibetan Digital Library Project) are perhaps the best chance overall of preserving rare and vulnerable hard-copy.

It's encouraging that key standards for digital preservation have already been embraced internationally - OAIS has been an ISO standard since 2003 (ISO 14721:2003 OAIS) - and certainly within Europe, groups such as DigitalPreservationEurope exist to promote 'collaboration and synergies' amongst the various digital preservation efforts occurring throughout the EU. There's a great deal of very open dialogue going on, which makes sense, since we all share the same problems, and this bodes well for international digital preservation projects.


Tuesday, August 24, 2010

Some definitions, and distinctions

It has very quickly become apparent that I need to take a first stab at defining the terminology already peppering this blog. What, for example, is 'preservation'? Used across different disciplines, this word alone can cause instant confusion.

One example of this from my own experience is the digitisation process. This is a moment where hard-copy meets digital and two theories of preservation collide - one is 'traditional' preservation, the other is digital preservation. While the preservation of born-digital items seems inherently understood from its context (digital items need digital preservation), the problem comes when you apply the word (preservation) to collections of digitised materials themselves - is hard-copy 'preserved' by being digitised?

While the digital image data files of these digitised collections have entered the domain of preservation in the digital sense and become 'digital assets', the original hard-copy has not in fact been preserved as an object in its own right. Take the extreme example of digitised documents that are discarded to save space, only for the digital image data files later to become corrupt, or for the system required to interpret them to become obsolete. All evidence of the documents is then lost - they were never preserved. It seems important to make this distinction.

Certainly, stories of the ephemeral nature of digital assets are legion, and businesses are still forced to retain hard-copy. The problem in the previous example is as old as sound recording technology, and is well summarised in a Wired article on Digital Archaeology from 1993, when CD-ROM was becoming firmly established. Nevertheless, my impression at this early stage is that the digital asset manager is primarily concerned with born-digital material since, once a digital data file is created from hard-copy, it is a new asset (in addition to the original hard-copy) that enters the domain of digital preservation, just like any born-digital asset.
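The corrupted-file scenario above is one that digital preservation practice commonly guards against with fixity checks: record a checksum when a file enters the collection, then recompute and compare it periodically. The Python sketch below assumes a simple dictionary manifest mapping file names to SHA-256 digests; `sha256_of`, `verify_fixity` and the file name are hypothetical. A fixity check only detects damage (it cannot repair it, and it says nothing about format obsolescence), which is why it is usually paired with keeping multiple copies.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return names of files that are missing or whose checksum has changed."""
    failures = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(name)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scan = root / "page_001.tif"  # hypothetical digitised page
    scan.write_bytes(b"stand-in for image data")
    manifest = {"page_001.tif": sha256_of(scan)}  # recorded at ingest
    print(verify_fixity(manifest, root))  # []
    scan.write_bytes(b"stand-in for imaXe data")  # simulate corruption
    print(verify_fixity(manifest, root))  # ['page_001.tif']
```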

Once a digital data file is created, the term 'preservation' can then be comfortably applied within this understood digital context. As for what that might then mean, I've just been introduced to OAIS (Open Archival Information System), designed by the Consultative Committee for Space Data Systems following the data migration problems that befell NASA after the Apollo Program (also mentioned in the Wired article). OAIS identifies two divisions of digital preservation: the preservation of digital data as it is migrated across media and across formats; and the preservation of access services to digital data as technology changes and software is ported (adapted) to new systems, including through emulation. A pithy explanation of OAIS itself can be found here.
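The first of those divisions, migration across formats, can be sketched in Python as re-encoding a plain-text object from Latin-1 to UTF-8 while recording a provenance event with before-and-after checksums. This is roughly what OAIS calls a transformation, but OAIS specifies concepts rather than code: the `DigitalObject` and `MigrationEvent` classes and the `migrate_text` function are hypothetical illustrations.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class MigrationEvent:
    source_encoding: str
    target_encoding: str
    source_sha256: str
    target_sha256: str
    timestamp: str


@dataclass(frozen=True)
class DigitalObject:
    content: bytes
    encoding: str
    history: Tuple[MigrationEvent, ...] = ()


def migrate_text(obj: DigitalObject, target_encoding: str) -> DigitalObject:
    """Return a re-encoded copy of obj; the original is left untouched."""
    text = obj.content.decode(obj.encoding)
    try:
        # Strict encoding fails rather than silently dropping characters.
        new_content = text.encode(target_encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"cannot migrate to {target_encoding} without loss") from exc
    event = MigrationEvent(
        source_encoding=obj.encoding,
        target_encoding=target_encoding,
        source_sha256=hashlib.sha256(obj.content).hexdigest(),
        target_sha256=hashlib.sha256(new_content).hexdigest(),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return DigitalObject(new_content, target_encoding, obj.history + (event,))


original = DigitalObject("Café catalogue".encode("latin-1"), "latin-1")
migrated = migrate_text(original, "utf-8")
print(migrated.content)  # b'Caf\xc3\xa9 catalogue'
print(migrated.history[0].source_encoding, "->", migrated.history[0].target_encoding)
```

Returning a new object with an appended event history, rather than overwriting the original, keeps each migration auditable, which is the point of preserving data as it moves between formats.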

Saturday, August 21, 2010

Going Digital

I've just accepted a place on the UK's first course in digital asset management at King's College London. Until earlier this summer, I had been working in paper conservation (that is, repairing and preserving paper-based artefacts for museums, libraries and archives). Why go digital?

To begin, this is not so much a departure from traditional preservation as a complementary set of skills that also embrace the worlds of information technology and management. At the same time, while a foundation in preservation theory via conservation is most welcome, going digital represents a great leap, without doubt, and there are several reasons why I've chosen to do this.

I wasn't really aware of digital assets and their management in a cultural heritage context until I attended the Mellon Symposium in Conservation Science at Harvard Art Museums earlier this year, entitled Technical Conservation Issues of Time-Based Media. The keynote speaker, Pip Laurenson, Head of Time-based Media Conservation at Tate, put forward two simple facts: first, Tate (and other large modern art galleries) now acquire more time-based media than traditional media - that is, over 50% of their total acquisitions each year; second, there is currently no formal training to deal with the preservation demands of time-based media (for a definition of time-based media, and further discussion, see here). I've also been drawn to the ubiquitous nature of digital assets and the wide potential for the application of the skills associated with their management and preservation, and my next prompt came from a very different source.

For some time I've been interested in preservation concerns beyond the museum, gallery or library institution, specifically in relation to cultural heritage in developing countries. Such concerns took me to the Kennedy School of Government at Harvard, where I had the pleasure of meeting Rory Stewart, then Director of the Carr Center for Human Rights Policy and founder of the Turquoise Mountain Project, and his colleague Gerard Russell who, following diplomatic service in Iraq and Afghanistan over the last seven years, has dedicated himself to promoting cultural understanding and interaction between the English-speaking public and the Arab/Islamic world. These conversations, together with discussions amongst conservators strongly engaged in the international cultural heritage and development arena, led me to realise how limited the application of traditional conservation is in that arena, yet also clearly highlighted its potential as a foundation to build upon in tackling wider preservation concerns.

These are just a couple of the reasons I have for venturing into digital preservation, to which I might briefly add, on a less philosophical note, that digital asset management appears to meet a genuine current market demand. I now hope to use this opportunity to document a brand new formalised training process at King's that attempts to meet the growing demand for this skill set. Indeed, the first step is knowing exactly what that skill set should be, and the possibilities seem numerous. At the same time, I also want to explore the relationships between traditional and digital preservation, information technology and management practices, all the while keeping an eye on their potential implications for international cultural heritage and development concerns.