Wednesday, September 29, 2010

Data about data...

Our first real class conversation was nicely framed by a comment that Simon Tanner had made the day before during his introduction to the course: in digital systems it's not so much about what you do, but why you do it - digital systems are receptive to logic, but we don't do things because they're logical, we do things because we're human.

Welcome, then, to the metadata 'universe'. For a visual representation to prove that 'universe' is really no exaggeration, Jenn Riley from the Indiana University Digital Library Program has created this wonder here. The information professionals commenting on metadata issues online are probably the first I've known to plump for statements that amount to 'people are terrible at communication', but perhaps they have good reason.

Since metadata describes data, we have the human ingredient. It becomes apparent early on that a discussion of metadata - something of practical use - has the potential to descend all too easily into a discussion of semantics, which, while interesting from a philosophical perspective, elevates the existing challenges of getting to grips with metadata towards something Sisyphean.
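
To make the problem concrete, here is a toy sketch of two institutions describing the same photograph. Every field name, value and mapping below is invented for illustration - none of it comes from a real schema - but the shape of the problem is the one information professionals complain about:

```python
# Two hypothetical records for the same photograph, using different
# vocabularies. All names and values here are invented for illustration.
record_a = {"creator": "Smith, J.", "date": "1932", "medium": "gelatin silver print"}
record_b = {"author": "J. Smith", "created": "c. 1932", "format": "photograph"}

# A naive 'crosswalk' mapping one institution's field names onto the other's.
crosswalk = {"author": "creator", "created": "date", "format": "medium"}
normalised_b = {crosswalk.get(key, key): value for key, value in record_b.items()}

# Even after the field names line up, the *values* still disagree -
# which is exactly where the semantic arguments begin.
for field in record_a:
    print(field, "|", record_a[field], "|", normalised_b[field])
```

The crosswalk reconciles the vocabularies easily enough; deciding whether "1932" and "c. 1932" are the same assertion is the part that keeps humans in the loop.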

This concern, however, is balanced with the genuine value of an intellectual (i.e. human) hierarchy applied to a collection with digital metadata by someone familiar with the collection and, hopefully, its users. Perhaps inevitably, Google also entered the conversation - doesn't Google actually take care of all our searching needs anyway?

This question does help to highlight the fact that the intellectual work of various information communities on the myriad metadata out there isn't necessarily so confounding after all, from a collection-by-collection perspective. Perhaps Google is more intuitive, perhaps it tackles more data - it's a great tool with a lot of potential, but it also has limits on what it can do.

UC Berkeley's Geoffrey Nunberg provides a rather good deconstruction of some of the recent problems with Google's metadata (in the case of Google Books, specifically, where one would think metadata might be a priority) here. A quick check of the problems that he describes shows that a number still remain. It's at this point that you might feel that the complexity and compartmentalisation of the metadata universe aren't such a bad thing after all.

Saturday, September 25, 2010

Digit Ass Man

I've now been registered and inducted onto the new graduate course in digital asset management at King's College, also referred to as MADAM, or even, wonderfully, DIGIT ASS MAN, as it appears on my ID card. One hopes that the quality of the course and its outcomes will be inversely proportional to the rather unfortunate manner in which its title may be abbreviated.

Having met the faculty responsible for the programme, I'm very confident that this will prove to be the case. I believe that this is the first formalised training on this subject anywhere in the world, and the process is necessarily going to be organic - I haven't experienced this form of openness amongst faculty, or freedom for self-direction as a student, in higher education before. I also calculate that we have a 1:1 ratio between faculty and students, which is going to make this a rich experience.

The course website describing its content can be found here, and it kicks off on Tuesday with the core component introducing DAM. I also expect to take an optional module on metadata theory and practice. While currently there is a heavy emphasis on the digital, the course intends to touch on traditional archives theory, which seems important for reasons I've touched on previously (i.e. to be prepared to help institutions expand to digital). Finally, the King's environment is an exciting one, with the Centre for Computing in the Humanities churning out a large number of interesting and varied projects, and I'm intrigued to see that the Department of War Studies, amongst other policy departments, is just around the corner...

Sunday, September 19, 2010

The (perceived) need for digital preservation

I'm interested in the implications for digital infrastructure in developing countries, but what about right here in the UK? Facing two years of training in DAM, an obvious question has been: how prepared are the public, commercial and cultural sectors for managing and preserving digital material in the UK? If I'm honest, I'm curious to know what my job prospects are, and what sort of job I might be doing. I was therefore interested to find a document on the Planets website from last year that aimed to answer, in part, that very question: assessing whether organisations are prepared for digital preservation.

In brief, the findings identify that awareness of, and action on, digital collection concerns in various institutions seem to have increased in a manner corresponding to the growing prevalence of digital information in the world in general. That seems positive, but most of the survey respondents in this article are cultural institutions with a known digital profile, and many of them are national institutions in EU countries. Even then, the article shows that museums, for example, made up only 3% of respondents - we don't know the breakdown of institution types targeted, or their rates of return, so it's difficult to know whether museums deem digital collection management unimportant and thus have not responded to the questionnaire, or simply that very few museums were targeted by the survey. In either case, the 3% suggests that museums have a relatively low digital profile at the moment.

The question I'm looking to answer here is a difficult one, and was not the target of the article, but a few things can be inferred. The institutions surveyed were supposed to have an 'interest' in digital preservation, but it's unclear whether that interest was declared by the institutions themselves or inferred by the authors of the article. Of these institutions - ones that probably ought to be practising digital preservation in one form or another - around 25% are currently making no attempt to do so. Institutions that lack the resources for, or even awareness of, digital preservation may need assistance from a third party - almost half of the respondents already use one, so this seems to be an important growing market.

Before I signed up for the DAM course at King's, I discussed the training with HSBC Global Archives and the European Central Bank to see how applicable DAM would be to their operations (the commercial sector being outside of my professional experience so far). Their response was that, while they were looking to explore DAM, they would need a manager who had an understanding of traditional archives and library theory. Perhaps the most encouraging thing for the new graduate is the widespread interest in digital preservation in all quarters, but it doesn't seem that everyone is ready yet - while jobs in pure DAM certainly exist, many institutions will need expertise that embraces both analogue and digital preservation to bring them up to date.

Sunday, September 12, 2010

Preservation and (business) management

I recently had the opportunity to meet and talk with Barclay Ogden, Head of the Preservation Department at the University of California Berkeley Library. His activities in the field of preservation have included the development of the CALIPR collection needs assessment tool, a free piece of web-based preservation survey software hosted on UC Berkeley's servers (its availability and ease of use would make it suitable for implementation in almost any collection around the world). Barclay has been in the preservation business for forty years and, following the birth of random sampling in the 1970s to quantify preservation problems and of needs assessment in the 1990s based on collection use and value principles, he has turned his attention to risk management.

His approach to risk management is a novel one in the field, as he has managed to collaborate with the second largest risk management consultancy in the US after approaching the University of California Office of Risk Management with his preservation concerns. This has resulted in positive attention from the UC administration, and that translates to funding for the UC Berkeley Library. In this economic climate, it's a great working model, but anyone who's worked in a library or cultural institution will probably have encountered some cultural resistance to business models (and anyone who's studied business will know that culture eats strategy for lunch). Barclay has circumvented cultural opposition and the problems of emotive, experiential modes of risk management that can often be encountered in the cultural heritage community, going straight to the experts in rational, analytical risk management - arguably the group that can really influence outcomes at the UC Berkeley Library.

I can understand the resistance to business models. One of the problems with accreditation/certification in conservation, for example, and standards in general, is that cultural heritage professionals could potentially lose their individual operational flexibility in a sector where outcomes are often subjective and difficult to quantify. Under the current circumstances, though, business management models represent an area that requires serious and considered engagement in order to achieve a mutual understanding; I don't believe that learning this language has to result in losing one's culture. Business models, language and communication, and the adoption of standards, have the potential to give the cultural heritage sector a real leg to stand on.

Thursday, September 9, 2010

The digital insurgency

I've mentioned a few of the benefits of the web and the digital world in this blog, but here's a flip-side. A pastor in Florida has decided to burn copies of the Quran on September 11 - at this point, most people reading the news will have probably heard about it, and it's prompted direct discussions between Hamid Karzai, the Afghan president, and General David Petraeus, the US and NATO commander in Afghanistan. Their concerns are covered in an Al Jazeera article here: they are worried that the Taliban will use images of the burning to incite opposition to the NATO mission in Afghanistan and fuel the insurgency there.

It's obvious how all of this is possible. The use of digital media and the web has allowed a small minority to promote their message, which will potentially have grave consequences far beyond Florida. The chain of events - from the initial publicity surrounding the burning, to the constitutional uproar in the US, to the Taliban printing off the pictures for the Afghan public - raises the question of whether 'digital' counter-insurgency might be a viable project. The potential for real cultural awareness and development via digital means seems to be there, but this necessarily raises some big questions regarding access (the 'digital divide') and education. For their part, extremist groups the world over have long been well established on the web.

Monday, September 6, 2010

Dynamics of the repository

'Going digital' doesn't seem to be a progressive movement in cultural institutions anymore, but rather a necessity. When even very focused collections, such as the Folger Shakespeare Library, are now holding digital collections, the question isn't whether an institution should go digital or not, but how they should do it.

Digital assets, whether born-digital or created through the digitisation process, require management, and the logical place to start would seem to be a repository system, perhaps Tessella's SDB (Safety Deposit Box) developed with the UK National Archives. You have, after all, just doubled your collection by creating digital versions of analogue, hard-copy materials; by acquiring any digital material at all, you have at the very least added a significant additional conceptual dimension to an analogue archive.

Given the complexities surrounding digital collections, it's interesting to find that there is as yet no certification regarding their actual implementation and management. In 2002, a joint report by the Online Computer Library Center and Research Libraries Group concluded that there was a need "to develop a framework and process to support the certification of digital depositories". While certification soon arrived for a conceptual framework governing digital archives (ISO 14721:2003 OAIS), and repository systems like SDB now comply with that standard, there is essentially no agreed best practice for the management of digital collections.

This situation persists perhaps due to the complexities of the practical implementation of digital asset management protocols. A good example is the question of agreed formats (succinctly described by Dave Thompson in his article here), and tackling this problem forms the foundation for a viable digital acquisition policy. The fact is that appraisal of formats prior to acquisition is very much up to the individual institution: most libraries and digital asset managers prioritise intellectual content, while galleries and conservators would concern themselves with preserving hardware. A system like SDB cannot begin to answer such complex questions, which can border on the philosophical: is there loss of artistic integrity in migrating 16mm film to a digital medium? Who cares about the format as long as the information is there? What's the significance of converting Office documents to PDF? It would seem that, even at the input stage, there are limitations to what a repository can do.
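
Whatever an institution decides about formats, the mechanical first step of appraisal is simply working out what a file actually is. Here is a minimal sketch of signature-based identification, the kind of check that a tool like The National Archives' DROID performs far more rigorously against its PRONOM registry; the signature list below is a tiny illustrative subset, not a real registry:

```python
# A minimal sketch of identifying formats by their leading 'magic' bytes.
# These few signatures are well known, but the list is illustrative only -
# a real appraisal tool consults a maintained signature registry.
MAGIC = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP container (includes modern Office documents)",
    b"\xd0\xcf\x11\xe0": "OLE2 container (legacy Office documents)",
}

def identify(data: bytes) -> str:
    """Match the start of a file's bytes against known format signatures."""
    for signature, name in MAGIC.items():
        if data.startswith(signature):
            return name
    return "unknown format"

print(identify(b"%PDF-1.4 ..."))   # a PDF announces itself in its first bytes
print(identify(b"\x00\x00"))       # anything unrecognised needs human appraisal
```

Note what the sketch cannot tell you: a ZIP signature might be a .docx, a zipped dataset or something else entirely, which is exactly why the appraisal decision remains an institutional one rather than a purely technical one.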

Wednesday, September 1, 2010

Web archiving at the Wellcome Library

This morning, Dave Thompson, Digital Curator at the Wellcome Library, gave a talk on web archiving, following on from a presentation he made at the Future of Medical History Conference at UCL in July. Dave introduced web archiving at Wellcome in 2004, and the number of users engaging with digital collections has been rising ever since. Below is a brief overview of his observations on the topic.

While use of the web is increasing, it is perhaps the most transient digital medium - it remains largely unregulated and facilitates the rapid publishing of material by anyone, yet similarly permits the instant alteration and loss of that material. The UK Web Archive was started by JISC and the British Library, with the Wellcome Library, in order to preserve websites that 'reflect the diversity of lives, interests and activities throughout the UK'. One of several such preservation efforts taking place around the world, the site allows access to retired sites (for example, set up for major projects or world events, such as the Asian tsunami) or old versions of existing sites.

So why bother? Taken within the context of the 'future of medical history' at the Wellcome, collections growth without digital assets would be limited. As an example, it may be noted that scientists are increasingly beginning to elucidate their work via open digital means such as Twitter, Flickr and so on; certainly within the domain of the arts and foreign policy, it is not difficult to find, say, film directors and diplomats, respectively, lending their expertise to current issues online. In particular, the Wellcome Library has focused on preserving online material from smaller organisations who simply do not have the resources to preserve their own websites, and one such project gave rise to the Personal Experiences of Illness collection in 2005. The existence of this immediate, personal and unmediated material is undoubtedly one of the great strengths of the web.

The legislation in the UK regarding web archiving has been well summarised here, and it remains a grey area. While bureaucracy may arbitrarily preserve some material, publishing cycles within institutions and other stake-holder mediation can cause content to be lost just as quickly. There are also, of course, costs involved in preservation, and even just in designing and maintaining a website. At the Wellcome Library, web content is preserved using WARC (Web ARChive), which stores collected data in large aggregate files, where aggregated data objects can be identified and extracted without the use of a companion index file; this means that if one data component becomes obsolete, it can then be isolated and upgraded for compatibility. Due to the legal, ethical and technical complexity of this topic, I'll doubtless be returning to it in the near future.
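
The self-describing nature of WARC records is easy to see in the format itself: each record carries its own headers and a Content-Length, so a reader can walk the file record by record with no external index. Here is a hand-rolled sketch of reading one record, assuming a well-formed WARC/1.0 record; the record content is invented, and a real archive would use a proper WARC library rather than parsing like this:

```python
# A minimal, invented WARC/1.0 record: a version line, named header
# fields, a blank line, then Content-Length bytes of payload.
raw = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, world!"
)

def read_record(data: bytes):
    """Split one WARC record into version, headers and payload."""
    header_block, _, rest = data.partition(b"\r\n\r\n")
    lines = header_block.decode("utf-8").split("\r\n")
    version = lines[0]                                        # "WARC/1.0"
    headers = dict(line.split(": ", 1) for line in lines[1:])
    length = int(headers["Content-Length"])                   # self-describing
    return version, headers, rest[:length]

version, headers, payload = read_record(raw)
print(version, headers["WARC-Target-URI"], payload)
```

Because every record declares its own type and length, an individual component can be located, pulled out and migrated on its own - which is the property the Wellcome approach relies on.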