This morning, Dave Thompson, Digital Curator at the Wellcome Library, gave a talk on web archiving, following on from a presentation he made at the Future of Medical History Conference at UCL in July. Dave introduced web archiving at the Wellcome in 2004, and the number of users engaging with its digital collections has been rising ever since. Below is a brief overview of his observations on the topic.
While use of the web is increasing, it is perhaps the most transient digital medium: it remains largely unregulated and lets anyone publish material rapidly, yet equally permits that material to be altered or lost in an instant. The UK Web Archive was started by JISC and the British Library, with the Wellcome Library, in order to preserve websites that 'reflect the diversity of lives, interests and activities throughout the UK'. One of several such preservation efforts taking place around the world, the archive gives access to retired sites (for example, those set up for major projects or world events, such as the Asian Tsunami) and to old versions of existing sites.
So why bother? In the context of the 'future of medical history' at the Wellcome, collections growth would be limited without digital assets. Scientists, for example, increasingly explain their work through open digital channels such as Twitter and Flickr; in the arts and in foreign policy, likewise, it is not difficult to find film directors and diplomats lending their expertise to current issues online. In particular, the Wellcome Library has focused on preserving online material from smaller organisations that simply do not have the resources to preserve their own websites, and one such project gave rise to the Personal Experiences of Illness collection in 2005. The existence of this immediate, personal and unmediated material is undoubtedly one of the great strengths of the web.
The legislation in the UK regarding web archiving has been well summarised here, and it remains a grey area. While bureaucracy may arbitrarily preserve some material, publishing cycles within institutions and mediation by other stakeholders can cause content to be lost just as quickly. There are also, of course, costs involved in preservation, just as there are in designing and maintaining a website. At the Wellcome Library, web content is preserved using the WARC (Web ARChive) format, which stores collected data in large aggregate files. The data objects aggregated in each file can be identified and extracted without a companion index file, which means that if one data component becomes obsolete, it can be isolated and upgraded for compatibility. Due to the legal, ethical and technical complexity of this topic, I'll doubtless be returning to it in the near future.
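For readers curious about why WARC needs no companion index, here is a minimal Python sketch. It rests on one property of the format: every record opens with a `WARC/` version line and a block of named headers, including `Content-Length`, so a reader can walk the file record by record and pick out whatever it needs. The helper names `build_record` and `iter_records` are hypothetical, the sample records are invented for illustration and omit mandatory fields such as `WARC-Record-ID` and `WARC-Date`, and real archives are usually gzip-compressed. This is not the Wellcome Library's tooling, only an illustration of the idea.

```python
import io


def build_record(record_type: str, target_uri: str, payload: bytes) -> bytes:
    """Build a simplified WARC-style record (illustrative, not fully spec-compliant)."""
    headers = [
        b"WARC/1.0",
        b"WARC-Type: " + record_type.encode("utf-8"),
        b"WARC-Target-URI: " + target_uri.encode("utf-8"),
        b"Content-Length: " + str(len(payload)).encode("ascii"),
    ]
    # Headers end with a blank line; each record ends with two CRLF pairs.
    return b"\r\n".join(headers) + b"\r\n\r\n" + payload + b"\r\n\r\n"


def iter_records(stream):
    """Yield (headers, payload) for each record, reading sequentially with no index."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # blank separator lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line where a record should start: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line marks the end of the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many bytes belong to this record,
        # so the payload may safely contain blank lines of its own.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Two invented records concatenated into one aggregate "file".
archive = (
    build_record("response", "http://example.org/", b"<html>old homepage</html>")
    + build_record("resource", "http://example.org/notes.txt", b"line one\r\n\r\nline two")
)

records = list(iter_records(io.BytesIO(archive)))

# Extract a single object by its URI without consulting any external index.
notes = [body for hdrs, body in records
         if hdrs["WARC-Target-URI"] == "http://example.org/notes.txt"]
```

Because each record describes itself, a single object can be located, copied out or migrated to a newer format without touching its neighbours. Reading a large archive this way is slow, which is why indexes are often built for convenience, but the archive does not depend on them to stay readable.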