Digital preservation and digitization at Lund University (and elsewhere in Sweden)

Sigfrid Lundberg, Lund University Libraries

Date: 2009/06/29 11:07:59, Editor: sigfrid

Lecture notes

Notes for a presentation held in Lund on 7 June 2005.


Let me first introduce myself!

My name is Sigfrid Lundberg, research engineer and computer programmer within the University's Library Network.

Over the years I have been working on diverse matters:

In addition to these activities, I participated in the EU-funded telematics projects Desire, EUN, EUC and ETB.

Presentation (cont'd)

Concentrating on text

Around 1999 I became involved in digitization and the Text Encoding Initiative (TEI), and I also realized that what I had done for years actually had a name -- text technology. Before that and after I've been involved in:

Many of these techniques are used in humanities computing (e.g., the creation of digital critical editions of literary works), digitization, digital preservation and electronic publishing.

What do libraries do?

Libraries exist because they:

      The past          Now             The future
      -------------------+------------------------> Time

Persons who know or have experienced something may write about it. The results are published. The publication is data and is stored as data in books, journals and possibly on hard disks, CDs etc.

Any corporate body that stores published information for sharing within its constituency is a library.

What is an archive?

An archive is information collected by persons or corporate bodies to document a process. The process can be a person's life or a university's research or a company's business.

      The past          Now             The future
      -------------------+------------------------> Time

Why preservation?

Knowledge is stored in brains

Only human beings may acquire knowledge. They do so through a cognitive process called learning. Obviously, we can only do that from information that was created in the past.

There is one exception: When we do research we gain knowledge that has never hitherto been stored as information.

We can only gain knowledge from information that has been preserved.

      The past          Now             The future
      -------------------+------------------------> Time

If we did not preserve, all knowledge would have to be gained through original research.

It is a cost-effective way for society to keep its information, be it publications or archival records, in institutions that collect, preserve, organize and provide access to information for their constituencies.

What is digital preservation?

You may engage in digital preservation for two reasons:

All preservation (be it digital or conventional) is based on human effort. Digital preservation also requires software, storage servers, hard disks, tape backup etc., and a lot of computing. It is a question of workflows...

There exists a specification of an Open Archival Information System (OAIS), which is an ISO standard. We have not even tried to implement it here in Lund, since it would eat resources. Links about OAIS:

What is going on in Sweden?

Major (industrial scale) digitization activities

What is going on in Sweden? (cont'd)

Digitization at libraries and archives

  • Waller project (UUB)
  • Evert Taube Archive (GUB)
  • S:t Laurentius Digital Manuscript Library (LUB)
  • Axel Oxenstierna, XML demo

What is going on in Sweden? (cont'd)

Preservation of objects that are born digital

Archival Networks

S:t Laurentius Digital Manuscript Library

The service: Laurentius

Properties of manuscripts

S:t Laurentius Digital Manuscript Library (cont'd)

A challenging problem

The entire catalogue in PDF. A fully indexed draft of a printed book.

More about Laurentius: Lundberg, Sigfrid (2002) Excursions along the border between metadata for resource discovery and for resource description

Ediffah -- A search engine for archival finding aids


More info on the technical solution:

Electronic theses and dissertations

Two services, quite ordinary ones: Xerxes and Scripta

I will only discuss them from a digital preservation perspective. Typically an electronic publishing system supports at least three functions:

Infrastructure for cataloging and publishing

Users are document authors and editorial staff (e.g., reviewers). The tool benefits from the campus-wide authentication system through LDAP. LDAP is also used for getting unique identification of authors (Lund University has more than five staff and students with the name Anders Nilsson).
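The name-collision problem can be sketched as follows. This is only an illustration: the directory entries, the `uid` values and the field names are invented, and the directory is a plain in-memory list rather than a real LDAP server. The point is simply that a catalogue record should reference a unique identifier rather than a personal name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    uid: str         # unique identifier (hypothetical values)
    name: str        # personal name, not necessarily unique
    department: str  # invented, for illustration only


# Hypothetical stand-in for the campus directory.
DIRECTORY = [
    Person("an01", "Anders Nilsson", "Physics"),
    Person("an02", "Anders Nilsson", "History"),
    Person("es01", "Eva Svensson", "Biology"),
]


def find_by_name(name):
    """Return every directory entry carrying this name (may be several)."""
    return [p for p in DIRECTORY if p.name == name]


def find_by_uid(uid):
    """Return the single entry with this identifier, or None if absent."""
    matches = [p for p in DIRECTORY if p.uid == uid]
    return matches[0] if matches else None
```

A lookup by name returns several candidates for "Anders Nilsson", whereas a lookup by identifier returns at most one person, which is what a cataloging tool needs when attaching an author to a thesis.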

Architecture of Xerxes' and Scripta's cataloging tool

Infrastructure for indexing and searching

Users are typically researchers and students.

Architecture of Xerxes' and Scripta's search engine

Infrastructure for dissemination of metadata

Users are remote search engines, e.g., Google, Google Scholar and OAIster. Like Ediffah, this system supports OAI.
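As a rough illustration of what such harvesting looks like from the remote side, the sketch below assumes that "OAI" here means the OAI Protocol for Metadata Harvesting (OAI-PMH) delivering unqualified Dublin Core records. The base URL and the sample response are invented for illustration; they are not the actual Xerxes or Scripta endpoints or data.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL, not the real endpoint of Xerxes or Scripta.
BASE_URL = "http://repository.example.org/oai"

# XML namespaces used by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample of a ListRecords response with a single record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header>
    <identifier>oai:example.org:1</identifier>
    <datestamp>2005-06-01</datestamp>
   </header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>An invented thesis title</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build the URL a harvester would request to list records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed since this date
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs
```

A real harvester would fetch the URL over HTTP and follow resumption tokens for large result sets; both are left out here to keep the sketch short.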

Other activities in Lund

Digitization of images, films and analog sounds and videos

I honestly don't know all the units facing these problems. Neither do I know where these problems are addressed competently and where they are not.

The Internet and its business models

We see a number of tendencies in the global economy.

Creative Commons and Open Access are good, but...

Open Source Software, Open Access Media, Creative Commons, etc. mean giving away content and earning your money from additional services (that's how Linus Torvalds became a millionaire).

Preservation is an investment from which you normally cannot get a fast return. What you may get is results from future research, or just patrons satisfied by reading a good book.

The question is: Who will pay for the preservation of things that are available for free on the Internet?