Thursday, October 29, 2009

Digital Archiving in National Libraries

According to Margaret Phillips, “A primary role of national libraries is to document the published output of their respective countries” (2005). This includes collecting books, pamphlets, maps, music, and newspapers. This is a large undertaking, especially for a large country such as Australia. The move to digital collections affects libraries all over the country, but it poses a particularly important question for national libraries: what should they collect and preserve? Should they attempt to catalog all online content published by people of the country?

According to Phillips, there are two approaches to collecting and preserving online content in national libraries (2005): the whole domain, or comprehensive, approach and the selective approach. Libraries take different paths within the selective approach when building their archives. For example, the national libraries of Denmark and Canada selectively archive static Web resources as well as print resources. Phillips defines static Web resources as those "that are like print publications and that do not change or contain interactive or dynamic elements" (2005). The National Library of Australia selectively archives static Web resources as well as dynamic resources.

Whole domain harvesting is the approach used by the national libraries of Sweden, Finland, and Norway (Phillips, 2005). It involves "using harvesting robots and a minimum of human intervention for identifying resources" (Phillips, 2005). The Bibliothèque nationale de France is involved in a project that combines selective archiving and whole domain harvesting (Phillips, 2005). Additionally, a thematic approach to archiving takes an in-depth look at a certain subject, such as September 11, 2001 (Phillips, 2005). National libraries may also choose to archive material based on collaborative agreements with commercial publishers (Phillips, 2005).

According to Phillips, there are six main advantages to these selective approaches to archiving (2005):

* Each item in the archive is quality assessed and functional.
* A gathering schedule can be individually tailored for each selected title, taking into account its publication schedule or the frequency with which the Web site changes.
* Each item in the archive can be fully catalogued and therefore can become part of the national bibliography.
* Each item in the archive can be made accessible via the Web to readers immediately because permission to do so can be negotiated with publishers.
* The "significant properties" of individual resources and classes of resources within the archive can be analyzed and determined.
* Sites that are inaccessible to harvesting robots can be identified and archived using other methods.

Disadvantages of the selective approach to archiving include the subjectivity of selection, high cost, and loss of contextual meaning (Phillips, 2005).

Whole domain harvesting is a good idea in theory, but in practice it is still far from ideal. Because whole domain harvests demand so much computer storage and processing time, they are run only periodically, so any material that comes and goes in the interim will be missed (Phillips, 2005). Quality control is almost impossible with such a huge volume of information being gathered. In addition, many commercial Web sites that contain important digital heritage employ passwords, which prevent a robot from gathering information from those sites (Phillips, 2005).

The National Library of Australia has implemented PANDORA, Australia's Web archive. The need to archive online publications had become apparent, so in 1996 the library took steps to begin archiving this material (Phillips, 2005). Selection criteria were agreed upon, and collection began. After seven years of collecting, the selection guidelines were reviewed to see whether they were flexible enough or needed to be changed (Phillips, 2005). The assessment indicated that resources containing important information were not being collected, and it identified gaps in the collection (Phillips, 2005). In response, the selection guidelines were changed to focus on six specific categories, including government publications, publications of educational institutions, e-journals, and conference proceedings (Phillips, 2005). Several types of resources that had been excluded in the past remained excluded, including datasets, online daily newspapers, news sites, bulletin boards, and blogs (Phillips, 2005).

PANDORA is a good example of selective archiving. The National Library of Australia recognized that it would be impossible to archive every piece of information pertaining to Australia, so it implemented clear selection guidelines and then evaluated them after several years. This has allowed the library to build a comprehensive yet curated collection for the good of the country.

What do you think is the best approach to digital archiving?
Do you think that PANDORA should archive sources not currently archived (such as online daily newspapers, datasets, and blogs)? Why or why not?

Phillips, M. E. (Summer 2005). What should we preserve? The question for heritage libraries in a digital world. Library Trends, 54(1), p. 57(15). Retrieved October 21, 2009, from Academic OneFile via Gale.


Amy Smola said...

It's interesting that you mention PANDORA. I recently came across a reference to this, and I didn't know what it was. Now I do. You ask our opinion on digitizing things like newspapers, datasets and blogs. My thought about the newspapers would be yes. Since PANDORA is a national database for Australia, I do believe that newspapers should be preserved electronically. I think back to when I was a kid, pulling microfiche rolls from various newspapers. I think the newspapers were a few prominent ones like the New York Times or something like that. But there they were... entire newspaper pages that you could view on that big screen and even print out. Now that things like this can be digitized so that libraries don't need those boxes of microfiche rolls, I definitely think newspapers should be preserved. If a national library project can scan and preserve them, why not do it? Local libraries could link to the national collections so that each library would not have to re-invent the wheel and digitize its own community's newspapers. Newspapers are daily snapshots of our world... current events, weather trends, stock market info, the obituaries, and on and on. Newspapers contain a wealth of current information from the very day something happened. As time goes on, these facts and details may be lost as people edit important articles in the future and forget ones that may seem not so important. Hey... consider that every now and then a newspaper from London in 1888 turns up with a contemporary story about Jack the Ripper. A story written the day after a murder may have an interview with a man who claims to have heard a scream at 10pm the preceding night. Ooohh... what if this eyewitness account suddenly challenges previously held assumptions that the murder occurred at a much different time? This could be a major clue and could drastically alter the opinion of a particular individual as a possible culprit.
Had this particular newspaper not been kept and discovered 120+ years later, this fact would have vanished into history. The same is still true of newspaper articles today. Facts are freshest right after they happen. So wouldn't an article written the very day of an event be a better resource than someone writing about that same event 5 years later? I believe daily newspapers contain information and facts that should be preserved, and if space is not an issue, I believe they all should be digitized for future reference.

Steph said...

I agree with Amy. I believe it is important to digitize newspapers. I also think they are an important snapshot of history, and have local value, too. My mom spent quite a bit of time looking through old papers, using microfilm, when she was investigating our family's genealogy and searching for wedding announcements, obituaries, etc. As for blogs, I am still undecided. I definitely do not think all blogs need to be digitized, and I question the criteria that could be established to decide which blogs to preserve.