According to Margaret Phillips, “A primary role of national libraries is to document the published output of their respective countries” (2005). This includes collecting books, pamphlets, maps, music, and newspapers. That is a large undertaking, especially for a large country such as Australia. The move to digital collections affects libraries all over the country, but it poses a particularly important question for national libraries: what should they collect and preserve? Should national libraries attempt to catalog all online content that is published by people of the country?
According to Phillips, there are two approaches to collecting and preserving online content in national libraries (2005). The first is the whole domain or comprehensive approach, and the second is the selective approach. Libraries that take the selective approach apply it in different ways when building their archives. For example, the national libraries of Denmark and Canada selectively archive static Web resources as well as print resources. Phillips defines static Web resources as those "that are like print publications and that do not change or contain interactive or dynamic elements" (2005). Australia's National Library selectively archives static Web resources as well as dynamic resources.
Whole domain harvesting is the approach used by the national libraries of Sweden, Finland, and Norway (Phillips, 2005). It involves "using harvesting robots and a minimum of human intervention for identifying resources" (Phillips, 2005). The Bibliothèque nationale de France is involved with a project that combines selective archiving and whole domain harvesting (Phillips, 2005). Additionally, a thematic approach to archiving takes an in-depth look at a certain subject, such as September 11, 2001 (Phillips, 2005). National libraries may also choose to archive material based on collaborative agreements with commercial publishers (Phillips, 2005).
According to Phillips, there are six main advantages to these selective approaches to archiving (2005):
* Each item in the archive is quality assessed and functional.
* A gathering schedule can be individually tailored for each selected title, taking into account its publication schedule or the frequency with which the Web site changes.
* Each item in the archive can be fully catalogued and therefore can become part of the national bibliography.
* Each item in the archive can be made accessible via the Web to readers immediately because permission to do so can be negotiated with publishers.
* The "significant properties" of individual resources and classes of resources within the archive can be analyzed and determined.
* Sites that are inaccessible to harvesting robots can be identified and archived using other methods.
Disadvantages of the selective approach to archiving include the subjectivity of selection, high cost, and loss of contextual meaning (Phillips, 2005).
Whole domain harvesting is appealing in theory, but in practice it is still far from ideal. Whole domain harvests are run only periodically because they demand so much computing capacity and time, so any material that appears and then disappears between harvests will be missed (Phillips, 2005). Quality control is almost impossible when such a large volume of information is being gathered. Many commercial Web sites that contain important digital heritage may employ passwords, which will prevent a robot from gathering information from those sites (Phillips, 2005).
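The point about material being missed between periodic harvests can be illustrated with a small Python sketch. Everything in it is a hypothetical assumption made for illustration, not data from Phillips: the site that publishes a new version every 30 days, the twice-yearly harvest, and the function name `captured_versions` are all invented here.

```python
def captured_versions(change_days, harvest_days):
    """Return the change days whose version was still live at some harvest."""
    captured = set()
    for harvest in harvest_days:
        # The version live at harvest time is the most recent change
        # made on or before the harvest day.
        live = [day for day in change_days if day <= harvest]
        if live:
            captured.add(max(live))
    return sorted(captured)


# Hypothetical site that publishes a new version every 30 days for a year.
change_days = list(range(0, 360, 30))  # 12 versions in total

# Assumed whole domain harvest, run twice a year.
periodic = [0, 180]

# Assumed tailored schedule: gather one day after each publication date.
tailored = [day + 1 for day in change_days]

print(len(captured_versions(change_days, periodic)), "of", len(change_days))
print(len(captured_versions(change_days, tailored)), "of", len(change_days))
```

Under these assumptions the twice-yearly harvest captures 2 of the 12 versions, while the tailored schedule captures all 12. Real harvesting intervals and publication patterns vary, so the numbers only show the shape of the trade-off.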
The National Library of Australia has implemented PANDORA, Australia's Web archive. Once the need to archive online publications became apparent, the Library took its first steps toward archiving this material in 1996 (Phillips, 2005). Selection criteria were agreed upon, and collection began. After seven years of collection, the selection guidelines were reviewed to see whether they were flexible enough or needed to be changed (Phillips, 2005). The assessment indicated that resources containing important information were not being collected, and it also identified gaps in the collection (Phillips, 2005). Based on these findings, the selection guidelines were changed to focus more on six specific categories, including government publications, publications of educational institutions, e-journals, and conference proceedings (Phillips, 2005). Several types of resources that had not been included in the past remained excluded from archiving, including datasets, online daily newspapers, news sites, bulletin boards, and blogs (Phillips, 2005).
PANDORA is a good example of selective archiving. The National Library of Australia recognized that it would be impossible to archive every piece of information pertaining to Australia, so it implemented clear guidelines for selection and then evaluated them after several years. This has allowed the Library to build a broad yet curated collection for the good of the country.
What do you think is the best approach to digital archiving?
Do you think that PANDORA should archive sources not currently archived (such as online daily newspapers, datasets, and blogs)? Why or why not?
Phillips, M. E. (Summer 2005). What should we preserve? The question for heritage libraries in a digital world. Library Trends, 54(1), p. 57(15). Retrieved October 21, 2009, from Academic OneFile via Gale.