California Digital Newspaper Collection

The California Digital Newspaper Collection (CDNC) is a freely available archive of digitized California newspapers, accessible at the project's website. It contains millions of pages, ranging from the first publication in 1846 to contemporary issues. The project is part of the Center for Bibliographical Studies and Research (CBSR) at the University of California, Riverside.

History

The CBSR was one of six initial participants in the National Digital Newspaper Program (NDNP),[1] a partnership between the Library of Congress and the National Endowment for the Humanities. Between 2005 and 2011 the CBSR received three two-year grants and contributed around 300,000 pages to Chronicling America,[2] the public face of NDNP. Titles submitted included the San Francisco Call, Los Angeles Daily Herald, Amador Ledger, and Imperial Press. In 2015 the CBSR received a fourth NDNP grant. Between 2015 and 2017 the project will contribute another 100,000 pages of Gold Rush-era and foreign-language newspapers.

The CDNC was officially launched in 2007 and contained the initial 100,000 pages produced for NDNP from 2005 to 2007, along with another 50,000 pages created with support from the U.S. Institute of Museum and Library Services under the provisions of the Library Services and Technology Act (LSTA), administered in California by the State Librarian. All content contributed to NDNP is also hosted in the CDNC, with an important difference noted below under Digitization. Between 2007 and 2013 the CDNC digitized roughly 300,000 pages through the LSTA program administered by the California State Library. In 2014 the project announced a 5-Year Program, supported by LSTA, to digitize one title per county through 1923.

In 2010 the CDNC started a "born digital" project to collect and host contemporary PDFs from newspaper publishers. Roughly a dozen publishers have participated or currently participate in the project. See http://cdnc.ucr.edu/site/uploading.html for more information.

Digitization

The CDNC follows standards established by NDNP. Microfilm or newsprint is scanned to create TIFF images. Whenever possible, master negative film is used. The CBSR manages an archive of roughly 100,000 reels of negative film, the California Newspaper Microfilm Archive.[3] When negative film isn't available, positive film can be used, but image quality and OCR accuracy will not be as good.

The TIFF images are then processed or "digitized" to create derivative files, including a JP2 image, a PDF, and METS/ALTO XML for each page.
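As an illustration of how the OCR text in a page's ALTO file can be read back, the following is a minimal Python sketch rather than a description of the CDNC's own tooling. It relies on the ALTO convention that each recognized word is a String element with a CONTENT attribute inside a TextLine element. The function names and the sample document are hypothetical, and element names are matched without their namespace because the namespace URI differs between ALTO versions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_page_text(alto_xml):
    """Return the OCR text of one ALTO page, one output line per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            # Each String child carries one recognized word in CONTENT;
            # SP (space) elements are skipped and words are rejoined here.
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return "\n".join(lines)


# Hypothetical, heavily simplified ALTO fragment used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="TB1">
      <TextLine><String CONTENT="SAN"/><SP/><String CONTENT="FRANCISCO"/></TextLine>
      <TextLine><String CONTENT="CALL"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

print(alto_page_text(SAMPLE))
```

Real ALTO files also record position attributes for each word, which is what allows search hits to be highlighted on the page image; this sketch ignores them.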

Unlike NDNP, the CDNC has traditionally digitized to the article level rather than just the page level. Individual "segments" on a page (articles, illustrations, advertisements, etc.) are identified during digitization and can be retrieved by the researcher. For an illustration of the difference between page-level and article-level digitization, compare the San Francisco Call in the CDNC to the same title in Chronicling America.

Recently the CDNC has begun digitizing some titles to the page level, but most are still digitized to the article level. The main advantage of page-level digitization is lower cost when it is done in an automated fashion, without human input.

Papers covered

References

External links

This article is issued from Wikipedia, version of 10/5/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.