Tag Archives: Newspaper digitization

Indiana Historic Newspaper Digitization, Project News

New Issues Available!

November 23, 2022 Justin Clark

Hey there Chroniclers!

We have several new titles from Greencastle, Milford, and Syracuse available for you through Hoosier State Chronicles, totaling 8,795 issues and 108,389 pages. This brings our total page count in Hoosier State Chronicles to 1,398,302!

Here are the papers and dates available:

Greencastle Banner Graphic (Daily): March 1, 1973 – June 30, 1992

Indiana Journal (Weekly): March 25 – November 25, 1937

Milford Mail-Journal (Weekly): January 4, 1962 – December 26, 1990

Syracuse Enterprise (Weekly): January 7, 1875 – December 30, 1875

Syracuse Register (Weekly): January 25, 1894 – January 6, 1898

Syracuse and Lake Wawasee Journal (Weekly): June 3, 1915 – April 26, 1923

Syracuse Journal (Weekly): January 7, 1908 – March 18, 1937

Syracuse Indiana Journal (Weekly): March 25 – November 25, 1937

Syracuse-Wawasee Journal (Weekly): December 10, 1937 – February 8, 1962

As always, happy searching!

Indiana Historic Newspaper Digitization, Project News

Nappanee Advance-News Available on Hoosier State Chronicles

June 23, 2021 Justin Clark

Greetings Chroniclers!

We are proud to announce that the Nappanee Advance-News is now available on Hoosier State Chronicles! The collection, spanning 1879-2018, comprises 7,155 issues and over 84,000 pages. You can check it out here.

As always, happy searching!

Indiana Historic Newspaper Digitization, Project News

Indianapolis Times Available on Hoosier State Chronicles!

June 11, 2020 Justin Clark

Greetings Chroniclers!

We are proud to announce that the Indianapolis Times is now available on Hoosier State Chronicles! The collection, spanning 1920-1952, comprises 10,283 issues and over 234,000 pages. The iconic daily newspaper, which ran for over fifty years, became known for its “crusading” journalism, exposing the collusion and corruption between the Indiana state government, governor Ed Jackson, and the Ku Klux Klan. The Times earned the Pulitzer Prize in 1928 for “exposing political corruption in Indiana, prosecuting the guilty and bringing about a more wholesome state of affairs in civil government.” You can check it out here.

As always, happy searching!

Indiana Historic Newspaper Digitization, Project News

New Batches Available!

June 8, 2020 Justin Clark

Hey there Chroniclers!

We have new batches available for you through Chronicling America: http://chroniclingamerica.loc.gov/.

These batches comprise 3,094 issues (totaling 56,038 pages) and bring our total page count in Chronicling America to 373,376!

Here are the papers and dates available:

Indianapolis Times (Daily): February 14, 1927-May 27, 1936

As always, happy searching!

This project has been assisted by a grant from the National Endowment for the Humanities.

Indiana Historic Newspaper Digitization, Project News

Soothe Your Inner Perfectionist: Fixing Searchable Text in Hoosier State Chronicles

September 12, 2019 Justin Clark

One of the most important features of Hoosier State Chronicles is the use of Optical Character Recognition, or OCR. It is created by automated computer software that “finds” characters (letters, numbers, etc.) in digitized images and then transcribes them into searchable text. OCR allows users to search within the text of digitized newspapers for names, dates, or any other term that is relevant to their research. While OCR adds tremendous value to digitized materials, it doesn’t always correctly transcribe words or characters. You will frequently come across OCR that looks like the image below. (Click on images to enlarge them in a separate tab.)

This is where our users come in. When you create a free account on Hoosier State Chronicles, you can actually edit the OCR text of a given page, which improves the functionality of our digitized newspapers. To date, our users have corrected over 315,000 lines of text; one user alone has corrected over 40,000 lines of text—more than anyone else! This blog post will show you how to create an account on Hoosier State Chronicles and how to correct OCR text in our digitized newspapers. With the tools provided here, we hope you will correct as many lines as possible. Who knows, you may even top the current record holder. Regardless of how many lines you correct, each one will make Hoosier State Chronicles a better platform for researchers delving into Indiana’s past through newspapers.

Creating a Free Account on Hoosier State Chronicles

Before you can edit OCR-generated text in Hoosier State Chronicles, you need to create a free account. To do this, click the “Register” link in the upper right-hand corner of the Hoosier State Chronicles homepage.

Fill in the required fields (email, display name, password) and click “go.” You’ll then receive an email to confirm your new account. Click the link in the email to confirm your account. You can now log in via the account confirmation page and you’re ready to go!

OCR Text Correction

To correct OCR text, you can choose any issue or page you’d like. In this blog, we’ll work on the issue shown earlier, the February 1, 1916 edition of the South Bend News-Times. Choose a page of the issue either by clicking on the image itself or the page link on the left-hand side. Once you’ve done that, you’ll see a “Correct this text” link; clicking that link while viewing section text opens the text correction feature. This feature is split into two parts: the right side shows the page images that make up the document, and the left side is used for editing the lines of text.

When you move over the page images on the right, sections of the page will be highlighted. You can change this view by dragging with the mouse, or zoom in/out using the buttons above the images on the right-hand side. Clicking a highlighted section will select it and generate a form for editing that specific section on the left-hand side of the page.

You can now correct the text line by line. A red box is displayed on the right-hand side to help you determine what text should be included in the line on the left-hand side. Once you have finished correcting the text, click “Save.” The changes you make will take effect immediately. Alternatively, clicking the “Cancel” button will discard any unsaved changes you have made.

You can then make further corrections to the same block, move onto the next block by clicking the “Next” button, select another block in the right-hand side, or exit the text correction view by clicking the “Return to viewing mode” link. Clicking “Save & exit” instead of “Save” will save the changes and automatically return you to the normal viewing mode.

While our text correction feature is pretty robust, it has one limitation that we hope to change in the future. Currently, you can only edit existing fields generated by OCR; it doesn’t allow for the creation of new text fields. Even though this is a limitation, the OCR fields on our newspapers are fairly exhaustive and still give us substantial editing abilities.

Here’s another useful tip: many web browsers include spell-checking functionality and this can assist with your text correction by identifying misspelled words. If your web browser does not have this functionality, it’s likely there is a spell-checking add-on available (see your web browser’s help for information on how to install add-ons).

Now armed with the knowledge of text editing on Hoosier State Chronicles, you can improve the quality of our digital newspaper collection. Happy editing! If you have any other follow-up questions or concerns, please contact Justin Clark, Indiana State Library’s Digital Initiatives Director, via email at jusclark@library.in.gov.

Thanks to ISL’s Brittany Kropf for the blog’s title.

Indiana Historic Newspaper Digitization, Project News

New Batches Available!

September 10, 2019 Justin Clark

Hey there Chroniclers!

We have new batches available for you through Chronicling America: http://chroniclingamerica.loc.gov/.

These batches comprise 1,447 issues (totaling 20,734 pages) and bring our total page count in Chronicling America to 373,376!

Here are the papers and dates available:

Indiana Daily Times (Daily): June 14, 1922 – June 24, 1922

Indianapolis Times (Daily): June 26, 1922 – February 12, 1927

As always, happy searching!

This project has been assisted by a grant from the National Endowment for the Humanities.

Indiana Historic Newspaper Digitization, Newspaper histories, Project News

Digitizing Newspapers in Indiana: Hoosier State Chronicles

July 30, 2019 Brock Stafford

Newspapers are an essential historical resource for researchers, journalists, and genealogists because they capture the lives and events of individuals in a particular area throughout the years while also reporting national news. However, even under the best climate and preservation circumstances, the longevity of newspapers is hindered by the relatively short lifespan of newsprint, a thinner and lower quality of paper. One solution in the past was the use of microfilm or microforms. According to Managing Microforms in the Digital Age from the American Library Association, “microfilm has been used since the 1940s for the long-term storage of newspaper content because the medium preserves file integrity, maintains the proper sequence of the data, and discourages theft.”[i] Libraries and historical organizations have used these tools for years, but even microfilm has limitations. It takes up a great deal of space, is expensive to produce, and often requires on-site access.

Over the past twenty years, institutions have shifted their focus from microfilm to digital formats. To aid this transition, the Library of Congress, with funding from the National Endowment for the Humanities (NEH), executed a nationwide newspaper project from 1982 to 2011 called the United States Newspaper Program, which cataloged and collected newspapers nationwide. However, in 2005, the Library of Congress and NEH formed the National Digital Newspaper Program (NDNP) and its digital newspaper database, Chronicling America, which offers free access to digitized historic newspapers from across the country via partnerships with statewide organizations.[ii] Indiana’s largest collection of digitized newspapers is housed within the Indiana State Library’s own database, Hoosier State Chronicles.

As a project, Hoosier State Chronicles focused on digitizing newspapers at the state and local levels, sometimes through the NDNP or institutional partners, but often by partnering with groups endeavoring to save their local papers. The efforts of these smaller organizations have been hindered by the lack of information about how to begin such a process, as well as by the difficulty of securing the necessary resources to handle storage, digitization costs, and labor. This blog provides an introduction to the entire process of how newspapers are selected, organized, digitized, and publicly shared through Hoosier State Chronicles. To begin, let us start with the formation of Hoosier State Chronicles and its collection of digitized newspapers.

OUR HISTORY AND COLLECTION

Indiana’s largest public repository of microfilmed newspapers is managed at the Indiana State Library and contains over 3,000 titles. In 2011, the Indiana State Library, Indiana Historical Bureau, and Indiana Historical Society collaborated on the first grant for Chronicling America, which digitized over 100,000 pages of Indiana newspapers. After the initial two-year grant cycle, the Indiana State Library and Indiana Historical Bureau (now part of the Indiana State Library) took over future efforts to digitize Indiana papers, eventually creating the Hoosier State Chronicles website in 2015 and receiving three more NDNP grants for digitizing newspapers. This included collaborations with Indiana colleges and universities to digitize partial collections, as well as partnerships with community organizations to digitize local papers through grants.

Today, Hoosier State Chronicles has a collection of over 950,000 pages and 124,000 issues, ranging from pre-statehood (The Indiana Gazette, 1804) to contemporary newspapers (The Muncie Times, 2011). The Indianapolis Recorder contains the longest run in the collection with 96 years of newspapers, but because it was a weekly paper, the whole run only contains around 5,000 issues. The largest number of issues for a single newspaper belongs to the Indianapolis News, with over 12,304 issues over 38 years, though The Daily Banner from Greencastle comes in a close second with 10,649 issues spread over 68 years.

An important element of Hoosier State Chronicles is an effort to digitize newspapers across all of Indiana. Of the state’s 92 counties, Hoosier State Chronicles contains newspapers from 54. This is not to say every county in our collection offers an equal number of newspapers or pages. The largest county in our collection by both number of newspapers and pages is easily Marion County, with 25 newspapers and over 43,000 issues. And the smallest? Posey County’s New-Harmony and Nashoba Gazette, or, Free Enquirer with one solitary issue. Does this mean that the counties with lower representation in Chronicles are less important? By no means! Limitations in access to historic newspapers, financial resources, or the quality of the papers have hindered our efforts to share titles from every area in the state. However, smaller or scattered issues may come to us as a part of a community effort to preserve some part of their history digitally. If even one newspaper represents a unique region, time-period, or subject, we absolutely want it to be a part of our collection.

Our collection covers a broad range of eras in Indiana history. The oldest newspapers in our collection begin prior to statehood in 1804 with Vincennes’ Indiana Gazette, the earliest newspaper in the state, as well as its successor, the Western Sun. Two areas of strength for the collection are pre-Civil War and late 1800s newspapers, including early runs of the Indianapolis News, Indianapolis Journal, Indiana State Sentinel, Crawfordsville Daily Journal, and several in Terre Haute and Evansville. In the early 1900s, titles like the Richmond Palladium and Hammond Times provide terrific materials from eastern and northwest Indiana. Greencastle is also an area with multiple papers during these eras, particularly The Daily Banner and associated papers. The latest title in our collection is that of the Muncie Times in 2011, giving us 207 years of collections to share.

Another facet of our newspaper collection is the variety of materials in the collection. Politically, the collection displays contrasting perspectives, with newspapers supporting Republicans and Democrats, Whigs and Socialists. These feature both local and national news, often sharing the statewide perspectives of several parties. In regards to ethnic and racial diversity, we still have a long way to go. As mentioned previously, The Indianapolis Recorder, an African American newspaper, is the longest run in our collection. Additionally, the Evansville Argus and Muncie Times also share African American culture in Indiana throughout the late 30s-early 40s and the 1990s-early 2010s, respectively. Another long run of ethnic and cultural newspapers is the Jewish Post, later called The Indiana Jewish Post & Opinion, with issues from 1933 until 2005. Finally, the Indiana Tribüne has the distinction of being both the only predominantly-German newspaper and the only foreign language newspaper in Hoosier State Chronicles.

While every newspaper occasionally offers controversial news, Hoosier State Chronicles contains one newspaper that is especially difficult for modern readers. The Fiery Cross, a Ku Klux Klan newspaper out of Indianapolis, was published during the early 1920s. Despite its nature as an official newspaper of a hate group, it nevertheless provides insights to the rise of the organization during the 1920s, when they gained immense political power. It also highlights both the explicit and subtle racism and cultural biases of the Klan, particularly against African American, Jewish, Catholic, and immigrant individuals and groups.

One newspaper not included in this list, but that is coming soon to Hoosier State Chronicles, is the Indianapolis Times. The Times was an influential newspaper from the 1920s through the 1960s, whose exposure of the Ku Klux Klan’s influence on Indiana politics won it the Pulitzer Prize for journalism in 1928. It also covered other social issues like corruption in the prison system during the 1930s as well as inadequate care in the mental health-care system and corruption in state road projects in the 1950s.[iii] We are currently digitizing a large portion of the newspaper in two steps. First, 1922 through 1936 is being digitized through an NEH-funded partnership with the Library of Congress Chronicling America project, where these resources will be shared. Later issues between 1936 and the early 1950s are currently being digitized through a partnership with Indiana University-Purdue University Indianapolis and a grant from the Central Indiana Community Foundation. Once completed, close to thirty years of this daily newspaper will be available on Hoosier State Chronicles.

DIGITIZING PAPERS: SELECTION

Selecting newspapers can be challenging due to several factors. When assessing where our collection needs to grow, meeting community needs comes first. For the past eight years, Chronicling America and the Library of Congress assisted Hoosier State Chronicles through an NEH grant to digitize nearly fifty newspapers. Yet, sometimes the desire to digitize Indiana newspapers comes from communities. We assist them through the process of securing grants, selecting vendors, and creating appropriate digital resources that can be added to Hoosier State Chronicles.[iv]

Next comes determining what newspapers are readily available for scanning and processing. Oftentimes, this comes from the collection at the Indiana State Library, with over 3,000 newspapers from the state available on microfilm. Using microfilmed reels (1st/2nd generation negative master reels or 2nd generation positive service reels) makes processing faster and the materials easier to ship. However, some newspapers have limited availability due to scarcity of service copies or the lack of original master reels. Creating new microfilm copies can be difficult because few companies offer the service at a manageable cost.

Though we may have microfilmed copies of newspapers in the State Library, it does not necessarily mean all are available for digitization. First and foremost, copyright restrictions limit which newspapers are candidates. Justin Clark, former Project Manager for Hoosier State Chronicles, wrote an extensive blog on the subject last year:

Have you ever wondered why the vast majority of NDNP’s content, and most digitized newspaper content, ends around 1923? It’s for a very simple reason: all works published in the United States before 1923 are in the public domain. No copyright research is necessary for this material; it’s free and clear for you to use. However, NDNP announced in 2016 that it has expanded its date range for newspaper titles, from 1836-1922 to 1690-1963. Thus, post-1923 works are in the public domain if a copyright claim was never filed from 1923 through 1977 or if the copyright was never renewed from 1923 through 1963.

This means that more recent newspapers may be wholly or partially unavailable due to copyright concerns, including advertisements or cartoons that could fall under intellectual property laws. That is why only three newspapers appear in our collection after 1971: the Indianapolis Recorder, the Jewish Post and Opinion, and the Muncie Times. These papers are available in Hoosier State Chronicles with the permission of the newspapers’ owners.
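As a rough illustration only, the rule of thumb in the quotation above can be written as a small decision function. This is a minimal sketch that encodes just the three conditions stated in the quote; the function name and its parameters are hypothetical, and it is not legal advice or a substitute for actual copyright research.

```python
def likely_public_domain(year_published, claim_filed, renewed):
    """Apply the simplified rule quoted above to one publication year.

    year_published: year the issue was published in the United States
    claim_filed:    True if a copyright claim was filed for the work
    renewed:        True if that copyright was later renewed
    """
    # Everything published before 1923 is described as public domain.
    if year_published < 1923:
        return True
    # 1923 through 1977: public domain if no claim was ever filed.
    if 1923 <= year_published <= 1977 and not claim_filed:
        return True
    # 1923 through 1963: public domain if the copyright was never renewed.
    if 1923 <= year_published <= 1963 and not renewed:
        return True
    # Otherwise, assume the work may still be protected.
    return False


print(likely_public_domain(1910, claim_filed=True, renewed=True))   # pre-1923
print(likely_public_domain(1950, claim_filed=True, renewed=False))  # never renewed
print(likely_public_domain(1950, claim_filed=True, renewed=True))   # renewed
```

A title that fails this check is not necessarily off limits, since it may still be shared with the owner's permission, as with the three newspapers named above.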

However, even newspapers that fall outside the copyright permissions may have other restrictions. Some newspapers have been sold or given to for-profit organizations for digitization or distribution, giving them exclusive access for digital distribution as long as the copyright is in place. Local communities who digitize through for-profit companies often gain access to the files in perpetuity, but to the detriment of those outside of the community who must pay for the digital version through a subscription. The cost of subscription, as well as restrictions on use, limits the average consumer from being able to view these for research or genealogy. Oftentimes, they are marketed as subscriptions to libraries or other organizations for popular use. Hoosier State Chronicles, Chronicling America, and other organizations involved with the NDNP offer newspapers in their collections for free to the public, giving alternatives to researchers, the public, and local communities.[v]

The last two concerns are intertwined: cost and time. Digitization can be a lengthy process, often taking months or years for larger collections. We will cover more in the next section, but the hours required to create a high-quality digital copy may be beyond the resources of smaller organizations. Additionally, the various costs involved with acquiring, shipping, scanning, processing, and completing a run of newspapers may be daunting, but finding programs and grants to help relieve the burden is often a major part of starting such a program.

DIGITIZING NEWSPAPERS: PROCESSING

Example of an original document scan and a grayscale edited copy, which makes the text clearer and more legible. Image Credit: Jill Black, ISL

Once a newspaper is selected and deemed eligible for digitization with no restrictions, the process of assessing the collection can begin. The initial process often involves cataloging each newspaper issue to verify its condition, making sure all pages are included and duplicates are noted, sorting to make sure all images are in order, notating any errors in the original print run, and marking flaws in the microfilm. This step can take months to complete in order to provide a thorough template for individuals digitizing the information and adding metadata (the data that organizes and makes the pages and newspapers searchable), as well as keeping meticulous records to assure everything leaving can be accounted for when it returns.

There are several potential options for the digitization process, and many of these depend on the size and number of reels for the newspaper. If the number of newspapers is small enough, or in a physical medium, it may be handled by a local or state agency like the Indiana State Library, which has on-site digital scanning capabilities. However, for larger runs of newspapers, outside companies will likely be required to handle both the digitization and metadata. While there are many options for vendors, the quality requirements, size of the order, and cost may dictate which vendor to go with.

While the scale of work may vary, the system of digitizing large and small projects is very similar. The images are photographed by a high-quality digital scanner that scans the whole document, captures the fine details, and avoids capturing text bleeding through from the other side. From there, the images will be modified for readability, removing flaws and cropping out extraneous space. Files are usually saved in multiple formats for different uses: TIFF files are the highest quality and provide the archival copy, but are extremely large; JPG or JP2 (JPEG 2000) files provide usable quality copies at a lower resolution and size than TIFF files; and PDF files, which can vary in quality and size, can be downloaded by the public.

Metadata creation is distinct from the digital scanning process, and while both systems need to work collaboratively, each could be performed by separate vendors. Metadata is the “data about your data” that gives images their descriptions, allows them to be easily sorted, and provides an order and structure to the files. If you are not familiar with metadata, think about how newspapers are numbered. Each issue of a newspaper has a volume number, an edition, and a date that give you a newspaper’s order of publication. Within each issue, page numbers also keep the newspaper in sequential order. All of these numbers are points of metadata that help us sort and organize the newspaper on a daily basis. They are also points that a computer system needs to know to organize the information when putting the files in order and allowing them to be searched and indexed. XML files act as the directory for metadata to be able to sort these files (see image on right).
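To make the idea concrete, here is a minimal sketch of how a program might use XML metadata to put scanned page files back into reading order. The element and attribute names (batch, page, file, date, volume, edition, number) and all of the sample values are invented for illustration; they are not the actual schema used by Hoosier State Chronicles or the NDNP.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical metadata for three scanned pages, listed out of order.
SAMPLE = """
<batch>
  <page file="0003.tif" date="1916-02-01" volume="33" edition="32" number="2"/>
  <page file="0001.tif" date="1916-01-31" volume="33" edition="31" number="1"/>
  <page file="0002.tif" date="1916-02-01" volume="33" edition="32" number="1"/>
</batch>
"""


def pages_in_reading_order(xml_text):
    """Return page file names sorted by issue date, then by page number."""
    root = ET.fromstring(xml_text.strip())

    def sort_key(page):
        issue_date = date.fromisoformat(page.get("date"))
        return (issue_date, int(page.get("number")))

    return [page.get("file") for page in sorted(root.iter("page"), key=sort_key)]


ordered = pages_in_reading_order(SAMPLE)
print(ordered)
```

Without the date and page number recorded for each file, the computer would have nothing but file names to go on, which is the sense in which digital newspapers without metadata are only images.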

Another aspect of metadata for newspapers is making sure the text in the body of the newspaper is readable and searchable. Thankfully, one tool that makes this process easier is Optical Character Recognition software, or OCR. OCR scans the printed pages in the images, translates them to text, and allows that text to be searched. Not only does this make the newspapers much easier to use, but it also adds a rough transcription of the pages (see image below).

Example of text transcribed using OCR on the Hoosier State Chronicles website.

Unfortunately, OCR is not perfect. The system works best when text is in standard fonts, in straight lines and columns, contains no illustrations, and is relatively the same size. As you may guess, this is rarely the case, particularly in modern or larger newspapers that contain advertisements, comics, or unusual text fonts. Errors can also be caused by the condition of the documents when they are scanned or the contrast of images. This occasionally results in gibberish translations or incorrect transcriptions from items the software recognizes as text (like an image). Still, like most technology, the systems improve as time goes on, and OCR is an essential part of making the information in newspapers more accessible.
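To show why these errors matter for searching, here is a small sketch of an approximate ("fuzzy") word search using Python's standard difflib module. The sample sentence and its misread characters are made up for illustration, and this is only one possible way to tolerate OCR mistakes; it is not a description of how search works on Hoosier State Chronicles.

```python
from difflib import get_close_matches


def fuzzy_find(term, ocr_text, cutoff=0.8):
    """Return words in ocr_text that closely resemble term."""
    # Lowercase each word and trim surrounding punctuation before comparing.
    words = [w.strip(".,;:!?\"'()").lower() for w in ocr_text.split()]
    return get_close_matches(term.lower(), words, n=5, cutoff=cutoff)


# Invented OCR output with typical misreads ("h" read as "b", "l" as "1").
sample = "Tbe Indianapo1is Times exposed corrnption in state government."

print("indianapolis" in sample.lower())      # an exact search misses the word
print(fuzzy_find("Indianapolis", sample))    # an approximate search finds it
print(fuzzy_find("corruption", sample))
```

An exact search for "Indianapolis" finds nothing in this line, which is the practical cost of an uncorrected OCR error.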

This image is an example of good OCR. The majority of the text is recognized (blue highlighted areas) and displays distinctive paragraphs. Image credit: Jill Black, ISL

The blue areas on this image are what the OCR recognizes as spots which contain “text” for the purposes of character recognition. Errors like this are common with illustrations.

Without metadata, digital newspapers are nothing more than images. Metadata orders these images to replicate the experience of reading a newspaper while adding searchable information. The process of adding metadata requires a team with keen eyes to monitor the organization and placement of files during the digitization process, specialized technology that accurately recognizes text, and careful attention to the image quality of every single newspaper.

DIGITIZING NEWSPAPERS: REVIEW AND UPLOADING

The process of creating a digital copy and adding layers of metadata can take the same amount of time as the initial review of the collection. Yet, after these are completed, the individual agencies who accept these digital copies must review as much as they can to assure that the highest standards are maintained. If this is done internally, the control process may be easily assured by spot checking the creation process. However, on a larger scale where vendors are utilized, checking to make sure each batch, or group, of digitized newspapers is correct as soon as they are available means you can request corrections before they return the microfilm.

What kind of issues come up? Sometimes the scanning is not at the right quality or resolution, which necessitates a rescan. Maybe the dates, page numbers, or page orders are incorrect in the metadata and the information needs to be reorganized or edited. Occasionally, missing pages or issues that should be there need to be tracked down between the original film reels, the digitized files, and the metadata files. This is why it is important to review and revise everything in smaller groups, or batches, so the process of digitizing, adding metadata, and reviewing the completed material can take place simultaneously. Locally saved materials can be revised as you go, but larger-scale batches may require a remote digital transfer before you begin, or physically shipping off a hard drive.
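Part of this review can be automated. The sketch below assumes the batch metadata has already been reduced to simple pairs of issue date and page number (a hypothetical format, with made-up sample values) and reports any page numbers that appear to be skipped within an issue.

```python
from collections import defaultdict


def find_page_gaps(pages):
    """Map each issue date to the page numbers missing from its sequence.

    pages: iterable of (issue_date, page_number) pairs from batch metadata.
    Only gaps below the highest page number seen can be detected; pages
    missing from the end of an issue need a check against the film itself.
    """
    by_issue = defaultdict(set)
    for issue_date, page_number in pages:
        by_issue[issue_date].add(page_number)

    gaps = {}
    for issue_date, numbers in sorted(by_issue.items()):
        expected = set(range(1, max(numbers) + 1))
        missing = sorted(expected - numbers)
        if missing:
            gaps[issue_date] = missing
    return gaps


# Invented batch: page 3 of the first issue was never delivered.
batch = [("1922-06-26", 1), ("1922-06-26", 2), ("1922-06-26", 4),
         ("1922-06-27", 1), ("1922-06-27", 2)]
report = find_page_gaps(batch)
print(report)
```

A check like this only flags candidates; someone still has to compare the flagged issue against the original reel to decide whether a page was skipped by the scanner or was never filmed.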

Maintaining a digital collection of any kind, with thousands of individual newspapers saved in multiple formats, means investing in both external hard drives and backup drives. For example, our current digitization project with the Library of Congress contains portions of the Indianapolis Journal, The Daily Times, and the Indianapolis Times, which collectively require roughly eleven external hard drives and nearly seven terabytes of storage. To make sure everyone who needs these materials has them, we often have three copies: one on an external hard-drive that is shipped to the Library of Congress, one back-up copy on our local computer system for immediate access, and one copy maintained on our website. All three have associated costs, but it is good practice to maintain each for future use.

Hard drives with digitized newspapers for the NDNP are shipped off to the Library of Congress for approval.

Finally, after all batches undergo quality review of their images and metadata, revisions are completed, and the batches are ready, they are sent to the appropriate locations. For the newspapers that are part of the Chronicling America project, they are sent to the Library of Congress in Washington D.C., where they undergo a second review to assure the files meet their specifications. Once everything is approved by all organizations, the files can finally be sent to either Chronicling America and/or Hoosier State Chronicles, where they are uploaded for public access.

CONCLUSION

Starting a new digital newspaper collection is often a large undertaking, but the established specifications, technologies, vendors, and programs throughout the United States show interested organizations that it can be done. If you are looking for how other organizations have handled this process, check out the list of organizations that have been awarded NDNP grants on the Library of Congress website: https://www.loc.gov/ndnp/awards/. Ultimately, the goal of digitization is to make documents more accessible to the public and reduce damage to original sources, thus providing more contextual resources for our understanding of history.

A special thanks to Connie Rendfeld, Chandler Lighty, Justin Clark, Leigh Anne Johnson, and Jill Black in the creation of this document.

[i] Canepi, Ryder, Sitko, and Catherine Weng. “Microforms in Libraries and Archives” from Managing Microforms in the Digital Age, American Library Association, 2013. Accessed on July 16, 2018. http://www.ala.org/alcts/resources/collect/serials/microforms01

[ii] The program currently hosts newspapers from 46 states and Puerto Rico, and Indiana has 59 newspapers on the site as of today.

[iii] For more information on the Indianapolis Times, see notes from the Indiana Historical Bureau’s marker at: https://www.in.gov/history/markers/4115.htm

[iv] One source of funding is Library Services and Technology Act (LSTA) grants, which are funded by the Institute of Museum and Library Services (IMLS) and distributed by the State of Indiana. For more information on the availability of these grants, check out the State Library page at https://www.in.gov/library/lsta.htm, or contact Angela Fox at (317) 234-6550 or anfox@library.in.gov.

[v] The Indiana State Library and Hoosier State Chronicles have partnered with Newspapers.com in the past to digitize a large number of newspapers. In exchange for three years of exclusive access, over 1.5 million pages of Indiana newspapers are now digitized and accessible via the Indiana State Library’s Inspire website by following the links to Newspapers.com.

Project News

Fair Use and Copyright Research for Newspaper Digitization: What You Need to Know

September 21, 2018 Justin Clark

This article is based on a talk I gave at the Digital Public Library of America’s DPLA Fest conference on September 21, 2018.

Disclaimer: I am not a lawyer and this is not professional legal advice. This article is for educational purposes only. Please consult counsel concerning any potential digitization projects your institution is interested in pursuing.

Introduction

Good afternoon. Thank you very much for attending this session. I’m Justin Clark, Project Manager of Hoosier State Chronicles, our state-wide historic digital newspaper program at the Indiana State Library. We are a part of the National Digital Newspaper Program (NDNP), a joint venture between the Library of Congress and the National Endowment for the Humanities. To date, we’ve digitized nearly a million pages of historic Indiana newspapers, of which over 300,000 have gone into NDNP’s Chronicling America database of nearly 14 million digitized newspaper pages from across the country.

When digitizing historic newspapers for NDNP, one of the most important things to consider is whether the paper is under copyright. You could have picked the perfect title, had it approved by your institution, and completed all of the arduous work of collation, but if you don’t check its copyright status, your work could all be for naught. This is why a basic understanding of fair use, the public domain, copyright, and conducting copyright research is essential to any newspaper digitization project. This talk will provide a general overview of what fair use is, how it relates to newspaper titles, and how you can complete the necessary research to ensure your desired title for digitization is acceptable. Doing this work gives you not only an expanded scope of potential titles for digitization, but it also provides peace of mind that you won’t hear from any lawyers in the future, besides your institution’s counsel, of course.

Now, before we begin our stroll through copyright, I must say this. I AM NOT A LAWYER . . . nor have I played one on TV. This talk is only an educational overview of what I’ve learned about copyright research for digitizing newspapers. Other materials such as photographs, 3D objects, and written documents may not follow the same procedures or guidelines. It is imperative that you consult your institution’s legal counsel before making any concrete decisions to digitize anything. This saves you a visit from an irate lawyer who is upset that you’ve digitized materials that are still in copyright. And this little disclaimer saves ME a visit from an irate lawyer who got the call from the other one about copyrighted materials. In short, the only lawyer you want visiting your office should come from your institution. Now, with that out of the way, let’s start with fair use.

What Is Fair Use?

In the United States, copyright holders possess considerable legal rights for the protection of their intellectual property. This is a great thing – copyright holders can use their hard work to ensure an income and that scammers will keep their greedy hands off of work that doesn’t belong to them. But there are exceptions. One such exception to US copyright law plays a vital role in our emerging digital landscape: fair use. Fair use, according to the U.S. Copyright Office, “is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances.” Essentially, fair use allows someone to use a copyrighted work for a completely different purpose than the copyright holder originally intended, which usually falls in the categories of “criticism, comment, news reporting, teaching, scholarship, and research.” These protections fall under Section 107 of the Copyright Act.

To determine whether or not a use of a copyrighted work is fair use, four general guidelines are followed. The first is the “purpose and character of the use.” Most of the time, if a person is using a copyrighted work for non-profit and/or educational purposes, it generally falls under fair use. This is especially the case if the use is “transformative” meaning that it “add[s] something new, with a further purpose or different character, and do[es] not substitute for the original use of the work.” In NDNP’s case, taking a newspaper which was originally created for immediate public consumption at a profit and transforming it into a digital historical artifact at no cost to the researcher usually falls under fair use. This guideline is not ironclad; sometimes, a copyright holder will object to their work being used in this way. Nevertheless, this guideline is generally applicable to NDNP and newspaper digitization as a whole.

Second, the “nature of the copyrighted work” is considered when determining fair use. This guideline is a little harder to pin down, but it basically means whether or not your use of a copyrighted work is too close to the original to be considered fair use. Specifically, “using a more creative or imaginative work (such as a novel, movie, or song) is less likely to support a claim of a fair use than using a factual work (such as a technical article or news item).” For our purposes, taking informational works such as newspapers and digitizing them for researchers changes the nature of the work, from a paid periodical into a free primary source document. In most cases, this would count as a fair use.

Third, the “amount and substantiality of the portion used in relation to the copyrighted work as a whole” plays a role in deciding fair use. In other words, if a person just blatantly copied the entirety of a copyrighted work and then sold it for their own benefit, it would not be fair use. However, for material that falls under the public domain (more on that below), recreating the entirety of the work is more than fine and falls under fair use. NDNP projects often have syndicated columns and cartoons that are copyrighted but the newspaper as a whole is not copyrighted. In those instances, the amount of non-copyrighted work outweighs the copyrighted work and the digitization of a newspaper is then considered fair use. We will unpack this more in the copyright research section.

Finally, fair use is determined by the “effect of the use upon the potential market for or value of the copyrighted work.” Put simply, does the use of a copyrighted work ruin its value in the marketplace? In the case of digitizing newspapers, a newspaper’s value stemmed from its original sale date, which was years or decades before. If a newspaper title is already in the public domain, its original market value is already gone and can be used by others in a myriad of ways. For NDNP projects, turning a newspaper into a primary source historical document does not destroy the market value of the original paper nor does it harm copyrighted works therein (syndicated columns and cartoons). Potential researchers are using the digitized newspapers for scholarly purposes, not for the resale of copyrighted material. As with the other three guidelines, the “market value” guideline is generally met.

This overview of fair use is not exhaustive. Definitely review material on fair use from the U.S. Copyright Office and the Copyright Alliance for more information.

What is “Public Domain”?


Alongside fair use, a clear conception of public domain is essential for working on NDNP-related projects. Works in the public domain, according to the Stanford University Library, are:

. . . creative materials that are not protected by intellectual property laws such as copyright, trademark, or patent laws. The public owns these works, not an individual author or artist. Anyone can use a public domain work without obtaining permission, but no one can ever own it.

A work enters into the public domain via three avenues: it can’t be copyrighted (i.e., titles, names, facts, ideas, government works), the creator of the work places it in the public domain, or its copyright term has expired. With NDNP, the last of these three is the most important.

Have you ever wondered why the vast majority of NDNP’s content, and most digitized newspaper content, ends around 1923? It’s for a very simple reason: all works published in the United States before 1923 are in the public domain. No copyright research is necessary for this material; it’s free and clear for you to use. However, NDNP announced in 2016 that it has expanded its date range for newspaper titles, from 1836-1922 to 1690-1963. Thus, post-1923 works are in the public domain if a copyright claim was never filed from 1923 through 1977 or if the copyright was never renewed from 1923 through 1963. All NDNP projects that follow these public domain guidelines will easily determine if their potential title is ready for digitization.
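If it helps to see those rules written out as a checklist, here is a minimal Python sketch. It only encodes the date ranges as I've stated them in this talk (as of 2018); the names `CatalogFinding` and `classify` are my own inventions for illustration and are not part of any NDNP or Copyright Office tool. Anything the rules above don't cover is deliberately returned as "needs further review" rather than guessed at.

```python
from dataclasses import dataclass

PUBLIC_DOMAIN = "public domain"
UNDER_COPYRIGHT = "under copyright"
NEEDS_REVIEW = "needs further review"


@dataclass(frozen=True)
class CatalogFinding:
    """What you found in the catalogs for one title (hypothetical structure)."""
    publication_year: int
    copyright_filed: bool   # an original copyright entry was found
    renewal_found: bool     # a renewal entry was found


def classify(finding: CatalogFinding) -> str:
    """Apply the public domain rules described in this talk, nothing more."""
    year = finding.publication_year
    if year < 1923:
        # Published before 1923: no copyright research needed.
        return PUBLIC_DOMAIN
    if year <= 1977 and not finding.copyright_filed:
        # 1923 through 1977 with no copyright claim ever filed.
        return PUBLIC_DOMAIN
    if year <= 1963:
        # 1923 through 1963 with a claim filed: the renewal decides it.
        return UNDER_COPYRIGHT if finding.renewal_found else PUBLIC_DOMAIN
    # Cases the talk does not address (e.g. 1964 onward with a claim filed).
    return NEEDS_REVIEW


print(classify(CatalogFinding(1940, copyright_filed=True, renewal_found=False)))
```

Treat the output as a prompt for your notes, not a verdict; your legal counsel still makes the final call.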

To learn more about public domain, visit these online resources from the Stanford University and Cornell University libraries.

Conducting Copyright Research

Now that you know how fair use and the public domain work, you can begin the necessary research to determine the copyright status of a newspaper title. Here in Indiana, we wanted to know the copyright status of one of Indianapolis’s premier papers of the 20th century: the Indianapolis Times. The Times ran from 1888 (when it was titled the Sun) until 1965, a pretty impressive run for a daily metropolitan newspaper. From 1922 until its end, the Times was owned and operated by Scripps-Howard, a major publishing corporation based out of Cincinnati, Ohio. Knowing that such an influential publishing company owned the Times from 1922 until 1965 put an increased responsibility on us to make sure that the paper was either in the public domain and/or that its digitization would be considered fair use.

Indianapolis Times, October 11, 1965, Indiana State Library Newspaper Microfilm Collection.

To figure this out, we examined its copyright as a complete title as well as the copyright of individual articles and/or syndicated content, to get a sense of how much material within the newspaper was copyrighted. Three resources allow you to complete this research: the Catalog of Copyright Entries (1906-1977) (published by the Library of Congress), the Public Catalog of Copyright Entries (1978-present) (online; published by the Library of Congress), and the Indianapolis Times newspaper microfilm collection (courtesy of the Indiana State Library).

Catalog of Copyright Entries, Internet Archive.

The Catalog of Copyright Entries (1906-1977) is available at Internet Archive (www.archive.org) in a readable PDF format. It comes with Optical Character Recognition (OCR), so it is text-and-word searchable. To begin, view the 1923 Catalog of Copyright Entries, Part 2, which provides the copyright and copyright renewal for all periodicals published in the United States that year. For all the following years, look for the volume devoted to periodicals. In the search field, type the name of your title. If nothing comes up, search the catalog’s index for the title. If nothing is there, check the title within the book in the new copyright section as well as the renewal section. If nothing comes up, your newspaper title filed neither a new copyright nor a copyright renewal and it is in the public domain. Consult all remaining years of the catalog (in the periodical section) for any new copyright notices or copyright renewals. If you do find that your title was published with a copyright notice and a renewal from 1923-1963, it is not in the public domain and will remain under copyright for 95 years after the publication date. However, if the title was published from 1923-1963 with an initial copyright notice but was not renewed during that time, it is in the public domain and you are free to digitize.

Catalog of Copyright Entries, Library of Congress/Internet Archive. This is an example of the periodicals section of the catalog.
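If you would rather not type your title into the search field volume by volume, the search step above can be roughly sketched in Python. This assumes you have already saved the OCR text of each catalog volume yourself and loaded it into a dictionary keyed by a label of your choosing; the function name `find_title_mentions` and the sample text below are made up for illustration and are not real catalog entries.

```python
import re


def find_title_mentions(volumes, title):
    """Return (volume label, line number, line) for every line mentioning title.

    volumes: dict mapping a label (e.g. "1923 Part 2") to that volume's OCR text.
    """
    # OCR output is messy: ignore case and allow any run of whitespace
    # between the words of the title.
    words = [re.escape(word) for word in title.split()]
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)

    hits = []
    for label, text in volumes.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((label, number, line.strip()))
    return hits


# Invented sample text standing in for two saved volumes.
sample_volumes = {
    "1923 Part 2": "INDIANAPOLIS  NEWS\nIndianapolis   Times. (c) Jan. 2\n",
    "1924 Part 2": "Nothing relevant on this page\n",
}
for hit in find_title_mentions(sample_volumes, "Indianapolis Times"):
    print(hit)
```

An empty result here is not proof of anything, since OCR can mangle a title beyond recognition. You should still check the catalog's index and the new copyright and renewal sections by eye, as described above.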

If you need to check anything after 1977, use the online Public Catalog of Copyright Entries, which covers 1978 to the present. This search is much easier than combing through the scanned versions at the Internet Archive. All you have to do is type your title in the search bar; if you get no results, no copyright renewals were filed and you’re good to move forward. If there are copyright renewals, the title will remain under copyright for 95 years after its initial publication date.

Online Catalog of Copyright Entries, Library of Congress.

For our research, we started with 1922, the year that Scripps-Howard Newspapers purchased the Times and the final year it could have been in the public domain (this research was done in 2017, before the public domain covered 1923). According to listings in the Catalog of Copyright Entries and the Public Catalog of Copyright Entries, Scripps-Howard Newspapers never filed the Times for copyright between 1922 and 1965 or for subsequent renewals from 1965 to the present. Therefore, the Times as a complete newspaper is within the public domain and eligible for digitization.

Online Catalog of Copyright Entries, Library of Congress. A search for “Indianapolis Times” yields no results, which means that its copyright was never renewed after 1978.

But your search doesn’t end there! The copyright of individual articles and syndicated content also needs to be established. Library of Congress policy for NDNP has generally been that individually-copyrighted content within the “context” of an entire newspaper in the public domain is not a problem, so long as it doesn’t account for over 50% of the entire work. This rule is a recommendation and not an absolute policy. It is still up to you as an NDNP awardee, your institution, and your legal counsel to establish the proper procedures for such content.

Start with the scanned Catalog of Copyright Entries at the Internet Archive. However, instead of viewing the volumes devoted to complete periodicals, look at the volumes usually devoted to books or pamphlets. These volumes include copyright information on individual pieces published in periodicals. Then search the online Catalog of Copyright Entries. Remember to check for both an original copyright notice and a copyright renewal. As with the newspaper title as a whole, if the article was published with a copyright notice and a renewal from 1923-1963, it is not in the public domain and will remain under copyright for 95 years after the publication date. Additionally, articles published from 1923-1963 with an initial copyright notice but no renewal are in the public domain and you are free to digitize.

Catalog of Copyright Entries, Library of Congress/Internet Archive. This is an example of the book and/or pamphlet section of the catalog, where copyright information on contributions to periodicals is located.

With our research of the Times, one type of syndicated content that showed up right away was the Sunday supplemental, with PARADE magazine being an applicable example. From 1963-1965, PARADE was published with Sunday issues of the Times; it was copyrighted when it originally ran (and included in the Catalog of Copyright Entries) and was subsequently renewed (and included in the Public Catalog of Copyright Entries). As such, we decided not to include this supplemental in our NDNP deliverables. Regarding individual articles, we found 32 copyright listings in the Catalog of Copyright Entries from 1922-1965; only the initial copyright was listed and no renewals were found. These were then cross-referenced in the online Public Catalog of Copyright Entries to check for post-1978 renewals; none were found. These articles accounted for less than 10% of the entire field of research, well below the 50% threshold for fair use. (So long as you consult your institution and its legal counsel.)

An example of PARADE magazine’s copyright notice from 1964. Supplementals like this are not in the public domain.
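The comparison against the 50% recommendation is simple arithmetic, but it is worth writing down so your report shows how you got your number. Here is a minimal Python sketch; the function names are my own, and the total of 400 items is a made-up figure purely for illustration (only the 32 copyright listings come from our research).

```python
def copyrighted_share(copyrighted_items, total_items):
    """Fraction of the reviewed material that carries its own copyright."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return copyrighted_items / total_items


def within_recommendation(share, threshold=0.5):
    """True if individually copyrighted content does not exceed the threshold.

    The 0.5 default mirrors the 50% recommendation discussed above; it is a
    recommendation, not an absolute policy.
    """
    return share <= threshold


# 32 listings comes from our research; 400 is a hypothetical total.
share = copyrighted_share(32, 400)
print(f"{share:.0%} copyrighted; within recommendation: {within_recommendation(share)}")
```

However you count your "total" (articles, pages, or issues), say so in your notes, since the share only means something relative to that choice.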

Now that you’ve thoroughly gone through the Catalogs, it’s also good policy to review the title’s microfilm. Here’s what we did. We chose three reels from each decade of the Times from 1923 to 1965 and scoured them for copyrighted content. We concluded that the vast majority of material on these reels fell within the public domain, in keeping with the Times’s policy on copyright. As for what was copyrighted, it was mostly advertisements for still-existing products (Columbia Records, Bayer Aspirin), syndicated cartoons (individual cartoons scattered throughout the paper as well as one full page an issue), serialized fiction, and syndicated columns. These materials contained a copyright symbol and text, indicating their status. We concluded that these entries constituted a small minority of the newspaper content and largely will not affect the proprietary interests of the copyright holders (seeing as the content in question was digitized from second-generation microfilm, which itself comes from first-generation preservation microfilm based on photographed pages; the loss in resolution and quality should not urge copyright holders to pursue legal action). You can do more or less with your title’s microfilm than we have, but this should be enough to establish a broad consensus on your title’s copyright status.

A Bayer Aspirin ad from 1925. This was a copyrighted aspect of the Indianapolis Times that we reviewed when combing the microfilm collection.
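If you want your reel selection to be repeatable (and easy to explain in your report), the "three reels per decade" approach can be sketched in Python like this. We picked our reels by hand, so this is one possible way to do it rather than what we actually ran; the reel identifiers and the four-reels-per-year inventory are hypothetical.

```python
import random
from collections import defaultdict


def sample_reels_by_decade(reels, per_decade=3, seed=0):
    """Pick up to per_decade reels from each decade.

    reels: iterable of (reel_id, year) pairs.
    seed: fixed so the same inventory always yields the same selection.
    """
    by_decade = defaultdict(list)
    for reel_id, year in reels:
        by_decade[year // 10 * 10].append(reel_id)

    rng = random.Random(seed)
    return {
        decade: sorted(rng.sample(ids, min(per_decade, len(ids))))
        for decade, ids in sorted(by_decade.items())
    }


# Hypothetical inventory: four reels for each year from 1923 through 1965.
inventory = [(f"reel-{year}-{n}", year) for year in range(1923, 1966) for n in range(1, 5)]
for decade, picked in sample_reels_by_decade(inventory).items():
    print(f"{decade}s: {picked}")
```

Recording the seed alongside your findings lets anyone reviewing your work reproduce exactly which reels you looked at.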

Once you’ve done all of these procedures, it is best to draft a full report of your research and findings to your NDNP advisory board, as well as your institution’s legal counsel. Make sure to be as detailed as possible – this ensures they fully understand what you’ve done and saves you the trouble of having to answer a bunch of follow-up questions. For our research on the Times, my project director and I drafted our report and then sent it to the aforementioned parties. From there, we received approval to digitize the Times.

An example of syndicated and copyrighted cartoons from the Indianapolis Times.

An example of copyrighted serialized fiction in the Indianapolis Times.

One more tip for your research: make sure to keep detailed notes of everything you do. You will be going through a lot of newspapers, so it will help you keep things straight. It also provides a paper trail that your institution’s leadership and legal counsel can consult if necessary. I suggest using Google Sheets and Docs to complete this research. It will be in the Cloud and can be easily shared with anyone who would like to see it. If Google is not your fancy, use Microsoft Office and back up your work to the Cloud or another hard drive. You don’t want to work diligently for months to have all of it lost because of computer issues.

Examples of how I documented all my work.
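I kept my own notes in spreadsheets, but if you prefer something you can script, a plain CSV file works as a paper trail too. Here is a minimal Python sketch using only the standard library; the column names and the `append_entry` helper are my own suggestions, not a required NDNP format.

```python
import csv
from pathlib import Path

# Suggested columns; adjust them to whatever your counsel wants to see.
FIELDS = ["date_checked", "source", "volume_or_reel", "search_term", "finding", "notes"]


def append_entry(path, entry):
    """Append one research note to a CSV log, writing the header if needed.

    entry: dict using the FIELDS keys; missing keys are left blank.
    """
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)
```

A call such as `append_entry("research_log.csv", {"date_checked": "2017-06-01", "source": "Catalog of Copyright Entries", "volume_or_reel": "1923 Part 2", "search_term": "Indianapolis Times", "finding": "no entry found"})` adds one row (the file name and values here are examples only). Keep the file somewhere that is backed up, for the same reasons given above.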

Conclusion

Digitizing newspapers has been one of the most rewarding things I’ve worked on in the public history and cultural heritage space. Seeing a title like the Indianapolis Times digitized and made available for researchers to use, for free, has been a real privilege. But all of this could not have happened without doing the long and often-tedious work of copyright research. Researching a title’s copyright ensures that it is free and clear for you to digitize, and that a lawyer from King Features or PARADE magazine won’t come knocking on your door. Yet, copyright research can also be very rewarding. It gives you a big-picture view of the title you’re considering for digitization. You’ll see who its original audience may have been, the kinds of stories they covered, and how it fits in the context of your state’s, and the country’s, history. This, among many other things, makes copyright research worth it. Thank you.

Indiana Historic Newspaper Digitization, Project News

New Batch Available!

June 14, 2017 Justin Clark

Hey there Chroniclers!

We have a new batch available for you through Chronicling America: http://chroniclingamerica.loc.gov/.

This batch comprises 977 issues (totaling 9,957 pages) and brings our total page count in Chronicling America to 299,200!

Here’s the paper and dates available:

Richmond Palladium And Sun-Telegram (Daily): April 1, 1912-November 20, 1915.

As always, happy searching!

This project has been assisted by a grant from the National Endowment for the Humanities.

Indiana Historic Newspaper Digitization, Project News

New Batch Available!

April 26, 2017 Justin Clark

Greetings chroniclers!

We have another new batch available for you at Chronicling America.

This batch contains issues from:

Richmond Weekly Palladium (Jan 06, 1875 – Dec 09, 1875)
Richmond Daily Palladium (Nov 21, 1898 – Sep 30, 1907)

This batch adds 1,166 issues (8,878 pages), growing Indiana’s total number of pages in Chronicling America to 288,102!

Have fun with all these new pages, and as always, happy searching!

This project has been assisted by a grant from the National Endowment for the Humanities.

Hoosier State Chronicles: Indiana's Digital Newspaper Program
