
A National Scan Center: A Public Works Project

In the course of doing research for some recent testimony before Congress on the
National Archives and Records Administration, I was struck by several facts about how our first National
Archivist, Robert D.W. Connor, met some seemingly insurmountable challenges when he took office in
the mid-1930s.

The biggest challenge was the deluge of paperwork, a situation not very different from what our
national institutions face today. Instead of simply bemoaning the impossibility of swallowing
all the records Connor would need to establish the National Archives, he thought nonlinearly. The
result was the invention of several key technologies: the airbrush to clean paper, the laminator
to protect it, and of course, the microphotograph (now known as microfilm or microfiche), a technology so
successful it reduced incoming paper needs by 95%.

The other challenge that Connor faced with the National Archives, a situation again not very different
from what our national institutions face today, was a paucity of skilled labor. Luckily for Connor,
though, the National Archives was born in the middle of the last great depression. Connor went
to Harry Hopkins, and together they went to President Roosevelt, and the result was a Works Progress
Administration program that ran until 1942 to survey federal archives. The program put
3,171 people to work in 1,057 communities and created two important reference aids still in
use today, the Historical Records Survey and the Inventory of Federal Archives.

Just before I testified, I read in the New York Times that the President of France had just
announced a stimulus package of $50 billion. President Sarkozy pledged 2% of that stimulus package,
a full $1.1 billion, towards scanning and digitizing a national archive. I didn’t use the term
Freedom Scans in my testimony, but the fact that the French were far ahead of the U.S. in putting
paperwork into cyberspace seemed a political opportunity.

In the U.S., we face a deluge of paperwork similar to the one we faced in the 1930s. A huge backlog
of paper, microfiche, audio, video, and other materials is located throughout the federal government.
Congress has appropriated little money for digitization, and bureaucracies have resorted to a series
of questionable private-public partnerships as a way of digitizing their materials. For example,
the Government Accountability Office shipped 60 million pages of our Federal Legislative Histories
(the record of each law from the initial bill through the hearings and conference reports) off to
Thomson West, but didn’t even get digital copies back. Another example is the recent failed effort
by the Government Printing Office to digitize 60 million pages of the Federal Depository Library
Program, an effort they tried to get through as a “zero dollar cost to the government” effort with
the private sector.

There are no free lunches and there are no “no cost to the government” deals. The costs
involve the government effort to supervise the contract, prepare the materials, and ship them, and
in both the GAO and GPO cases, the government wasn’t getting much back for its effort. What the
government and the people usually get is a lien on the public domain, preventing the public
from accessing these vital materials. Similar efforts are sprinkled throughout the government.
I testified to Congress that I had learned that the
National Archives was contemplating a scan of congressional hearings with LexisNexis under
similar circumstances, and many may be aware of the questionable deal the Archives cut with
Amazon where my favorite online superstore got de facto exclusive rights to 1,899 wonderful
pieces of video.

We can learn much from the French leadership on this issue. After my testimony, I went and visited
senior officials at the Library of Congress and the Smithsonian. They all said that while they
had tried to get more congressional interest in digitization, and had tried to go after stimulus
money, so far nobody had had much success. I asked if they had gone hand-in-hand with their
sister institutions to ask for this money, and it was pretty clear that they had not.
Each institution went in one at a time pleading their own special case to congressional staffers
and to officials at the Office of Management and Budget.

There was one more thing I learned about our first National Archivist, which was that he had
backing where he needed it and the political skills to use that backing. One of the big challenges
Archivist Connor faced was getting the
agencies to cooperate with him in giving the National Archives their records. His solution
was leadership: President Roosevelt agreed to host a meeting of a newly-formed National
Archives Council in the Cabinet Room. That, needless to say, got the department secretaries and
agency chiefs to show up, and they elected the Secretary of State as head of the Council. The
Council only met a few times, but that was all it took, and the result was new federal policies
about how agencies should dispose of their records.

There are several agencies in the government that face huge digitization and scanning backlogs,
including the Library of Congress, the Smithsonian Institution, the Government Printing Office,
the National Archives and Records Administration, and the National Technical Information
Service. In addition, there are agencies such as the Government Accountability Office and
the Defense Visual Information Directorate that have valuable archives.

Chairman Wm. Lacy Clay of the Information Policy, Census and National Archives Subcommittee
asked many very informed questions of the panelists, and one that came my way was about costs
for digitization. Today, the widely accepted cost for scanning a piece of paper and running it
through OCR is about 10 cents per page. These are the numbers that you hear from places like
the Internet Archive and Google Book Search, and that’s what I told the Chairman. But I also told
the Chairman that it was my belief that if the government started scanning at volume, those costs could go down
by half. I also testified about the vastly reduced costs of digitizing video, a task I perform
under a joint venture with the National Technical Information Service using less than $10,000 in
hardware.

If the government invested a mere $100 million of our stimulus package (we’ve already spent over
$72.6 billion), then at the halved cost of 5 cents per page, 2 billion pages of
paper or microfiche would get scanned. For $500 million, we’re talking about a huge chunk of our national backlog
being digitized, a task that would result in an enduring digital public work for our modern era,
something that would prove of
immense use to future generations, and would also save the government tremendous amounts of
money in storage costs and other facilities expenses.
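
As a rough check on the arithmetic above, here is a minimal sketch in Python. It assumes a flat per-page cost with no setup, handling, or shipping overhead, and it treats the halved figure of 5 cents per page as a given rather than a measured number. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope page counts for a scanning budget.
# Assumption: cost is a flat rate per page (scan plus OCR), nothing else.

CURRENT_CENTS_PER_PAGE = 10                          # widely quoted cost today
VOLUME_CENTS_PER_PAGE = CURRENT_CENTS_PER_PAGE // 2  # assumed cost at volume


def pages_scanned(budget_dollars: int, cents_per_page: int) -> int:
    """Return the number of whole pages a budget covers at a flat rate."""
    if cents_per_page <= 0:
        raise ValueError("cents_per_page must be positive")
    # Work in integer cents to avoid floating-point rounding.
    return (budget_dollars * 100) // cents_per_page


if __name__ == "__main__":
    for budget in (100_000_000, 500_000_000):
        today = pages_scanned(budget, CURRENT_CENTS_PER_PAGE)
        at_volume = pages_scanned(budget, VOLUME_CENTS_PER_PAGE)
        print(f"${budget:,}: {today:,} pages at 10 cents, "
              f"{at_volume:,} pages at 5 cents")
```

Under those assumptions, $100 million covers 1 billion pages at today's rate and 2 billion pages at the halved rate, and $500 million covers 5 billion and 10 billion pages respectively.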

What would it take to get the Library of Congress, the Smithsonian Institution, the Government
Printing Office, the National Archives and Records Administration, and the National Technical
Information Service all singing off the same page and working together? There is a tremendous
opportunity for White House leadership here, bringing the parties together and creating a
compelling case on why we should launch and fund a 5-year $500 million effort to create a
National Scan Center. Both the CIO and the CTO in the Executive Office of the President have
talked about the tremendous “moral authority and convening power” of the White House, and I
believe that this issue is of sufficient importance that it would be worthwhile to pursue.

  • Anthony Ferrar

    I’ve been hoping for decades that the Library of Congress, Smithsonian Institution, Government Printing Office, and National Archives would team up for a microfiche scanning project like this. There really is no excuse not to proceed with conversion, especially with the stimulus package.

    With the price of document scanning and microfiche and microfilm scanning decreasing in recent years, the price isn’t an issue either.

    admin@scanningdepot.com

  • Carl Malamud

    > “a paucity of skilled labor.”

    To be clearer, Connor was inventing a new profession (“archivist”) and he is on record as saying that they were all “amateurs” in the sense of having to deal with reinventing their profession.

    To address that task, Connor brought in the most skilled engineers, librarians, historians, conservationists, and many others and together they made substantial contributions to the science of archiving. Today, the National Archives faces challenges from mass digitization to electronic records archiving (and access) that are of similar scope.

  • Kelly Woestman

    You address some important issues and it might interest you to check out the work the National Archives is already doing:

    http://www.archives.gov/era/

    “ERA is the National Archives and Records Administration’s strategic initiative to preserve and provide long-term access to uniquely valuable electronic records of the U.S. Government, and to transition government-wide management of the lifecycle of all records into the realm of e-government.”

    (You’ll note the diverse groups represented on the Advisory Committee for ERA, including the CIO of the Library of Congress and Robert Kahn.)

    It’s important to note that many people do not realize that the National Archives has no enforcement power. They must work with and/or negotiate with each of the government agencies to acquire records of long-term value. As a result, some of the issues involved in preserving our records go well beyond technology.

    As an historian and a member of the Advisory Committee on the Electronic Records Archives, I am excited by the digital technology experience that the recently appointed Archivist of the United States has. You can find more information about David S. Ferriero here: http://www.archives.gov/about/archivist/archivist-biography-ferriero.html

  • bowerbird

    carl, you’re one of the “library leaders” who has been
    missing-in-action on this overall issue for years now.

    and you’re still coming in with far too little, far too late.

    for instance, i suggested firing the librarian of congress:
    > http://onlinebooks.library.upenn.edu/webbin/bparchive?year=2006&post=2006-02-24,1

    and that was nearly 4 years ago now…

    (actually, it’s a suggestion i’ve made for two decades now.)

    librarian leaders from the top 250 universities across the
    entire country should have joined together with authors
    and publishers and _readers_ to present a unified front
    arguing for a national cyberlibrary based on google scans.
    (and google should have been reimbursed for its efforts.)

    but you all just sat on your thumbs and did nothing…

    and now people like you and darnton are coming in with
    worse-than-half-assed ideas (with no partners attached)
    and nobody is listening since the dealing has been done.

    at this late date, it would be better for you to stay silent,
    rather than to remind people of your staggering impotence.

    -bowerbird

  • Merrilee Proffitt

    Carl, thanks for your comments. I have one minor comment, which is that the WPA program was very much a survey of existing records. It did not get records into the custody of an archival agency, let alone describe and process them. Those WPA reports are nonetheless valuable, but it’s important to recognize that what was done was only the first step in archival workflows: survey and appraisal, which comes before acquisition (and before description, arrangement, preservation and access functions). In undertaking a national digitization effort, I think we should also aspire to survey records within communities of interest and to bring them into custody. Without continually building collections, digitized or not, we will lack the raw materials and building blocks of history.

  • http://www.aviglatt.com/ kosher new york

    I hope that the Library of Congress, Smithsonian Institution, Government Printing Office, National Archives form a team for microfiche scanning project like this for decades. There’s really no excuse not to pursue the conversion of particular stimulus package.