Apr 29

Ben Lorica

Inside Innovation at Xerox PARC

We were part of a group of journalists and bloggers invited to hear presentations from 10 research groups within various parts of Xerox, PARC, and Fuji-Xerox. The format was similar to a science fair or a poster session at an academic conference, with small groups moving around to hear presentations from the different projects. While other research labs use a large auditorium and parade different researchers in, I thought the smaller, science fair format made for better interactions between the visitors and the researchers.

We saw early prototypes created by the researchers themselves, so the user interfaces were far from polished. Here are some of the highlights from our visit:

Seamless Document Viewer
A J2ME application designed to help solve the problem of viewing documents on small screens (cell phones and other mobile devices), this app automatically segments a document into blocks and displays the keyphrase for each block. The keyphrases are intended to help users navigate to sections of interest quickly. The cell phone demo we saw used a fairly intuitive touchscreen interface that included an interesting way to pan and zoom in and out of sections of a document. Because documents viewed through the application need to be processed and analyzed in advance, it is better suited for viewing PDFs and other static documents than frequently updated web pages.
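
To make the "segment, then label each block" idea concrete, here is a minimal Python sketch. It is not the researchers' method, which we did not see in detail: the blank-line segmentation, the word-frequency keyphrase heuristic, the tiny stopword list, and the function names are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the demo).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "it", "that", "this", "with", "be", "as", "by"}

def segment(document):
    """Split a document into blocks at blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", document) if b.strip()]

def keyphrase(block, size=2):
    """Label a block with its most frequent non-stopword terms."""
    words = [w for w in re.findall(r"[a-z]+", block.lower())
             if w not in STOPWORDS]
    return " ".join(word for word, _ in Counter(words).most_common(size))

def outline(document):
    """Return (keyphrase, block) pairs that a small-screen viewer could list."""
    return [(keyphrase(block), block) for block in segment(document)]

doc = ("Solar panels and solar concentrators use solar cells. "
       "Panels are cheap.\n\n"
       "Water filtration cleans water. Filtration of water is key.")
labels = [label for label, _ in outline(doc)]
```

A viewer built this way would show only the short labels first and open the full block when the user selects one, which is why the processing has to happen before the document reaches the phone.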

Hybrid Categorization
Categorizing documents automatically is an old topic in information science. Most tools rely only on the text portion of documents and use a combination of Natural Language Processing and Machine Learning. I was looking forward to this presentation because we use text-only automatic classifiers to help organize some of our data sources.

Hybrid categorization uses both the text and the images contained in documents. It isn't clear how scalable their hybrid categorizer is, since the results we saw were based on small numbers of documents. Precision measures the share of documents assigned to a category that actually belong there, and judging from the results of an academic competition, Xerox's hybrid (text + images) approach may hold some promise.
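
For readers unfamiliar with the metric, precision for a category is the fraction of documents a classifier assigned to that category that truly belong to it. The short Python sketch below computes it; the document IDs and labels are made up for illustration and are not results from the Xerox demo.

```python
def precision(predicted, actual, category):
    """Fraction of documents assigned to `category` that truly belong to it."""
    assigned = [doc for doc, label in predicted.items() if label == category]
    if not assigned:
        return 0.0  # nothing was assigned, so report zero rather than divide by zero
    correct = sum(1 for doc in assigned if actual.get(doc) == category)
    return correct / len(assigned)

# Hypothetical classifier output and ground-truth labels.
predicted = {"a": "solar", "b": "solar", "c": "water", "d": "solar", "e": "solar"}
actual = {"a": "solar", "b": "water", "c": "water", "d": "solar", "e": "solar"}

solar_precision = precision(predicted, actual, "solar")  # 3 of 4 assignments are correct
```

Precision alone says nothing about the documents a categorizer missed, so a high score on a small test set is encouraging but not conclusive.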

Erasable Paper
"Reusable paper" refers to paper coated with special materials and a custom printer that shoots UV light onto it. The resulting printed document is designed to fade within 24 hours and the paper can be reused and fed into the printer multiple (10+) times. The printer can even erase the printing on the specially-coated papers, and print an entirely new document on the same sheets of paper. We raised the possibility that a sheet of paper that has nominally erased itself can be reverse engineered to reveal sensitive content: think security agencies or dumpster-diving identity thieves. Surprisingly, the researchers had not seriously investigated the possibility of "recovering erased documents".

The cost of the specially-coated paper is projected to be only 2-3 times the cost of normal paper, while the accompanying printer will cost about the same as a laser printer. Since the paper can be reused multiple (10+) times, the obvious environmental benefits also lead to savings. Further savings come from the design of the printer itself: since the printing is done with light (UV LED bar), the printer does not use ink or toner.
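
A quick back-of-the-envelope calculation shows where the paper savings come from. The Python sketch below uses only the projected figures quoted above (2-3 times the price, 10 reuses) and expresses the result relative to one sheet of normal paper used once; it ignores ink and toner savings, and the function name is just for illustration.

```python
def paper_cost_per_print(cost_multiple, reuses):
    """Paper cost of one print, relative to one sheet of normal paper used once."""
    return cost_multiple / reuses

low = paper_cost_per_print(2, 10)   # coated paper at 2x the price, reused 10 times
high = paper_cost_per_print(3, 10)  # coated paper at 3x the price, reused 10 times
```

Under these assumptions each print uses roughly 20-30% of the paper cost of a conventional printout, and the figure drops further if a sheet survives more than 10 cycles.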

Intelligent Redaction
Redaction is the process of removing sensitive information from documents. Popular examples include government/intelligence documents released to the public and medical records. Text redaction is normally a tedious manual process that requires staff possessing significant domain expertise. As an example, privacy rules governing medical records in the U.S. require redaction of terms associated with HIV/AIDS, mental health and drug/alcohol problems. In the demo we saw, the software tool examined a corpus of documents, automatically came up with terms/phrases associated with the listed illnesses, and redacted them from every document in the corpus.
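
The general shape of that pipeline (start from a seed term, derive related terms from the corpus, then mask them everywhere) can be sketched in a few lines of Python. This is a toy stand-in, not PARC's algorithm: the co-occurrence rule, the `min_docs` threshold, the mask string, and the sample records are all assumptions for illustration.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercased alphabetic words in a text."""
    return re.findall(r"[a-z]+", text.lower())

def related_terms(corpus, seed, min_docs=2):
    """Terms found in at least `min_docs` documents that mention `seed`
    and in no document that lacks it (a crude association heuristic)."""
    seed = seed.lower()
    with_seed, without_seed = Counter(), Counter()
    for doc in corpus:
        terms = set(tokens(doc))
        (with_seed if seed in terms else without_seed).update(terms)
    return {t for t, n in with_seed.items()
            if n >= min_docs and without_seed[t] == 0}

def redact(doc, terms, mask="[REDACTED]"):
    """Replace every whole-word occurrence of the given terms with a mask."""
    if not terms:
        return doc
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, sorted(terms))) + r")\b",
        re.IGNORECASE)
    return pattern.sub(mask, doc)

# Made-up records for illustration only.
corpus = [
    "Patient tested positive for hiv and started antiretroviral therapy.",
    "Follow-up visit: hiv viral load stable on antiretroviral therapy.",
    "Patient has a broken arm and started physical therapy.",
]
sensitive = related_terms(corpus, "hiv")
cleaned = [redact(doc, sensitive) for doc in corpus]
```

Here "antiretroviral" is picked up because it only ever appears alongside the seed term, while "therapy" is left alone because it also appears in an unrelated record. A real system would need far more careful term association than this, which is presumably where the research effort lies.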

Other Notables

  • Clean technology: solar concentrators and membrane-less water filtration

  • "Environmentally-friendly" plastic: plastic with more than 30% of its weight made from biomass

  • Cancer detection tools: rare cell detection

    Comments: 8

      Gee4orce [04.30.08 01:32 AM]

    I generally find ANTI-ALIASING to be a useful innovation when viewing things on a small screen.... ?! That document viewer screenshot makes me want to poke my eyes out with a blunt stick.

      Pratik Stephen [04.30.08 07:59 AM]

    Erasable Paper!
    It will be useful for reading long blog entries and online articles!

    But then we'll need to reprint stuff every day!?
    Is there another form of the erasable paper that won't fade the ink out on its own?

      Spud Light [04.30.08 08:45 AM]

    Redaction is now easy on PDF documents using new utilities with Adobe Acrobat 8.

      Ben Lorica [04.30.08 08:56 AM]

    Spud Light: What the Xerox PARC researchers do is automate the redaction task. I should have emphasized this more in my post.

    They can take a term (say HIV/AIDS), look at a corpus, and automatically identify the terms that need to be redacted (HIV/AIDS and other related terms). Automatically deriving relevant terms/phrases is the essence of their research.

      Jordon [04.30.08 10:03 AM]

    Quote: Pratik Stephen
    "But then we'll need to reprint stuff everyday!?"

    I think a reprint that doesn't use ink is great, woot no more $80 refills!!.

    Also this would be great when having a meeting. Everyone wants to hold the meeting minutes, etc... or a booklet. Now they print it on some paper that costs a 3rd less (expecting regular 10x use) without any ink usage and then recycle it right after. BRILLIANT!!

    This could also be used for time sensitive data like an agreement that expires in 3 days would actually "self destruct 007" LOL.

    My 2 cents

      bowerbird [04.30.08 10:18 AM]

    > "Reusable paper"

    yes! i've been wanting this for _decades_ now...


      Keith Yager [04.30.08 10:50 AM]

    Re: Erasable paper

    It would be great if you could control the time to fade, say from a few hours to a few days perhaps.

    Or maybe have the fade time start only when exposed to air or light for a short time. For example, mailing a confidential letter to someone, which could take days. It starts to fade soon after the letter is opened.

      Alex Tolley [04.30.08 01:48 PM]

    Wouldn't electronic paper be better than "reusable paper"? I really expect PARC to be more than such a narrowly focused R&D arm of the parent document company.

    Reusable paper is solving the wrong problem. We are reading documents as printed sheets for these reasons:
    1. Current computer/screens do not have good resolution and contrast.
    2. Computers are not yet as portable as paper.
    3. You can very easily mark up paper.
    4. Paper is cheap and disposable.

    Make screens better, more portable and interactive and we can start to say goodbye to paper. We are getting there.

    This so reminds me of Xerox's smart forms that they launched in the mid-1990s. You fed a form to a fax machine and the computer at HQ faxed back the document you requested. It was justified by marketers who saw a need for using existing devices to stay connected in a laptop-poor world. Well we know what happened next, and that silly idea died pretty quickly.
