Managing Information Overload and Photographs

Wednesday, July 1st, 2009

Documents are not only textual documents. Pictures, photographs, music and videos are taking up an increasing amount of space on our hard disks, and constitute a new breed of documents growing at an incredible rate. This is due to the democratization of digital capture devices such as cameras, camcorders and smartphones, and to the increasing storage space we have at our disposal, whether offline or online.

It has become really difficult to manage all of these documents. How will I find a picture of my kid in front of the Eiffel Tower 10 years from now? One way is to be disciplined in the way we organize them and to add metadata. But how do we handle pictures for which no metadata is available? Not to mention that this metadata is often subjective: to describe the content of a photograph, should I use Paris, Eiffel, Tower or Trocadero?

What if the actual content of pictures and photographs could be used as the basis for your search?

Computer Vision is progressing, and a number of technologies are under development to automatically analyze the content of a picture and find other photographs with similar content. There are many technologies for managing large image repositories, at Xerox and elsewhere. I particularly like this example application, though, because it's live and works off a real database: it lets you search a database of 10 million images for images similar to the one you have selected or uploaded. Try it with your holiday pictures; it's quite impressive, especially in the accuracy of its results.
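The application mentioned above does not describe its internals, so the following is only a minimal sketch of the general idea behind content-based image search, not how that service (or Xerox) actually does it. It assumes each image is reduced to a coarse color histogram, a deliberately simple stand-in for the much richer visual features real systems use, and that images are ranked by cosine similarity to the query. The function names and the toy "images" (lists of RGB pixels) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Quantize each channel into `bins` levels; return a normalized joint histogram."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[(ri * bins + gi) * bins + bi] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query: List[Pixel],
                 database: Dict[str, List[Pixel]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Rank database images by visual similarity to the query; return the top k."""
    q = color_histogram(query)
    scored = [(name, cosine_similarity(q, color_histogram(px)))
              for name, px in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "photos": mostly blue sky with a little white.
    sky = [(30, 80, 200)] * 90 + [(250, 250, 250)] * 10
    db = {
        "beach.jpg": [(40, 90, 210)] * 70 + [(230, 200, 150)] * 30,
        "forest.jpg": [(20, 120, 30)] * 100,
        "clouds.jpg": [(250, 250, 250)] * 60 + [(30, 80, 200)] * 40,
    }
    print(most_similar(sky, db, k=2))
```

A linear scan like this is fine for a handful of images; searching 10 million images interactively would in practice require compact features and an approximate nearest-neighbor index.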

Are you suffering from Information Overload Syndrome?

Monday, May 25th, 2009

Are you suffering from Information Overload Syndrome? Watch this funny video to see whether you have any of the symptoms of this growing pathology.

In particular, you might want to check out the personalized video. Although it reuses many shots from the main video, it is a very good demonstration of Cross-Media publishing, including personalized content in videos. I had already mentioned how personalized content in images could make your personalized documents (“transpromo”) much more powerful, but this goes one step further.

Cutting through the clutter of paper and electronic documents

Wednesday, March 25th, 2009

Most major companies are suffering from Information Overload, and in particular from very large amounts of incoming paper and digital information. This information needs to be processed manually before it can be delivered to the right knowledge worker in the organization, and in many cases it can take up to 15 days to reach the right person. Think about a typical customer request: by the time you get back to her, she may have moved away and is presumably very dissatisfied with your company. Turnaround is key.

This is why Xerox is working on technologies to address Information Overload. One such technology was mentioned in The Times magazine article “Confronting the information overload”. Called the Hybrid Categorizer, it automatically sorts and classifies documents. Whereas existing techniques rely solely on visual cues (e.g. shapes) or textual cues (words) to recognize a doctype, the Hybrid Categorizer takes into account both visual and textual information.

It also fully leverages Machine Learning: it “learns” what characterizes each doctype, rather than requiring a human to “teach” it rules that are often subjective. As a result, it achieves a quantum leap in Automatic Document Recognition, with minimal setup and few errors.
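The Hybrid Categorizer's actual algorithm is not described in this post, so what follows is only a toy illustration of the general principle: learn from labeled examples, then combine textual and visual evidence when assigning a doctype. It assumes a nearest-centroid classifier (a deliberately simple stand-in, not Xerox's method), a bag-of-words text representation and a few hypothetical visual layout features; the class name, the `text_weight` parameter and the training data are all made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A training example: (words on the page, visual features, doctype label).
# The visual features are hypothetical layout measurements, e.g.
# [fraction of page covered by ink, number of ruled lines, has_logo].
Example = Tuple[List[str], List[float], str]
Vector = Dict[str, float]


def _text_vec(words: Sequence[str]) -> Vector:
    """Bag-of-words term counts."""
    return {w: float(c) for w, c in Counter(w.lower() for w in words).items()}


def _visual_vec(features: Sequence[float]) -> Vector:
    """Store visual features under positional keys so both views share one format."""
    return {f"v{i}": x for i, x in enumerate(features)}


def _cos(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def _mean(vectors: List[Vector]) -> Vector:
    total: Vector = {}
    for vec in vectors:
        for k, v in vec.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(vectors) for k, v in total.items()}


class HybridCentroidClassifier:
    """Nearest-centroid classifier that fuses a text score and a visual score."""

    def __init__(self, text_weight: float = 0.5):
        self.text_weight = text_weight  # 1.0 = text only, 0.0 = visual only
        self.text_centroids: Dict[str, Vector] = {}
        self.visual_centroids: Dict[str, Vector] = {}

    def fit(self, examples: Sequence[Example]) -> "HybridCentroidClassifier":
        # "Learning" here means averaging each doctype's examples, per view.
        by_label: Dict[str, List[Example]] = {}
        for ex in examples:
            by_label.setdefault(ex[2], []).append(ex)
        for label, exs in by_label.items():
            self.text_centroids[label] = _mean([_text_vec(w) for w, _, _ in exs])
            self.visual_centroids[label] = _mean([_visual_vec(v) for _, v, _ in exs])
        return self

    def predict(self, words: Sequence[str], visual: Sequence[float]) -> str:
        t, v = _text_vec(words), _visual_vec(visual)

        def score(label: str) -> float:
            return (self.text_weight * _cos(t, self.text_centroids[label])
                    + (1 - self.text_weight) * _cos(v, self.visual_centroids[label]))

        return max(self.text_centroids, key=score)


if __name__ == "__main__":
    training = [
        (["invoice", "total", "amount", "due"], [0.30, 12.0, 1.0], "invoice"),
        (["invoice", "payment", "total"], [0.35, 10.0, 1.0], "invoice"),
        (["dear", "sir", "complaint", "delivery"], [0.20, 0.0, 0.0], "letter"),
        (["dear", "madam", "request", "address"], [0.25, 1.0, 0.0], "letter"),
    ]
    clf = HybridCentroidClassifier().fit(training)
    print(clf.predict(["total", "due", "invoice"], [0.32, 11.0, 1.0]))
    print(clf.predict(["dear", "sir", "change", "address"], [0.22, 0.0, 0.0]))
```

The point of the fusion is that a document with little recognizable text can still be classified by its layout, and vice versa; a production system would use far richer features and a stronger learner than per-class averages.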

I’ll cover some real case studies of the Hybrid Categorizer in the future. This technology is used in many applications, including the Digital Mailroom, which is part of Mail and Distribution Services. For more information, watch this brief illustrative video:

Information Overload podcast

Friday, February 6th, 2009

Some of you might have noticed it on my podcast RSS feed: I have a couple of new podcasts, which I recorded late last year. The first one is on Information Overload and can be downloaded here.

It touches upon a variety of topics: the Less Paper Office, of course, but also the Digital Universe.

Xerox CTO speaking on Innovation at Xerox

Friday, October 24th, 2008

For those of you (like me) who could not attend the Wired NextFest “Experience the Future” event and the “Innovation Conversation”, here is a very interesting interview with Sophie Vandebroek, Innovation Thought Leader and Xerox CTO.

It is an interesting discussion, covering for example how Xerox innovation can have unexpected “green” applications, such as applying inkjet technologies to water purification, and how other technologies can address Information Overload.