July 2nd, 2009
After discovering the great news that Xerox was now considered the top “green” outsourcer (all domains included), I wanted to figure out how we were doing on our legacy market: electronics.
The latest Greenpeace “Guide to Greener Electronics”, published yesterday, singles out companies like HP, Dell and Lenovo. HP dropped significantly in this quarterly report, from “middle of the pack” one year ago to 14th out of 17 (although this covers all of its electronics divisions). Unfortunately, the ranking seems to be mostly about PCs, not so much about printers or other electronics, so Xerox does not appear.
However, searching for more information on the topic, I stumbled upon GoodElectronics.org, a very good resource on the topic. This then pointed me to Covalence’s Ethical ranking.
Geneva-based Covalence tracks the ethical reputation of multinationals by sourcing information from the media, civil society, and companies. The most active criteria in 2008 were: Environmental impact of production, Social sponsorship, Waste management, Information to consumer, Eco-innovative product, International presence, Downsizing, Product environmental risk, Labour standards, and Anti-corruption policy - so sustainability weighs heavily in the final ranking.
Great news: Xerox made it to the 5th position in this year’s ranking, second only to Intel in the “Technology” sector. The full table of results can be found here.
Posted in The Future of Documents | No Comments »
July 1st, 2009
Documents are not only textual documents. Pictures, photographs, music and videos are taking up an increasing amount of space on our hard disks, and constitute a new breed of documents growing at an incredible rate. This is due to the democratization of digital capture devices such as cameras, camcorders, and smartphones, and to the increasing storage space we have at our disposal, whether offline or online.
It has become really difficult to manage all of these documents. How will I find a picture of my kid in front of the Eiffel Tower 10 years from now? One way is to be disciplined in the way we organize them and add metadata. But how can pictures be handled when metadata is not available? Not to mention that this metadata is often subjective (e.g. for the content of a photograph, should I use Paris, Eiffel, Tower, or Trocadero?).
What if the actual content of pictures and photographs could be used as the source of your search…
Computer Vision is progressing, and a number of technologies are under development to automatically analyze the content of a picture, and find other photographs with similar content. There are many technologies for managing large image repositories, at Xerox and elsewhere. I particularly like this example application though, because it’s live and works off a real database - it lets you search in a database of 10 million images for images similar to the one you have selected or uploaded. Try it with your holiday pictures, it’s quite impressive - especially in the accuracy of the results.
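To give a feel for what “search by content” means, here is a toy sketch in Python. It is not how the application above (or any Xerox technology) works; it only illustrates the general idea of turning each picture into a numeric signature and ranking other pictures by how close their signatures are. The signature used here (a coarse color histogram), the `Pixel` type and the function names are all my own assumptions for illustration; real systems rely on far richer visual features.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: an "image" is simply a list of (red, green, blue) pixels, 0-255 each.
Pixel = Tuple[int, int, int]


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Summarize an image as a normalized color histogram with bins**3 buckets."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bucket index in 0..bins-1.
        ri, gi, bi = (min(c * bins // 256, bins - 1) for c in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = sum(hist) or 1.0
    return [count / total for count in hist]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1.0 for identical directions, 0.0 when nothing overlaps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: Sequence[Pixel],
    library: Dict[str, Sequence[Pixel]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank the images in `library` by histogram similarity to `query`."""
    query_hist = color_histogram(query)
    scored = [
        (name, cosine_similarity(query_hist, color_histogram(pixels)))
        for name, pixels in library.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical three-picture "database" made of flat color patches.
    library = {
        "sky": [(30, 100, 220)] * 10,
        "grass": [(40, 180, 50)] * 10,
        "sunset": [(230, 120, 40)] * 10,
    }
    query = [(40, 110, 230)] * 8 + [(200, 200, 200)] * 2
    print(most_similar(query, library))
```

A color histogram knows nothing about shapes or objects, so it would happily confuse a blue car with a blue sky. It is only meant to show the “signature plus distance” pattern, with no metadata involved at any point.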
June 26th, 2009
A year or so after the dispute between ODF and OOXML seemed to be settled, the situation is getting more and more confused.
OOXML was fast-tracked as an ISO standard in April 2008 through a questionable procedure, but a few months later, almost a year ago now, Microsoft admitted that “ODF had clearly won” the document format standard war. Since then, however, Microsoft has increased its participation in ODF standards bodies, to the point that some open source advocates feared it might take control of the ODF standards.
After a number of other episodes, the strategy is paying off - Microsoft’s OOXML standard is back on the list of recommended formats for government organizations, including in France’s RGI (Référentiel général d’interopérabilité). Open source advocates argue that a second document format is useless since ODF has been around for a few years now, and that even Microsoft products do not support the OOXML ISO standard yet (support for the ISO version of OOXML in Microsoft Office is slated for next year). Microsoft counters that OOXML is not their software anymore, since anyone can implement OOXML compliance (after reading the 6,000-page spec, of course).
And we, document users, are sitting in the middle, worried that we might not have a single, universal open document format after all, which was the whole point of that ODF-OOXML decision…
June 22nd, 2009
The report published by The Brown-Wilson group positions Xerox as the top-ranking green outsourcer! This report explores how new economic dimensions are impacting the growth of the sustainability technology sector.
Xerox is ranked “greenest” in the Document Process Outsourcing area by its clients, which is not too surprising. The criteria used included sustainability metrics, social and economic principles, environmental principles, LEED (Leadership in Energy and Environmental Design) Green Building Rating system, and Six Sigma.
But even better, when asked to nominate which outsourcing companies are the “greenest”, Xerox comes first with an astounding 440 nominations! That puts Xerox in front of Accenture (429), CSC (403), CapGemini (396) and IBM Global (390). HP / EDS comes 10th, with 259 nominations.
That is a huge progression from last year’s 35th position in that same ranking. It shows that customers now see Xerox as the trusted outsourcing partner that can take them on the journey to the “Less Paper Office” - reducing the overall carbon footprint of their infrastructure, using less paper and less energy, generating less waste, but also optimizing their Document Business Processes to remove paper (when appropriate) and improve overall quality.
That’s what we call “Smarter Ways to Green”: click on the video below to learn more.

June 19th, 2009
This interesting article points out that XML-structured document formats such as XBRL (eXtensible Business Reporting Language) could ensure much tighter reporting and control over financial institutions and companies - and maybe avoid the next financial crisis?
XBRL started in the late 90s and defines an XML schema for the exchange of financial information between companies, accountants and the SEC - including “semantic” information, which can be extracted very easily.
Starting this year, larger companies will have to submit their reports in this document format, which can be programmatically analyzed and validated. This will be a dramatic change from the current submission of HTML, PDF, ASCII or anything else, which SEC analysts had to parse and analyze manually; with XBRL, they will be able to get back to filers much more quickly in case of an error. Plus, since this is public information, it will be available to anyone else for analysis and other uses.
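To illustrate what “programmatically analyzed and validated” can look like, here is a minimal Python sketch. Be warned that the XML below is a made-up, heavily simplified stand-in and not real XBRL: the `filing` and `fact` elements, their attributes and the function names are assumptions of mine, whereas actual XBRL filings rely on taxonomies, contexts and units that are much richer. The check itself is the basic accounting identity, assets = liabilities + equity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified filing. Real XBRL instances look quite different.
SAMPLE_FILING = """
<filing company="Example Corp" period="2009-Q2">
  <fact name="Assets" unit="USD">1500</fact>
  <fact name="Liabilities" unit="USD">900</fact>
  <fact name="Equity" unit="USD">600</fact>
</filing>
"""


def read_facts(xml_text: str) -> Dict[str, float]:
    """Extract every <fact> element as a name -> numeric value mapping."""
    root = ET.fromstring(xml_text.strip())
    return {fact.get("name"): float(fact.text) for fact in root.iter("fact")}


def check_balance(facts: Dict[str, float], tolerance: float = 0.005) -> List[str]:
    """Return a list of problems; an empty list means the filing passes."""
    errors = [
        f"missing fact: {name}"
        for name in ("Assets", "Liabilities", "Equity")
        if name not in facts
    ]
    if not errors:
        gap = facts["Assets"] - (facts["Liabilities"] + facts["Equity"])
        if abs(gap) > tolerance:
            errors.append(f"balance sheet does not balance (off by {gap:.2f})")
    return errors


if __name__ == "__main__":
    print(check_balance(read_facts(SAMPLE_FILING)))
```

The point is not the check itself but the fact that it runs in milliseconds on every filing, with no analyst having to locate the numbers in a PDF first.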
“Semantic” XML-based vertical document formats will be the next wave for Document formats. HL7 or other formats in the Health Care domain will help dramatically increase the throughput and reduce the errors in Health Management. There are quite a few out there already, but the Future of Documents will be made of many of these vertical schemas which will be a dramatic element in improving Document-Intensive Business Processes.
June 18th, 2009
This press release from the PDF/A Competence Centre confirms that PDF/A is gaining acceptance for records management and long-term archiving. Half of the organizations surveyed had plans to use PDF/A in the next 12 months. At the same time, older archiving formats such as TIFF, JPEG or simple PDF decreased by about 5 percent. “Nearly all archiving projects use PDF/A” is quite a misleading title though: the current penetration of PDF/A is still small, as only 16% of those surveyed use it actively (although 75% plan to use it actively).
PDF/A is based on PDF 1.4, and became a published standard on October 1st, 2005. It is a stripped-down version of PDF, intended for long-term compatibility. It actually offers two levels of compliance: PDF/A-1b is the predominant one, while PDF/A-1a conserves reading order and adds “searchability” (e.g. OCR for paper documents). A new version of PDF/A is in the works - PDF/A-2, which will add selected features from later PDF versions (1.5 to 1.7).
So is PDF/A a good long-term storage format? Yes, I still think so. The files are relatively large, but they are totally self-contained, which is vital for very long-term conservation - and you have the assurance of having software to read them twenty years from now. “Searchability” is very important for the short term. “Reading order” can be very important if you want to apply Natural Language Processing to analyze this data - but these technologies are only starting.
Both features are “standard” (and easy) for native electronic documents, but not for paper documents. In a few decades, OCR and Intelligent Document Recognition will have improved so vastly that the document image (embedded in the PDF/A) will be the best source for extracting both reading order and text, so saving them today does not make that much sense for the long term. In the meantime, however, they allow good indexing and metadata for your documents.
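If you are curious which level a given file claims, a PDF/A file declares it in its embedded XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below simply looks for those two properties in the raw bytes. It assumes the XMP packet is stored uncompressed and encoded as UTF-8, and the function name is my own. Most importantly, it only reads what the file claims about itself: it does not validate that the file actually complies, which takes a dedicated PDF/A validator.

```python
import re
from typing import Optional

# XMP may write a property as an element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"); the patterns accept both forms.
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return e.g. 'PDF/A-1b' if the bytes declare a PDF/A level, else None.

    Assumption: `data` holds the raw file content with a readable
    (uncompressed, UTF-8) XMP packet. This reads a claim, nothing more.
    """
    part = _PART.search(data)
    conformance = _CONFORMANCE.search(data)
    if part is None or conformance is None:
        return None
    level = conformance.group(1).decode("ascii").lower()
    return f"PDF/A-{part.group(1).decode('ascii')}{level}"


if __name__ == "__main__":
    # Hypothetical XMP fragment, as it might appear inside a PDF/A-1b file.
    sample = (
        b'<rdf:Description rdf:about="" '
        b'xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
        b"<pdfaid:part>1</pdfaid:part>"
        b"<pdfaid:conformance>B</pdfaid:conformance>"
        b"</rdf:Description>"
    )
    print(claimed_pdfa_level(sample))
```

To try it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `claimed_pdfa_level`.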
As always though, file format is just part of the story - you also need to make sure you’ll be able to read your archival medium a few years from now… Who still owns a zip or floppy drive?
June 12th, 2009
Good coverage by CNET.com on Your e-health future. It touches upon some of the rationales for moving Medical Records to electronic formats - but also alludes to the major barriers, including cost, complexity, privacy, and security (and less obvious ones, such as the legal use of digital health records by insurance companies to deny membership or hike prices beyond affordability for those with existing medical conditions), while talking about the trends and regulations that affect this move.
Interesting reading that I’ll let you discover by yourself, but in my opinion one of the main points is that the advantages of electronic medical records come only if older paper records are scanned or incorporated into the new system.
This is a laborious, expensive and error-prone process, which requires technologies like Automatic Classification, Intelligent Extraction, and other advanced technologies that can extract information from this unstructured set of information. And scanning this huge backlog of information should not be improvised - can I scan all of those records at once? Or should I scan a patient record only in preparation for an appointment, on-demand? What about all the paper trail that we’ll continue generating until Medical Records become fully electronic? Should I use decentralized scanning, bulk scanning, or a mix of both? This kind of strategy is best defined with an expert in document management, who can put you on the path to the Less Paper medical office.
June 8th, 2009
Just wrapped up a webcast on Money Saving Content Management Strategies. This started as a very interesting discussion with Brian Lincoln and other members of the DocuShare team, on how ECM could contribute to some of the core topics covered in my blog - going “Less Paper”, being green, improving productivity - all that while cutting costs.
In this webcast, Brian covers present-day, working strategies for Content Management, while I talk about some of the Future of Document technologies that will have a strong effect - all that in 24 minutes.
In the Attachments tab, you’ll be able to (re)discover a video of Transient Paper, showing how it can be imaged using UV light, vanishes over a few hours, and can be re-printed over and over again.
Enjoy!
June 4th, 2009
Every once in a while, a technology comes along that really changes the way we think about technology at large, and pushes the envelope. Google is often behind those disruptive technologies, and Google Wave is no exception.
Google Wave is merging many of the “boundaries” we’ve taken for granted so far. Frontiers between instant messaging and asynchronous messaging; frontiers between Web 2.0, email and traditional document; frontiers between traditional and collaborative realtime editing, even the time frontier…
Replay of “wave” or conversation thread, annotation and highlighting of changes, concurrent online editing, automatic update of blogs or orkut pages, narrow-down by user or paragraph, version control, intelligent spellchecking… There are too many cool features to even scratch the surface here.
Google Wave is as close to the vision of the Future of Documents as it gets - evergreen, social, intelligent. And all of that in any browser, or even on Android phones, using good (not so) old HTML 5.
Watch the video and find out for yourself. It’s long, but it’s well worth the time.
June 3rd, 2009
Interesting Wired article on the future of reading books.
The author’s view of the future of reading books resonates strongly with my view of the Future of Documents: in order to move away from paper in many usages, the electronic document needs to provide affordances that the “legacy” format (paper here) does not provide. This includes annotation, but more importantly the capability of sharing these annotations with some of your colleagues through “Web 2.0” channels, as is provided by technologies such as WebNotes or reframeit.
As the author concludes: “Taking them digital will unlock their real value: the readers.”