What is happening to Scribd?

Friday, August 14th, 2009

Scribd, one of my favourite Document Sharing 2.0 technologies (“YouTube for Documents”), is showing a significant usage slowdown. Traffic has dropped by close to 50% since June ’09 and keeps declining.

In this article, Scribd’s CEO gives three main reasons. The first is a summer dip, which its competitors do not seem to feel: Docstoc and Issuu are still on the rise, although from a much lower base (estimated traffic for Docstoc is about 5 times lower).

It might have a bit to do with technology. Scribd recently introduced its new iPaper 2 technology, which improves the browsing interface significantly and should drive more volume; but at the same time Scribd’s new SEO filter might bring fewer, though more relevant, visits from search engines.

To me, it mostly has to do with the business model – and the content. Moving to a “pay” business model might not have attracted as many users as hoped. At the same time, and sadly, the most sought-after content was probably copyrighted material, which is being removed and filtered out.

Let’s hope this dip is only temporary, because this sort of technology, and this sort of experimentation, is essential for the Future of Documents.

How to Succeed in a Less Paper World?

Friday, July 24th, 2009

This is the topic of my latest podcast interview: “How To Succeed in a ‘Less Paper’ World: Recommendations On How To Manage Documents In All Their Changing Forms”. It is a pretty ambitious title and topic, but I am sure it will be useful to many of you.

You can listen to the full podcast (20 minutes long) or jump to a specific section:

As I will be away for two weeks, you should have more than enough time to listen to all of it :-)

Online Industry supporting the Dying Printed Newspaper Industry?

Thursday, July 23rd, 2009

That’s the rather “surprising” idea that a Dutch government report recommended: a tax on ISP subscriptions to fund struggling newspapers. That report is about a month old, and I am not sure whether there have been more recent developments around this interesting proposal, as my Dutch is pretty poor these days – but you can see a Google translation of the original article (in Dutch) there. Any update is more than welcome.

My first reaction was, like most, that this is a pretty crazy idea. But on second thought, how is that different from instituting an ISP “tax” for downloading MP3s, which is currently being discussed in France, and I’m sure in many other countries? In fact, the death of the newspapers is directly caused by online media, because journalism had to move – and new content (articles) is now being generated and consumed online. In the case of music, the Internet is just a medium for sharing, so why should we support that other ailing industry more than the newspaper industry (other than because it has more powerful lobbyists)?

More seriously, a number of industries are dying because of drastic business model shifts caused by the Internet. Some of them are successful at reinventing themselves, some less so. In any event, such “taxes” will only allow them to survive for some time and are not viable in the long term – evolution is required.

State of ePaper and eReaders

Monday, July 20th, 2009

Kindle Review has two very interesting posts for anyone who is interested in ePaper and eBook / eReader technologies.

The State of ePaper lists most (if not all) of the technologies that are currently in play for next-generation eReaders. Beyond eInk’s technology, the post lists most of the runners-up and compares their technology with the current champion. Whether LCD-derived technologies (Pixel Qi), Color (Bridgestone, Fujitsu), or many others, the post explains the differences and illustrates them with videos, when applicable.

The State of the eReader lists the key characteristics of eReaders (price, screen technology, usability, social aspects, and many more) and, for each of those, reviews the current state (“best-of-breed” products) as well as what the ideal eReader should be.

Excellent and very impressive work putting all this together and analyzing it – definitely worth reading for anyone who has an interest in the future of eReaders!

Are students ready for e-Books?

Saturday, July 18th, 2009

Maybe not. In this article, “E-Texts Receive Mixed Reviews From Students”, the Wall Street Journal describes a few experiments with e-books as student textbooks.

Some found the affordances of electronic documents (e.g. keyword searching) a major improvement over hardcopy textbooks. Lower weight, up-to-date versions of documents, and (at least in theory) lower price were also among the qualities that were cited.

However, dozens of the students dropped out of the e-Textbook programs, complaining the devices were awkward and inconvenient, and sometimes too fragile. They are “great if you’re using them on a beach or on an airplane, but not fully functional for a learning environment”, according to some. Even worse, the actual price (including the high entry price for the hardware, but also the actual price per book, which often ends up close to that of the hardcopy version) was also a major turn-off.

A study from the Student Public Interest Research Group concluded that 75% of students would still prefer print to digital texts…

However, this is just the first generation of e-textbooks. Future generations, with real annotation capabilities, no “flashing” of displays when refreshing, and a physical format closer to real paper, should gain more traction with students… But this will take time.

Microsoft Office Web Applications – finally announced

Wednesday, July 15th, 2009

Microsoft is finally introducing an online, cloud-based version of its Microsoft Office products, as announced during their Office 2010 Worldwide Partner conference.

These Web Apps, including Microsoft Word, PowerPoint, Excel, and OneNote, will be free, but might not be fully featured.

Microsoft is quite late to the Document 2.0 game, compared to Web-based document editing pioneers like Google (with Google Docs) or the lesser known but arguably more powerful Zoho Docs, to mention only a few mainstream players. Even OpenOffice 3.0 has been available online for quite a few months (on the Ulteo cloud computing infrastructure). Microsoft ensures cross-browser compatibility, and could have a few cards to play through a tight integration with its desktop software and its Azure infrastructure – wait and see.

Managing Information Overload and Photographs

Wednesday, July 1st, 2009

Documents are not only textual documents. Pictures, photographs, music and videos are taking up an increasing amount of space on our hard disks, and constitute a new breed of documents growing at an incredible rate. This is due to the democratization of digital capture devices such as cameras, camcorders, and smartphones, and to the increasing storage space we have at our disposal, whether offline or online.

It has become really difficult to manage all of these documents: how will I find a picture of my kid in front of the Eiffel Tower 10 years from now? One way is to be disciplined in the way we organize them and add metadata. But how can pictures be handled when metadata is not available? Not to mention that this metadata is often subjective (e.g. for the content of a photograph, should I use Paris, Eiffel, Tower, Trocadero?).

What if the actual content of pictures and photographs could be used as the source of your search…

Computer Vision is progressing, and a number of technologies are under development to automatically analyze the content of a picture, and find other photographs with similar content. There are many technologies for managing large image repositories, at Xerox and elsewhere. I particularly like this example application though, because it’s live and works off a real database – it lets you search in a database of 10 million images for images similar to the one you have selected or uploaded. Try it with your holiday pictures, it’s quite impressive – especially in the accuracy of the results.
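To give a feel for what “searching by content” means, here is a toy sketch in Python. It is not how the application above works (I have no details on its internals); it simply assumes each picture is a list of (r, g, b) pixels, summarizes it as a coarse colour histogram, and ranks a small made-up library by cosine similarity to a query picture. The picture names and pixel values are invented for illustration, and real systems rely on far richer visual features.

```python
import math

def colour_histogram(pixels, bins_per_channel=2):
    """Count pixels per coarse colour bucket; pixels are (r, g, b) tuples in 0-255."""
    step = 256 // bins_per_channel
    histogram = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        histogram[index] += 1
    return histogram

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query_pixels, library):
    """Return the names of the library pictures, most similar to the query first."""
    query = colour_histogram(query_pixels)
    scored = [(cosine_similarity(query, colour_histogram(pixels)), name)
              for name, pixels in library.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Made-up "pictures": ten pixels each, just enough to show the ranking.
LIBRARY = {
    "beach": [(30, 120, 220)] * 6 + [(230, 210, 160)] * 4,
    "forest": [(20, 140, 40)] * 9 + [(90, 60, 30)] * 1,
    "sea": [(20, 100, 200)] * 8 + [(240, 240, 240)] * 2,
}
QUERY = [(40, 110, 230)] * 7 + [(220, 200, 150)] * 3  # mostly blue, some sand

print(most_similar(QUERY, LIBRARY))
```

The blue-dominated pictures come out ahead of the forest one, without any metadata being involved.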

Xerox voted top green outsourcer in 2009 Green Outsourcing Survey

Monday, June 22nd, 2009

The report published by the Brown-Wilson Group positions Xerox as the top-ranking green outsourcer! This report explores how new economic dimensions are impacting the growth of the sustainability technology sector.

Xerox is ranked “greenest” in the Document Process Outsourcing area by its clients, which is not too surprising. The criteria used included sustainability metrics, social and economic principles, environmental principles, LEED (Leadership in Energy and Environmental Design) Green Building Rating system, and Six Sigma.

But even better, when asked to nominate which outsourcing companies are the “greenest”, Xerox comes first with an astounding 440 nominations! That puts Xerox in front of Accenture (429), CSC (403), CapGemini (396) and IBM Global (390). HP / EDS comes 10th, with 259 nominations.

That is a huge improvement on last year’s 35th position in that same ranking. It shows that customers now see Xerox as the trusted outsourcing partner that can take them on the journey to the “Less Paper Office”: reducing the overall carbon footprint of their infrastructure, using less paper and less energy, generating less waste, but also optimizing their Document Business Processes to remove paper, when appropriate, and improve overall quality.

That’s what we call “Smarter Ways to Green”: click on the video below to learn more.

Could Future Document Formats prevent the next financial crisis?

Friday, June 19th, 2009

This interesting article points out that XML-structured document formats such as XBRL (eXtensible Business Reporting Language) could ensure much tighter reporting on, and control over, financial institutions and companies – and maybe avoid the next financial crisis?

XBRL started in the late 90’s and defines an XML schema for the exchange of financial information between companies, accountants and the SEC – including “semantic” information, which can be extracted very easily.

Starting this year, larger companies will have to submit their reports in this document format, which can be programmatically analyzed and validated. This will be a dramatic change from the current submission of HTML, PDF, ASCII or anything else, which SEC analysts had to parse and analyze manually; with the new format, they will also be able to get back to the filers much more quickly in case of an error. Plus, this information will be available to anyone else, since this is public information, for analysis and other uses.
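To illustrate what “programmatically analyzed and validated” can mean, here is a minimal sketch in Python. Be warned that the XML below is a made-up, heavily simplified stand-in and not real XBRL: the element names, attributes and figures are assumptions for illustration, and a real filing uses namespaces, contexts, units and an official taxonomy. The point is only that once figures are tagged, a few lines of code can extract them and check a basic rule (here, that assets equal liabilities plus equity).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified filing. Real XBRL is considerably richer than this.
SAMPLE_FILING = """
<filing company="Example Corp" period="2009-Q2">
  <fact name="Assets">1500</fact>
  <fact name="Liabilities">900</fact>
  <fact name="Equity">600</fact>
</filing>
"""

def read_facts(xml_text):
    """Return the tagged values of a filing as a {name: number} dictionary."""
    root = ET.fromstring(xml_text)
    return {fact.get("name"): float(fact.text) for fact in root.findall("fact")}

def validate(facts):
    """Return a list of error messages; an empty list means the checks passed."""
    errors = []
    for required in ("Assets", "Liabilities", "Equity"):
        if required not in facts:
            errors.append(f"missing fact: {required}")
    # Only test the accounting identity when all three figures are present.
    if not errors and facts["Assets"] != facts["Liabilities"] + facts["Equity"]:
        errors.append("Assets do not equal Liabilities plus Equity")
    return errors

facts = read_facts(SAMPLE_FILING)
problems = validate(facts)
print(problems or "filing passed the basic checks")
```

Doing the same on an HTML or PDF report would first require locating the figures in free-form text, which is exactly the manual step a structured format removes.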

“Semantic” XML-based vertical document formats will be the next wave for Document formats. HL7 or other formats in the Health Care domain will help dramatically increase the throughput and reduce the errors in Health Management. There are quite a few out there already, but the Future of Documents will be made of many of these vertical schemas, which will be a key element in improving Document-Intensive Business Processes.

PDF/A growing acceptance for archiving

Thursday, June 18th, 2009

This press release from the PDF/A Competence Centre confirms that PDF/A is gaining acceptance for records management and long-term archiving. Half of the organizations surveyed had plans to use PDF/A in the next 12 months. At the same time, use of older archiving formats such as TIFF, JPEG or simple PDF decreased by about 5 percent. “Nearly all archiving projects use PDF/A” is quite a misleading title though: the current penetration of PDF/A is still small, as only 16% of those surveyed use it actively (although 75% plan to use it actively).

PDF/A is based on PDF 1.4, and became a published standard on October 1st, 2005. It is a stripped-down version of PDF, which is intended for long-term compatibility. It actually offers two levels of compliance: PDF/A-1b is the predominant one, while PDF/A-1a conserves reading order and adds “searchability” (e.g. OCR for paper documents). A new version of PDF/A is in the works – PDF/A-2, which will add selected features from later PDF versions (1.5 to 1.7).

So is PDF/A a good long-term storage format? Yes, I still think so. The files are relatively large, but they are totally self-contained, which is vital for very long-term conservation – and you have the assurance of having software to read them twenty years from now. “Searchability” is very important for the short term. “Reading Order” can be very important if you want to apply Natural Language Processing to analyze this data – but these technologies are only starting.

Both features are “standard” (and easy) for native electronic documents, but for paper documents they are not that easy. In a few decades, OCR and Intelligent Document Recognition will have improved so vastly that the image document (embedded in the PDF/A) will be the best source from which to extract both reading order and text information, so saving them today does not make much sense for the long term. However, in the meantime, they allow good indexing and metadata for your documents.

As always though, file format is just part of the story – you also need to make sure you’ll be able to read your archival medium a few years from now… Who still owns a Zip or floppy drive?