Extract the meaning of documents!
Yesterday was a big press event here at the Xerox Research Centre Europe in Grenoble. It was a very exciting day with lots of cool technologies.
For once I will not talk about my own presentation.
Rather, I’ll speak about one of the exciting announcements made on that occasion: a new technology called FactSpotter, which takes text mining to a different level by using very advanced Natural Language Processing technologies to extract the “meaning” from documents.
FactSpotter is an extremely advanced technology that can analyse the content of a textual document to reconstruct its “semantics”:
- It digs deeper into the content and actual words of a document, in order to find not only the words but also their context and relations.
- It can avoid some of the pitfalls that traditional linguistic techniques fall into, thanks to advanced techniques such as word sense disambiguation (in “he sits on the bank”, is the bank a financial establishment?). It can also resolve coreferences (to whom does the word “she” refer?) and even more complex references. For example, it could know that “Paris’ tallest monument” in a document actually refers to the Eiffel Tower, mentioned earlier.
- FactSpotter can recognize abstract concepts such as “people”, “buildings”, or “locations”.
- This “representation” of knowledge is then extractable using a simple, user-friendly interface, which allows queries to be expressed naturally rather than through complicated rules. For example, you can look for all sentences that talk about a person having said something in a given location.
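To make the last point more concrete, here is a minimal sketch of what querying extracted facts could look like. It is not FactSpotter's interface, which I have not described here: the `Fact` record, its field names, the `find_sentences` helper and the sample facts are all hypothetical, written by hand to stand in for the output of a real extraction engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One extracted relation. All field names are hypothetical."""
    subject: str             # resolved entity ("she" already mapped to a name)
    subject_type: str        # abstract concept such as "person" or "building"
    relation: str            # normalized relation, e.g. "say"
    location: Optional[str]  # place attached to the fact, if any
    sentence: str            # source sentence the fact was taken from


# Hand-written sample facts (made up for illustration, not real output).
FACTS: List[Fact] = [
    Fact("Marie Dupont", "person", "say", "Grenoble",
         "She said in Grenoble that the results were encouraging."),
    Fact("Eiffel Tower", "building", "be_located", "Paris",
         "Paris' tallest monument attracts millions of visitors."),
    Fact("John Smith", "person", "say", "Paris",
         "Smith announced the merger in Paris."),
    Fact("John Smith", "person", "visit", "Grenoble",
         "He later visited Grenoble."),
]


def find_sentences(facts: List[Fact], subject_type: str,
                   relation: str, location: str) -> List[str]:
    """Return the source sentences whose fact matches all three criteria."""
    return [
        f.sentence
        for f in facts
        if f.subject_type == subject_type
        and f.relation == relation
        and f.location == location
    ]


if __name__ == "__main__":
    # "A person having said something in a given location"
    for sentence in find_sentences(FACTS, "person", "say", "Grenoble"):
        print(sentence)
```

Note that the matching sentence contains neither a person's name nor the word "say" in its base form: the query works on the resolved entity and the normalized relation rather than on the surface words, which is the whole point of extracting meaning first.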
This announcement was widely covered in Forbes, Gilbane, Yahoo, and many other online magazines. But to me, its importance goes way beyond the anecdotal “next gen web search engine” that many of these articles mention. The real importance to the Future of Documents is the fact that it now becomes possible to mine documents (unstructured by nature), or document collections, in a way very similar to structured data.
This will be key in providing services and analyzing huge sets of documents: addressing an organization’s document legacy and mining it (almost) as easily as a database. One of the key applications will be e-discovery for litigation, but other obvious domains would include intelligence, risk management, or medical research.
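As a rough illustration of the "as easily as a database" idea, the sketch below loads a few extracted facts into an in-memory SQLite table and answers an e-discovery style question with plain SQL. The table layout, the helper names and the sample rows are assumptions of mine, invented for the example; how a real system would store its facts is not something this post covers.

```python
import sqlite3

# Made-up rows standing in for facts extracted from a document collection:
# (doc_id, subject, subject_type, relation, object)
ROWS = [
    ("memo-001", "Alice Martin", "person", "approve", "contract A"),
    ("memo-002", "Bob Leroy", "person", "approve", "contract B"),
    ("mail-017", "Alice Martin", "person", "reject", "contract B"),
]


def build_fact_table(rows):
    """Create an in-memory database holding one row per extracted fact."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts ("
        "doc_id TEXT, subject TEXT, subject_type TEXT, "
        "relation TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def documents_where(conn, subject, relation):
    """List the documents in which `subject` performs `relation`."""
    cursor = conn.execute(
        "SELECT doc_id FROM facts "
        "WHERE subject = ? AND relation = ? ORDER BY doc_id",
        (subject, relation),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    conn = build_fact_table(ROWS)
    # Which documents record Alice Martin approving something?
    print(documents_where(conn, "Alice Martin", "approve"))
```

Once the facts are in tabular form, the usual database tooling (filters, joins, counts) applies to what started out as unstructured text.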
