Automatic Text Categorization for e-Discovery


Technology can help in many document-intensive processes, sometimes even in the most difficult cases, where human review was until now considered the only option. For example, Xerox Litigation Services is now using Categorix, the Text Categorizer, to expedite the review of documents in litigation cases.

Categorix is a technology developed at the Xerox Research Centre Europe that works on the textual information in a document. It is based on machine learning: from a number of sample documents, it learns the vocabulary that is representative of each class it has to deal with (here, “responsive” vs. “non-responsive”). Once trained, it can identify “responsive” documents with higher accuracy than human reviewers, and it can automate the typical review process by automatically tagging the documents where it is confident enough, leaving the more uncertain ones for human confirmation.
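
To make the idea concrete, here is a minimal sketch of a text categorizer that learns per-class vocabulary from labeled samples and tags a document only when it is confident enough. It is not the Categorix algorithm, whose internals are not described here: the sketch assumes a plain multinomial Naive Bayes model as a stand-in, and the class name, the sample documents, the 0.9 threshold, and the "needs-human-review" tag are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesCategorizer:
    """Illustrative stand-in only; not the actual Categorix model."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold      # hypothetical confidence cut-off
        self.word_counts = {}           # label -> Counter of word frequencies
        self.doc_counts = Counter()     # label -> number of training documents
        self.vocab = set()              # every word seen during training

    def train(self, samples):
        """Learn the vocabulary of each class from (text, label) pairs."""
        for text, label in samples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def probabilities(self, text):
        """Return the posterior probability of each class for the text."""
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen class/word pairs non-zero.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocab))
                )
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long documents.
        top = max(log_scores.values())
        exps = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exps.values())
        return {label: value / norm for label, value in exps.items()}

    def tag(self, text):
        """Tag automatically when confident; otherwise defer to a human."""
        probs = self.probabilities(text)
        label, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= self.threshold:
            return label, confidence
        return "needs-human-review", confidence


# Hypothetical training samples, invented for this sketch.
SAMPLES = [
    ("merger contract pricing agreement", "responsive"),
    ("contract pricing negotiation terms", "responsive"),
    ("lunch menu friday cafeteria", "non-responsive"),
    ("holiday party parking reminder", "non-responsive"),
]

categorizer = NaiveBayesCategorizer(threshold=0.9)
categorizer.train(SAMPLES)
confident = categorizer.tag("pricing contract for the merger")
uncertain = categorizer.tag("friday terms")
```

In this toy run, the first document shares several words with the "responsive" samples and is tagged automatically, while the second mixes vocabulary from both classes and is routed to a human reviewer. Raising the threshold sends more documents to human confirmation; lowering it automates more of the review at the cost of more tagging errors.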

Considering that a typical litigation involves a million documents, with review costs averaging around $1 per document, such a technology can yield major savings, not to mention gains in speed and consistency.
