Approximately 60% of the law firm's files are PDFs, and one-third of these PDFs are image-only. The content of PDF files that contain only images cannot be searched.
The law firm asked DMC for assistance with scanning the 700,000+ files in its existing SharePoint document repository and converting image-only PDF documents to searchable documents using optical character recognition (OCR).
To help the law firm's staff quickly locate key documents, DMC built an application that first scans all existing documents already in SharePoint to determine which are image-only PDFs. These documents are then processed by an OCR module built upon the Aquaforest OCR SDK, which makes their textual content searchable via SharePoint. The law firm's SharePoint document repository of more than 700,000 files was scanned and converted in approximately 45 days, with a 96% success rate of adding a searchable text layer to image-only PDF files.
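The first step described above, deciding whether a PDF is image-only, can be illustrated with a minimal Python sketch. This is not DMC's implementation and does not use the Aquaforest OCR SDK. It assumes a simple rule: a PDF counts as image-only when it has at least one page and no page yields extractable text. The names `PdfTriage`, `triage_page_texts`, `triage_pdf`, and the `min_chars` threshold are hypothetical, and `triage_pdf` assumes the third-party `pypdf` library is installed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PdfTriage:
    """Result of checking one PDF (hypothetical helper type)."""
    page_count: int
    pages_with_text: int

    @property
    def image_only(self) -> bool:
        # Assumed rule: at least one page, and no page yielded any text.
        return self.page_count > 0 and self.pages_with_text == 0


def triage_page_texts(page_texts: Iterable[str], min_chars: int = 1) -> PdfTriage:
    """Classify a PDF from the text extracted from each of its pages.

    A page counts as having text when its stripped text is at least
    min_chars characters long (an illustrative threshold).
    """
    texts: List[str] = list(page_texts)
    with_text = sum(1 for t in texts if len((t or "").strip()) >= min_chars)
    return PdfTriage(page_count=len(texts), pages_with_text=with_text)


def triage_pdf(path: str) -> PdfTriage:
    """Read a PDF with pypdf (assumed installed) and triage it."""
    # Imported lazily so the helpers above need only the standard library.
    from pypdf import PdfReader

    reader = PdfReader(path)
    return triage_page_texts(page.extract_text() or "" for page in reader.pages)
```

A file for which `triage_pdf(path).image_only` is true would be a candidate for OCR. Files that already carry a text layer can be skipped, which keeps the more expensive OCR step focused on the documents that need it.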
A simple SharePoint keyword search now instantly retrieves a list of all files containing the specified keyword(s), providing quick access to the information in all of the client's document files, saving vital time for their employees and customers.
Since implementing the original SharePoint OCR application, DMC has upgraded the application for compatibility with SharePoint 2010, 2013, 2016, and Office 365 SharePoint Online. Features have also been added to identify newly uploaded PDF files and OCR them multiple times daily, as well as the ability to re-scan specific sites and libraries.
For more information on our SharePoint OCR Solution, please Contact Us.