Magic Lamp Software

Capture More Data with the Datacap Cloud OCR Connector

Posted on February 25th, 2021

The Cloud OCR Connector from MagicLamp provides seamless integration between Datacap and some of the most modern cloud-based recognition engines on the market. Using the connector, Datacap retrieves recognition results with lightning-fast turnaround times from Google Vision, Azure Computer Vision, or AWS Textract. The connector is set up through the compiled ruleset interface, which provides a guided configuration experience.

Recognition is typically the most CPU-intensive task in a Datacap workflow. By offloading recognition onto a more powerful cloud platform, Cloud OCR Connector customers can reduce processing time per page and improve overall system throughput. Decoupling recognition from the Datacap platform also lets customers benefit from the continuous improvements made by some of the biggest players operating at the bleeding edge of technology in 2021, without ever needing to install an update.

The most significant benefit realized by MagicLamp’s Cloud OCR Connector customers is the drastic improvement in recognition quality, especially for handwritten text. The table below shows the character accuracy measured in our internal testing on handwritten documents.

Recognition engine          Character accuracy
OCR A (out-of-the-box)      17.01%
Google Vision               81.64%
Azure Computer Vision       43.18%
AWS Textract                59.49%

Character accuracy for key fields across three handwritten invoice samples.
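For readers who want to run a similar comparison on their own documents, here is a minimal sketch of one common way to score character accuracy. It is not necessarily the method behind the figures above: the edit-distance formula and the names `edit_distance` and `character_accuracy` are assumptions made for illustration.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions
    needed to turn `recognized` into `reference`."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            substitution = previous[j - 1] + (ref_char != rec_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Score (ground truth, OCR output) pairs for key fields.

    Assumed definition: 1 - total edit distance / total ground-truth
    characters, clamped so it never drops below 0.
    """
    total_chars = sum(len(reference) for reference, _ in pairs)
    if total_chars == 0:
        raise ValueError("at least one non-empty ground-truth value is required")
    total_errors = sum(edit_distance(ref, rec) for ref, rec in pairs)
    return max(0.0, 1.0 - total_errors / total_chars)


# Hypothetical key field: the engine misread one digit as the letter "O".
print(f"{character_accuracy([('INV-1001', 'INV-1O01')]):.2%}")
```

Scoring only the key fields, rather than the full page, keeps the measure focused on the values that actually get exported from the workflow.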

The cloud platforms achieve these results by leveraging advanced techniques such as AI-driven natural language processing, which gives the recognition engine the context of the surrounding word to further improve accuracy when a character cannot be identified with enough confidence.

To illustrate this, imagine that the recognition engine is a contestant on Wheel of Fortune who is attempting to solve a puzzle where most of the letters are on the board and only a few blanks remain. The contestant uses the context provided by the other letters to guess the missing letters and solve the puzzle. These sorts of techniques are highly effective for handwritten documents, as the results of our testing show.
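The Wheel of Fortune idea can be sketched in a few lines of code. This is a toy illustration only: the cloud engines rely on trained language models rather than a word list, and the function name `resolve_with_context`, the 0.6 confidence threshold, and the sample vocabulary are all assumptions made for this example.

```python
def resolve_with_context(chars, vocabulary, threshold=0.6):
    """Fill in low-confidence characters using the letters around them.

    chars: list of (character, confidence) pairs for one word.
    vocabulary: candidate words the field is expected to contain.
    """
    # Confident characters stay "on the board"; the rest become blanks.
    pattern = [ch if conf >= threshold else None for ch, conf in chars]
    raw_guess = "".join(ch for ch, _ in chars)

    # Keep every vocabulary word that fits the letters already on the board.
    matches = [
        word
        for word in vocabulary
        if len(word) == len(pattern)
        and all(p is None or p == w for p, w in zip(pattern, word))
    ]

    # Only override the raw reading when the context leaves a single answer.
    return matches[0] if len(matches) == 1 else raw_guess


# Hypothetical engine output: the fourth character was read as "0" with low confidence.
word = [("I", 0.98), ("N", 0.95), ("V", 0.97), ("0", 0.31),
        ("I", 0.92), ("C", 0.90), ("E", 0.93)]
print(resolve_with_context(word, ["INVOICE", "INVOKED", "INVERSE"]))
```

When more than one word fits the pattern, the sketch falls back to the raw reading instead of guessing, which mirrors a contestant waiting for another letter before solving.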