Huawei's Pangu Finance OCR Large Model Helps GDRC Easily Recognize Account Information
Optical Character Recognition (OCR) is one of the earliest AI technologies that enterprises adopted to reduce costs and improve efficiency. It allows enterprises to identify and extract the massive amounts of unstructured data generated by their operations, such as documents, tables, and pictures, and convert it into machine-encoded text. This reduces data storage volume and allows archived material to be analyzed repeatedly. OCR is now widely used in finance, insurance, healthcare, transportation, education, and other sectors.
Enterprises have developed more diverse requirements for OCR models as practical applications have matured. Take the Guangdong Rural Credit Union (GDRC) as an example. Its employees have to record all kinds of information by hand, and they must be able to decipher other people's handwriting, however poorly written. They need to identify signatures written for deposits and withdrawals, compare specimen signatures with the newest handwritten versions, and identify handwritten credentials for check and remittance transactions. The combined services provided by traditional OCR vendors lack highly adaptable algorithms and are not sophisticated enough to cope with new situations. Developers have to re-label the data and train a new model for each new recognition scenario, which leads to high development and maintenance costs.
In addition, the lack of high-quality labeled handwriting data is one of the biggest obstacles to precise handwriting recognition in OCR models. Compared with printed text, handwriting is irregular, often cursive, and sometimes illegible, so it is far more difficult to recognize with OCR technologies. Large amounts of synthetic data are used to improve precision on printed text, but the same method cannot be used to train models to recognize handwriting.
To help the GDRC resolve its data difficulties, Huawei Cloud entered into a project with the union and provided it with the Pangu finance OCR large model. Through a unique self-supervised learning method that combines contrastive learning with masked image modeling, the model can learn from large-scale unlabeled OCR data and use it to train high-precision handwriting recognition models. This method needs only one-tenth of the labeled data required by traditional approaches. Compared with the small models used in the industry, the large model improves field recognition precision from 83.9% to 91%, a gain of 7.1 percentage points.
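The article does not publish the training objective, so the following is only a minimal sketch of how a contrastive term and a masked-reconstruction term are commonly combined into one self-supervised loss. The function names, the InfoNCE formulation, the mean-squared reconstruction error, the temperature, and the `mim_weight` parameter are all assumptions made for illustration, not details of the Pangu model.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def info_nce(view_a, view_b, temperature=0.1):
    """Contrastive (InfoNCE) loss over two lists of embeddings.

    Row i of view_a and row i of view_b are assumed to be two augmented
    views of the same unlabeled image; every other row is a negative.
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, other) / temperature for other in b]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-probability of the true pair
    return total / len(a)

def masked_mse(original, reconstructed, mask):
    """Mean squared error counted only where mask[i] is truthy (hidden patches)."""
    errors = [(o - r) ** 2 for o, r, m in zip(original, reconstructed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0

def combined_loss(view_a, view_b, original, reconstructed, mask, mim_weight=1.0):
    # Hypothetical weighting: contrastive term plus weighted reconstruction term.
    contrastive = info_nce(view_a, view_b)
    reconstruction = masked_mse(original, reconstructed, mask)
    return contrastive + mim_weight * reconstruction

# Toy usage with two images, 2-dimensional embeddings, and three patch values.
views = [[1.0, 0.0], [0.0, 1.0]]
loss = combined_loss(views, views, [1.0, 2.0, 3.0], [1.0, 0.0, 3.5], [0, 1, 0])
```

Neither term needs labels: the contrastive term only needs two views of the same image, and the reconstruction term only needs the image itself, which is why objectives of this kind can make use of unlabeled data.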
The GDRC can now use one model to identify general text across multiple scenarios. For example, the Pangu finance OCR large model can automatically identify handwritten information on bills, such as dates, account numbers, account names, account opening banks, and amounts in both Arabic numerals and Chinese characters. The system records this information automatically so that staff only need to confirm and check it, whereas the whole process used to be manual. This simplifies workflows and reduces the manual workload. For electronic signatures stored in counter interactive terminals and VTMs/STMs, the large model can also recognize specific characters of signatures or the vector files of their dynamic data, and compare static signatures with dynamic ones.
In tests on 11 classic data sets (IIIT5K, SVT, IC13, IC15, SVTP, CUTE, etc.), the Huawei Cloud Pangu finance OCR large model showed a significant improvement in precision. Compared with the previously leading OCR algorithms, its precision is more than 5% higher on average.
As the mobile office becomes more popular, OCR increasingly runs on mobile devices, where algorithms usually give up precision to achieve faster running speeds. Through knowledge distillation, the Huawei Cloud Pangu finance OCR large model allows enterprises to smoothly transfer its knowledge to other large, medium, and small models up to 1,000 times smaller than it. In this way, the distilled models can still run efficiently on various devices and recognize new types of bills, cards, and tables across industries. In addition, the large model supports secondary training, which enables enterprises to quickly train new models and create OCR services. This reduces model customization costs, shortens the business rollout period, and alleviates some of the strain on employees.
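Knowledge distillation is usually implemented by training the small "student" model to match the softened output distribution of the large "teacher" model. The sketch below shows that standard loss in plain Python. It illustrates the general technique only: the function names, the temperature value, and the toy logits are assumptions, and the article does not describe how Huawei Cloud implements distillation.

```python
import math

def softmax(logits, temperature=1.0):
    # A higher temperature flattens the distribution and exposes more of
    # the teacher's relative preferences among the wrong classes.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    The T^2 factor is the conventional correction that keeps gradient
    magnitudes comparable across temperatures. Assumes moderate logits,
    so that no student probability underflows to zero.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Toy usage: logits over three candidate characters for one image region.
teacher = [5.0, 1.0, 0.0]
close_student = [4.5, 1.2, 0.1]  # roughly agrees with the teacher
far_student = [0.0, 1.0, 5.0]    # prefers a different character
loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In practice this term is typically added to the ordinary supervised loss on whatever labeled data is available, so the student learns from both the labels and the teacher.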