While everyone is talking about going digital, many of us who grew up loving paper books still like to return to those good old days. I love the smell of paper and still buy paper books.
However, I have also seen the downside of not going digital. Two years back, when I was in Helsinki to meet a partner, I had the partner's office address written down by hand. One character in the handwritten address was flipped in such a way that the taxi driver read it differently, and I landed on the other side of the city. My colleagues waiting for me at the partner's office had to apologize repeatedly for my delay.
When I reached the right venue 30 minutes later and showed everyone the handwritten address, we all had a good laugh. It stayed on their Facebook page as the most popular address puzzle for a week. After this incident I stopped using handwritten addresses.
Recently, my two colleagues Daniel Klar and Alexander Trygg and I have spent a considerable amount of time finding solutions that help customers move from paper-based and other non-digital processes to digital ones using robotic process automation (RPA). However, we realized that to solve this particular problem we needed to create intelligent robots that could think like humans. Let’s hear from my colleagues.
Question: Is it possible to make robots that read like humans?
Not only can robots read for us, they can also act on the information they read.
We have found that if we leverage optical character recognition (OCR) technology within RPA and enhance the robot with machine learning, it becomes an incredibly powerful tool for creating intelligent solutions that can read and react to information in documents and images.
Question: How can we ensure it works? I have used OCR technology before, and I sometimes can’t make sense of the extracted information if the image quality is poor or the documents are not scanned properly.
Yes, you will have trouble comprehending the information in a scanned paper document or an image if the solution is not intelligent enough to locate the information it is looking for. Hence, we wanted to build intelligent robots that can find the information themselves, as long as we point them to the right section of the document or image.
The power of RPA while reading documents basically boils down to three parts: handling, reading and acting. I’ll explain the handling and reading parts, and Daniel will cover the challenges of reading and acting further.
Since RPA can mimic human behaviour, it can gather and handle documents in multiple ways depending on how they look, just as a human would. Whether that means fetching a document from an email or a file server, scanning a piece of paper, or going into a system and replicating the human actions needed to generate the document, RPA can handle it.
The difficulty of reading a document spans a huge spectrum: anything from finding a word in a digital document (which is trivial) to understanding which part of an image is interesting and then extracting and interpreting the data in it, which is a bit trickier.
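To make the "trivial" end of that spectrum concrete, here is a minimal sketch: once a document's text is already digital, locating a piece of information is a plain pattern search. The document text and the `INV-YYYY-NNNN` invoice-number format are made-up examples, not from the project described here.

```python
import re

# A fabricated, already-digital document: no OCR needed at this end
# of the spectrum, just a text search.
document_text = """
ACME Corp Invoice
Invoice number: INV-2024-0042
Amount due: 1250.00 EUR
"""

# Assumed pattern for an invoice number of the form INV-YYYY-NNNN.
match = re.search(r"INV-\d{4}-\d{4}", document_text)
invoice_number = match.group(0) if match else None
print(invoice_number)  # INV-2024-0042
```

The hard end of the spectrum, a scanned image, is what the rest of this conversation is about.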
RPA becomes important glue that holds together and coordinates the different technologies for making sense of documents. RPA tools like UiPath provide easy interfaces to OCR and cognitive services from technology providers such as Microsoft and Google out of the box, which speeds up automation when it comes to reading scanned documents and images.
Question: Thanks, Alexander, for this great insight! Let us now hear from Daniel about the challenge of reading and comprehending a scanned document, and how we can ensure that the robot searches for the right information to act on, irrespective of its position in the image or document.
We recently worked on a project that combined RPA, OCR and machine learning to make sense of scanned invoices. A benefit of working with invoices is that they usually follow a definite format, with specific information found in roughly the same place on every invoice. At the same time, unlike a page in a book (read from top left to bottom right), an invoice has bundles of information in different parts of the page. This confused the OCR engine inside our robot, as it did not know how to order the extracted text, leaving us with unreadable output.
Here is where the power of machine learning came into play. By running cluster analysis on the OCR output from many different invoices, we managed to train a machine learning model to find the relevant areas of the scanned documents (see illustration below). The model could then guide the OCR engine to search within a given perimeter for specific information, such as the invoice number or customer address. We then applied OCR only to these areas, extracting the information with fast, accurate and understandable results. The below graph shows the cluster analysis results.
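The core idea can be sketched in a few lines. This is not the project's actual code: real OCR engines return a bounding box per recognized word, and here both the word positions and the simple distance-threshold clustering are stand-ins for the trained model described above.

```python
# Cluster OCR word boxes by position so each cluster approximates one
# region of the invoice (e.g. the invoice-number block, the address block).
# Each box is (text, x, y), with x/y as made-up pixel coordinates.

def cluster_boxes(boxes, max_dist=50):
    """Greedy single-link clustering of word centres by Euclidean distance."""
    clusters = []  # each cluster is a list of boxes
    for box in boxes:
        _, x, y = box
        placed = False
        for cluster in clusters:
            # Join a cluster if any member is within max_dist pixels.
            if any(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= max_dist
                   for _, cx, cy in cluster):
                cluster.append(box)
                placed = True
                break
        if not placed:
            clusters.append([box])
    return clusters

# Fabricated word positions: an invoice-number block at the top right
# and a customer-address block further down on the left.
words = [
    ("Invoice", 400, 30), ("no.", 440, 30), ("1042", 470, 30),
    ("Jane", 60, 200), ("Doe", 95, 200), ("Main", 60, 220), ("St", 95, 220),
]
regions = cluster_boxes(words)
print(len(regions))  # 2 spatial clusters
```

In the project, a trained model played this role across many invoice layouts; once the regions are known, OCR is rerun only inside each region's bounding box.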
Question: How can robots act after they have understood the information they have read from scanned documents and images?
Daniel Klar: Well, that depends on the case. For the invoices, we wanted to execute a process that inserts the data into an ERP system, which RPA can do efficiently and without the occasional errors that would creep in if a human were tasked with doing this repeatedly.
The strength of RPA is that it is a powerful execution platform that can easily be combined with analytics, so a process can become fully autonomous, or consult humans on the hard cases while the robot manages the heavy lifting.
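One common way to realize that "autonomous with human fallback" pattern is a confidence threshold on the extracted fields. The field names, confidence scores and the 0.9 cut-off below are illustrative assumptions, not values from the project.

```python
# Route an invoice: let the robot enter it into the ERP system when all
# fields were read confidently, otherwise hand it to a person for review.
CONFIDENCE_THRESHOLD = 0.9  # assumed cut-off, tuned per process in practice

def route_invoice(extracted_fields):
    """extracted_fields maps field name -> (value, OCR confidence in [0, 1])."""
    if all(conf >= CONFIDENCE_THRESHOLD
           for _, conf in extracted_fields.values()):
        return "erp"    # robot inserts the data itself
    return "human"      # hard case: a person reviews it

clear = {"invoice_number": ("1042", 0.98), "total": ("1250.00", 0.95)}
blurry = {"invoice_number": ("1O42", 0.55), "total": ("1250.00", 0.95)}
print(route_invoice(clear), route_invoice(blurry))  # erp human
```

The robot still does the heavy lifting either way; only the ambiguous reads reach a human queue.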
Want to know more about RPA? Read this article by Neelima Misra about how and why robots fit into your organisation.