This article provides step-by-step instructions for building a robot that converts images of text into computer-readable plain text, using a Raspberry Pi and a webcam server that streams live video over the local network. The camera interface uses Motion, an open source software package with many configuration options that can be changed to suit our needs. Motion is configured here as a remote webcam running on the Raspberry Pi, viewable from any computer on the local network, so the robot can be driven in areas outside the line of sight. Step-by-step instructions for the optical character recognition are discussed below.
Prerequisites and equipment:
You will need the following:
- A Raspberry Pi Model B or higher.
- A USB WiFi adapter (an Edimax 802.11b/g/n wireless nano USB adapter is used here).
- A USB webcam with a built-in microphone (a Logitech USB webcam is used here).
- Robotic accessories (wheels, motors, chassis and motor driver circuits).
- An SD card loaded with the Raspbian operating system (here's a guide if you need it).
- Access to the Raspberry Pi via keyboard and monitor, or remotely.
In this project, our ultimate goal is to identify and solve the different requirements of building a web-controlled robot that recognizes text placed in the real world and converts it into computer-readable text files. We aim to integrate appropriate existing techniques to demonstrate this capability within limited hardware and software resources, not to develop new character recognition algorithms or hardware. The result is an internet-controlled mobile robot with the ability to read characters in an image and output the character sequences. Our approach requires the following techniques:
- Web UI control for robotic movement, using PHP and JavaScript.
- Webcam server for live streaming of video, using the Motion software package.
- Camera snapshot control, using a Python script.
- Optical character recognition for image-to-text conversion.
Web UI control for robotic movements:
The user interface for controlling the motors that drive the robot's movement is built with the same technique used for home automation with the Raspberry Pi. JavaScript is used to create the graphical user interface, and PHP handles communication between the GUI and the GPIO pins of the Raspberry Pi. All the technical details involved are well discussed in that article.
Figure 2: Web control interface GUI on Raspberry Pi using JavaScript and PHP
The image above shows the GUI of the web control interface. It has four blocks: the control block, the live streaming block, the captured image block, and the text output block. The control block includes four direction switches and a stop switch. The live streaming block is used to drive the robot in areas outside the line of sight. The most recently captured image is shown in the next block, and finally the extracted text, ready for digital use, appears in the last block.
Webcam server for live streaming of videos:
Creating a live streaming server with the Raspberry Pi was discussed in detail in the previous article. Here, that server must be integrated with the newly created interface for the robot. This can be done by including the video URL in the index.php file created for the server explained in the previous article.
Installing the Optical Character Recognition (OCR) Engine:
The OCR engine converts the image file captured in real time into a text file. We use the Tesseract OCR engine, which is compatible with the Raspberry Pi and does not require an internet connection to convert images to text.
First, install tesseract by typing the following command:
sudo apt-get install tesseract-ocr
Next, test the OCR engine.
Select a clear image that contains a piece of text and test tesseract with the following command:
tesseract image.jpg o
Here image.jpg is the image taken by the Raspberry Pi camera for testing purposes, and o is the base name of the file in which the output text will be saved. Tesseract appends the extension itself, producing o.txt, so there is no need to add .txt. Now wait a few minutes, since OCR consumes a lot of processing power. When processing is finished, open o.txt. If OCR did not detect any text, try rotating the image and running tesseract again.
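The same test can also be scripted. The sketch below is one possible way to wrap Tesseract from Python, not the code used in this project; the file names image.jpg and o are the ones from the command above. It builds the command, runs it, and reads back the recognized text:

```python
import subprocess
from pathlib import Path

def tesseract_command(image_path, out_base="o"):
    # Tesseract appends .txt to the output base name itself,
    # so only the base name is passed on the command line.
    return ["tesseract", image_path, out_base]

def ocr_image(image_path, out_base="o"):
    """Run Tesseract on image_path and return the recognized text."""
    subprocess.run(tesseract_command(image_path, out_base), check=True)
    return Path(out_base + ".txt").read_text()

# Example: text = ocr_image("image.jpg")
```

Separating the command construction from the call makes the script easy to check without a camera or Tesseract installed.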
Configuring Python code for OCR functions:
OCR involves a series of steps that must be performed one by one, which makes pure PHP coding difficult, so we use a Python script alongside the web server, capable of performing all the functions with one click on the PHP web page. The Python script issues system commands and reads their shell output through the system function of the imported os library. Copy the camera_python coding folder to your Raspberry Pi home folder and run the script. (Make sure the camera is connected to the Raspberry Pi's USB port.) This Python script performs the following actions:
- Stopping the Motion service (live streaming).
- Taking a snapshot.
- Running the OCR engine.
- Restarting the Motion service.
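A minimal sketch of those four actions might look like the following. The specific commands (the Motion service name, fswebcam as the snapshot tool, and the file names) are assumptions; the actual camera_python folder may use different tools. The executor is injectable so the sequence can be exercised without the hardware:

```python
import subprocess

# The four steps of the OCR cycle as shell commands.
# fswebcam and the service name "motion" are assumptions;
# substitute whatever snapshot tool and service your setup uses.
OCR_CYCLE = [
    "sudo service motion stop",        # stop live streaming to free the camera
    "fswebcam -r 640x480 image.jpg",   # take a snapshot from the USB webcam
    "tesseract image.jpg o",           # run OCR; the text lands in o.txt
    "sudo service motion start",       # resume live streaming
]

def run_ocr_cycle(execute=subprocess.call):
    """Run each step in order; an alternative executor can be injected."""
    for cmd in OCR_CYCLE:
        execute(cmd.split())
```

Stopping Motion before the snapshot matters because only one process can hold the webcam device at a time.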
Our Apache server starts automatically after every reboot; the remaining work is to start the Python script automatically after each reboot as well. Making a Python script run at startup is well explained in this article.
Putting it all together:
STEP 1: Each distinct part of the robot is complete; it is time to integrate them to function as a whole. First, start the Apache server, as discussed in previous articles. Download the OCR_Robo coding files, extract them, place them in the Apache web server folder, /var/www, and check that the web server is reachable. Note: the previous articles discuss different techniques for finding the IP address. Go through every line of all the coding files and change the IP address field to your own IP address.
Fig. 3: OCR_Robo encoding files
STEP 2: Download the camera_python coding files and place them in the Raspberry Pi home folder. Create a file called launcher.sh using the following command:
sudo nano launcher.sh
And enter the following code as shown in the image:
#!/bin/sh
cd /
cd home/pi/camera_python
sudo python camera.py
cd /
Make it executable and able to run at startup by following the steps given in the previous article.
STEP 3: Make the connections according to the circuit diagram. Optocouplers are used to protect the Raspberry Pi from overvoltage risks.
The robot, controllable through the web interface and capable of optical character recognition, is now ready.
Circuit diagrams
Optical Character Recognition Robot Circuit Diagram
Project video