Form Recognizer is one of the Cognitive Services provided by the Azure platform. It saves you from building your own OCR/ICR and ML models from scratch to extract information from form documents. It uses machine learning to identify and extract text, key/value pairs, and table data from form documents. It ingests text from forms and outputs structured data that includes the relationships in the original file. Form Recognizer comprises custom models, the pre-built receipt model, and the Layout API.
Form Recognizer custom models are trained on your own data. You can train custom models using either unsupervised or supervised learning. In unsupervised learning you only need to provide sample form documents, and the model learns the structure of the documents on its own; in supervised learning you provide labels for the information along with the sample documents. Models trained with labels perform better and can also produce good results in complex scenarios.
For supervised learning, Form Recognizer provides a sample labeling tool, an application with a simple user interface that you can use to manually label forms (documents). You can either run this tool locally using Docker or deploy it on an Azure Container Instance. In this post we are going to learn how to run the sample labeling tool locally on Windows 8/8.1.
If you want to deploy the sample labeling tool on an Azure Container Instance, read this post.
Step 1: Install Docker Toolbox
Docker Toolbox is an installer for setting up a Docker environment on older Windows systems, i.e. Windows 7 and 8. If your machine runs Windows 10, please install Docker Desktop instead.
Your machine should have:
- 64 bit operating system
- Windows 7 or higher
- Virtualization enabled
1. To check whether your machine has virtualization enabled, visit this link. After you have made sure that your machine meets the requirements mentioned above, visit the Docker Toolbox GitHub page and download the latest .exe file.
2. Start the installation of Docker Toolbox by double-clicking the downloaded installer.
3. Press Next, with the checkbox either marked or unmarked as you like.
4. Click Next, keeping the default location or choosing a directory of your own.
5. Click Next with all the defaults selected, and then click Install. The installer takes a few minutes to install all the components.
After the installation finishes, the installer will have added Docker Toolbox, VirtualBox, and Kitematic to your Applications folder. VirtualBox provides the virtualization that Windows 10 offers natively but earlier versions of Windows do not, and Kitematic provides an intuitive graphical user interface (GUI) for running Docker containers.
Step 2: Verify Installation
1. On your Desktop, find the Docker QuickStart Terminal icon.
2. Click the Docker QuickStart icon to launch a pre-configured Docker Toolbox terminal.
If the system displays a User Account Control prompt to allow VirtualBox to make changes to your computer, choose Yes.
The terminal does several things to set up Docker Toolbox for you, which takes a few minutes. When it is done, the terminal displays a command prompt.
To verify that everything has been set up correctly, type the command given below and press Enter.
docker run hello-world
The command does some work for you; if everything runs well, its output includes the message "Hello from Docker!".
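If you would rather check the result with a script than read the output by eye, the verification can be sketched as below for the Docker QuickStart Terminal. This is a minimal sketch and not part of the official setup: the function name contains_docker_greeting is made up for illustration, and the check relies on the hello-world image printing the line "Hello from Docker!".

```shell
#!/usr/bin/env bash
# Sketch: confirm that `docker run hello-world` succeeded by looking for
# the greeting line that the hello-world image prints.

# Hypothetical helper: returns 0 if the given text contains the greeting.
contains_docker_greeting() {
  printf '%s\n' "$1" | grep -q "Hello from Docker!"
}

# Only call Docker when the CLI is actually on PATH.
if command -v docker >/dev/null 2>&1; then
  # Capture stdout and stderr so error messages can be shown too.
  output="$(docker run hello-world 2>&1)" || true
  if contains_docker_greeting "$output"; then
    echo "Docker is working."
  else
    echo "Unexpected output from hello-world:"
    echo "$output"
  fi
else
  echo "docker not found on PATH; open the Docker QuickStart Terminal first."
fi
```

If the script reports unexpected output, the captured text is usually the quickest hint about what went wrong, for example a VM that has not finished starting.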
Step 3: Pull Label Tool Image and Run Container
1. Get the sample labeling tool image from the Microsoft container registry by running the command given below for your Form Recognizer version.
Form Recognizer v2.0
docker pull mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool
Form Recognizer v2.1 Preview
docker pull mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool:2.1.012970002-amd64-preview
2. After the label tool image pull is complete, run the command below to see the Docker images available locally.
docker image ls
3. Note down the IMAGE ID of the label tool image you just pulled. Replace <IMAGE ID> in the command given below and run it in the Docker Toolbox terminal.
docker run -it -p 3000:80 <IMAGE ID> eula=accept
This command runs the labeling tool image in a container, mapping port 3000 on the Docker host to port 80 in the container, so that the sample labeling tool is available through a web browser.
4. Now run the Kitematic application by clicking its icon on the desktop.
5. Click the Use VirtualBox button.
6. If you have a Docker Hub account, you can log in on the screen provided, or click the Skip For Now button. The next screen shows all the containers that have been created in the Docker environment.
7. The Kitematic home screen shows running containers in green, which in our case is the Form Recognizer label tool. Click Web Preview on the right-hand side to open the label tool in a web browser.
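The command-line part of this step (items 1 to 3) can also be sketched as a single script for the Docker QuickStart Terminal. Treat it as a sketch under a few assumptions: it uses the v2.0 image name from above, it assumes your Docker Toolbox VM still has the default name "default", and the helper label_tool_url is a hypothetical name introduced here. With Docker Toolbox, published ports are exposed on the VirtualBox VM rather than on localhost, so the script asks docker-machine for the VM's IP address.

```shell
#!/usr/bin/env bash
# Sketch: pull the label tool image, look up its IMAGE ID, print the URL,
# and start the container. IMAGE and HOST_PORT mirror the commands in this
# post; MACHINE is the Docker Toolbox default VM name (an assumption).

IMAGE="mcr.microsoft.com/azure-cognitive-services/custom-form/labeltool"
HOST_PORT=3000
MACHINE="default"

# Hypothetical helper: builds the browser URL from a host/IP and a port.
label_tool_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

if command -v docker >/dev/null 2>&1 && command -v docker-machine >/dev/null 2>&1; then
  docker pull "$IMAGE"
  # -q prints only the IMAGE ID column for the matching repository.
  image_id="$(docker images -q "$IMAGE" | head -n 1)"
  echo "Image ID: $image_id"
  # Published ports are reachable on the VM's IP address.
  vm_ip="$(docker-machine ip "$MACHINE")"
  echo "Open $(label_tool_url "$vm_ip" "$HOST_PORT") once the container starts."
  # Runs in the foreground, same as the manual command above.
  docker run -it -p "$HOST_PORT":80 "$image_id" eula=accept
else
  echo "docker or docker-machine not found; run this in the Docker QuickStart Terminal."
fi
```

The printed URL is an alternative to the Web Preview button in Kitematic; both should lead to the same page.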
We will discuss how to set up credentials in the label tool in a separate post. Going forward, you don't need to execute commands in Docker Toolbox; you can stop or start the label tool container using the Kitematic GUI.
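If you prefer the terminal over Kitematic, stopping and starting the same container can be sketched as follows. This is only an illustration: the helper names run, stop_label_tool, and start_label_tool and the DRY_RUN switch are assumptions made for this sketch, while docker ps, docker stop, and docker start are standard Docker CLI commands.

```shell
#!/usr/bin/env bash
# Sketch: stop and restart an existing label tool container from the CLI.

# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Both helpers take a container ID or name as their only argument.
stop_label_tool()  { run docker stop "$1"; }
start_label_tool() { run docker start "$1"; }

# Typical use (the container ID is in the first column of `docker ps -a`):
#   docker ps -a
#   stop_label_tool <CONTAINER ID>
#   start_label_tool <CONTAINER ID>
```

Because docker start reuses the port mapping and arguments the container was created with, eula=accept does not need to be passed again.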
In this post we covered every step needed to set up the Form Recognizer label tool locally on a Windows 7 or Windows 8 machine. If you face any problem, feel free to contact me through any of the provided social channels.