05 November 2021

Extractor and Plotter are both Sudoscan components responsible for acting on a given image. The Extractor is responsible to "parse" an image and generate a computational representation of the Sudoku Puzzle. While the plotter is responsible to draw the puzzle’s solution back to the original image.

Both components dependens on Image Processing tasks. This is a process of transforming an image into a digital form and performing certain operations to get some useful information from it.

OpenCV is an open-source Image processing library that includes several hundreds of computer vision (CV) algorithms. It is essentially a C++ API, it can perform some real-time operations because it is very fast and lightweight.

Since OpenCV is a C++ API, it can’t be used directly on JVM, however there are some projects that wraps its native interfaces. The chosen project chosen for Sudoscan was JavaCV.

JavaCV uses wrappers from the JavaCPP of commonly used libraries by researchers in the field of computer vision and provides utility classes to make their functionality easier to use on the Java platform, including Android.

The most appealing reason to use JavaCV among all possible alternatives is the use of JavaCPP for packing native dependencies (binaries) for different platforms/architectures. To build an application using native binaries for a specific platform, it’s just necessary to add an extra configuration during build time. For instance, the command gradle -PjavacppPlatform=linux-x86_64 build with only linux-x86_64 binaries on the generated artifact.

The code for both Extractor and Plotter can found be found on sudoscan-api sub module.

Extractor

As mentioned above, the Extractor is responsible for "parsing" an image. For simplicity, let’s assume the given image is the image below:

Unsolved

The "parsing" process can be diveded into 4 different phases:

  1. Pre-process image;

  2. Puzzle detection and and frontal perspective;

  3. Grids removal;

  4. Cell split and content enhance.

These phases are described in more detail in the sections below.

Pre-process image

The first step of the entire extraction process consists of simplifying the given image. To achieve this state, a series of filters are applied to the image, they are:

  • Convert to Grayscale: subsequent steps don’t rely on color;

  • Blur the image: a gaussian blur is applied to reduce noise obtained in the thresholding algorithm (adaptive thresholding);

  • Apply Thresholding: separate an image into regions (or their contours). The process is called segmentation. Thresholding is a way to segment such regions;

  • Invert colors: converts white into black and black into white;

  • Dilation: the noise reduction of the applied Gaussian Blur shrinks the object. The dilation is a way to "revert" the shrinkage.

After applying those filters, the image should be transformed into this:

Filtered

Puzzle detection and frontal perspective

After preprocessing the image, it’s time to detect the Sudoku Puzzle in the image. To find it, it was used the OpenCv's findContours function. This function will find external contours (boundaries of shapes having the same intensity). The function will find a list of potential objects (polygons) that can be found in the image. However, it’s assumed that the given image is focused on a Sudoku Puzzle, so the object with the largest area is selected.

With the largest polygon selected, it’s time to crop it from the original image and transform the cropped image into a frontal perspective. To achieve the frontal perspective, it was used OpenCv's getPerspectiveTransform and warpPerspective functions.

After those transformations, the image looks like this:

Perspective

Grids removal

Now is time to reduce the "noise" from the image. Since the objective is to identify all cells' content from the image (an empty cell or a numeric cell), the grids are extra visual information that’s not important. That being said, it’s useful to get rid of the grid lines.

A structuring element is passed to erode and dilate in order to clean up the grid lines. The function findContours is again used, but this time with the help of approxPolyDP to identify the vertical and horizontal lines. With an identified line, a thicker empty rectangle is drawn in order to override the line.

The resulting image of this phase is:

NoGrids

Cell split and content enhance

At this point, the initial image is completed and prepared to be split. The image is divided into 81 (9x9) smaller blocks. Every block is used to create a SudokuCell.

A SudokuCell is a computational representation (a Kotlin class) of a Sudoku cell. This component is capable of identifying if a cell is empty (in case at least 10% of the total area of the cell has any content) and enhancing its content in case it’s not empty.

Merging all cells together after the enchacement generates the following image:

Enhanced

A set of SudokuCell is called a PuzzleCells. This component can generate a Puzzle representation with the help of a Recognizer.

Plotter

While the Extractor uses an image to generate a computational representation (Puzzle), the Plotter generate an image from the computational representation, change it’s perspective to the same of the original image and finally "paste" the result over the original image.

Generate the solution image

After the final solution is found by a Solver, the original empty cells are known numbers. With this information, a new image (with the same size as the frontal perspective image) is generated with the new numbers found in the solution.

The solution image:

Solved

Revert perspective

After the solution image is generated, it’s perspective must change to "fit" the original perspective of the original image. Again, using OpenCv's getPerspectiveTransform and warpPerspective functions, a transformed image is generated.

The reverted perspective of the solution image is:

Solved

Overlapping solution

The last step of the plotting consists of overlapping the original image with the reverted perspective image. A simple use of OpenCv's bitwiseAnd function can handle this operation.

Finally, the final solution can be viewed as:

Solved