System Description

In this section, you will learn what the recognition system is, its basic features and the principles of its operation.

Our system is supplied as a compiled library with C++ header files.

The library can be integrated in various ways:

  • Into services and desktop applications;
  • Into mobile applications (native or based on popular frameworks);
  • Into PWAs and web pages (as a WebAssembly module).

The C++ interface allows you to create wrappers for any popular programming language.
We provide wrappers for Java, C#, ObjC/Swift, Python 3, PHP 7/8, and others, for any particular environment.

The following wrappers are used:

  • SWIG — for C#, Java, Python, PHP;
  • Emscripten — for WASM;
  • No wrapper is required for ObjC.

Delivery Package

  • API in C, C++, C#, Java, Python, PHP, JavaScript/WebAssembly;
  • Samples in C/C++/C#/Java/PHP/Python/JavaScript;
  • Frameworks: React Native, Flutter;
  • REST API for low-code.

Library Interface

The interface of our products consists of two parts:

  • The primitives interface secommon. It is common for all the products;
  • The recognition interface. It is individual for each product.

The primitives interface secommon is used for:

  • Handling images (creating, cropping, transforming, projective correction, masking areas);
  • Working with primitives. Creating primitives for cropping, extracting primitives from the recognition result;
  • Specifying string iterators;
  • Outputting errors.

The recognition interface is designed according to the same principle for all products. It includes four basic objects:

Recognition Engine

Recognition Engine is the object where all recognition tools are stored and initialized. It is created using the appropriate configuration bundle containing all possible settings: a list of documents supported by a specific SDK, their fields, a list of possible authentication checks, and so on.

INFO

In special cases, the bundle is not supplied separately, but as part of (inside) the library.

Initialize the engine as a single instance: one instance allows you to create multiple recognition sessions. However, creating multiple instances of the engine is also possible.

The initialization process is a resource-intensive operation (as is the image analysis itself), so perform it off the main UI thread.
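Because initialization is heavy, it is typically started on a worker thread while the UI stays responsive. The sketch below shows one way to do this in Python; the `RecognitionEngine` class and its `create` factory are hypothetical stand-ins, not the SDK's actual API.

```python
import threading

# Hypothetical stand-in for the SDK engine class; the real class and
# factory names are product-specific and not taken from this document.
class RecognitionEngine:
    @staticmethod
    def create(bundle_path):
        # Placeholder for the real, resource-intensive initialization.
        return RecognitionEngine()

_engine = None
_engine_ready = threading.Event()

def init_engine_in_background(bundle_path):
    """Start engine initialization off the main (UI) thread."""
    def worker():
        global _engine
        _engine = RecognitionEngine.create(bundle_path)
        _engine_ready.set()
    threading.Thread(target=worker, daemon=True).start()

def get_engine(timeout=None):
    """Block until the engine is ready, then return it."""
    _engine_ready.wait(timeout)
    return _engine
```

The same pattern applies on mobile (e.g. a background queue or coroutine) and on the server (initialize once at process start, then share the instance across requests).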

When Initialization Starts

At the start of the application

If you expect to use the engine constantly, it is convenient to initialize it when the application starts. This is a standard option for server-side solutions.

At the first access to the recognition resources

This may be the moment when the device camera opens or when a dialog box offering to select a file from the device file system appears.
Opening the screen with the camera takes some time, and in many cases it is done simultaneously with the engine initialization. On lower-spec devices and if the used bundle is large, it may take the engine several seconds to get ready for recognition.

Types of Initialization

  • Lazy — initialization of all the resources involved at the first access to them (as a rule, when the first frame is processed);
  • Delayed — initialization of all the resources involved when the recognition session is created;
  • Not lazy — initialization of all the resources involved when the engine is created.
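The difference between lazy and non-lazy initialization can be illustrated with a minimal Python sketch, assuming a heavy `load()` step (the names here are illustrative, not the SDK's):

```python
def load():
    # Stands in for loading heavy recognition resources.
    return "loaded"

class LazyResources:
    """Resources are created on first access (lazy)."""
    def __init__(self):
        self._data = None  # nothing loaded yet

    @property
    def data(self):
        if self._data is None:
            self._data = load()  # heavy work deferred to first use
        return self._data

class EagerResources:
    """Resources are created when the object is created (not lazy)."""
    def __init__(self):
        self.data = load()  # heavy work happens immediately
```

Lazy initialization shortens startup but adds latency to the first processed frame; non-lazy initialization does the opposite.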

Bundle

A configuration bundle is a file containing all the resources needed for the recognition library to work. Bundles are interchangeable within the same product of the same version, and can also be embedded inside the library if necessary (for example, if access to the device file system is limited).

Creating an Image Object

Pass an image of the special class se.common.image to the system for recognition. You can create it using the following image formats:

  • jpeg, png;
  • tiff (TIFF_LZW, TIFF_PACKBITS, TIFF_CCITT);
  • base64 (the above-mentioned formats);
  • file buffer with a preliminary indication of the color scheme, width/height/number of channels.

The maximum allowed image size by default is 15000x15000px. You can change the maximum image size.

Handling a HEIC file

HEIC files in the mobile SDK are handled similarly to other image formats: HEIC is read using system tools.

In the server SDK, open the HEIC format using external tools and convert it either to one of the formats we support, or transfer the raw pixels directly as an RGB buffer (this is recommended).
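A raw RGB buffer is simply width × height × channels bytes in row-major order. The sketch below builds such a buffer from per-pixel values; the SDK call that would receive it (and its name) is product-specific and not shown here.

```python
def make_rgb_buffer(width, height, pixel=(255, 255, 255)):
    """Pack a solid-color image into a raw 3-channel RGB byte buffer.

    In a real pipeline these bytes would come from your HEIC decoder;
    here they are generated to show the expected layout only.
    """
    row = bytes(pixel) * width      # one scanline: width * 3 bytes
    return row * height             # height scanlines, row-major

buf = make_rgb_buffer(4, 2)
# Buffer length must equal width * height * channels.
assert len(buf) == 4 * 2 * 3
```

When passing such a buffer, the width, height, and number of channels must be supplied alongside it, since the raw bytes carry no header.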

Recognition Session

Recognition session is the process of recognizing a physical document, such as a specific passport, driver's license, and so on.

The process of recognizing a physical document means handling one or a series of images of the same document.

The system can recognize a document with high accuracy even using a single image. However, using the results of recognition of multiple images significantly improves quality. The system combines recognition results from different frames. This allows you to recognize documents in conditions of poor lighting, glare, and other adverse factors with a high confidence. The mechanism that decides whether to recognize additional frames or to stop the process is called terminality.

Terminality is the automatic stopping of the recognition process in a video stream. It takes the true value in two cases:

  • Adding new frames will no longer change the recognition result;
  • The session timeout specified in the settings has been reached.

The recognition session is created with settings prepared in advance. Sessions are independent of each other, which allows you to use multiple sessions simultaneously if necessary (with the same or different settings).
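The terminality decision can be modeled as a simple check over the frame history and the elapsed time. This is a deliberately simplified Python sketch — the stability criterion (identical results over the last few frames) is an assumption standing in for the real mechanism:

```python
import time

def is_terminal(history, timeout_s, started_at, stable_frames=3, now=None):
    """Decide whether to stop a video-stream recognition session.

    history: list of per-frame recognition results (any comparable values).
    Returns True when the result has stopped changing over the last
    `stable_frames` frames, or when the session timeout is reached.
    """
    now = time.monotonic() if now is None else now
    if now - started_at >= timeout_s:
        return True  # session timeout reached
    if len(history) >= stable_frames:
        last = history[-stable_frames:]
        if all(r == last[0] for r in last):
            return True  # new frames no longer change the result
    return False
```

In the real system the "result will not change" test is internal to the engine; the sketch only shows where such a decision sits in the frame loop.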

Session Settings

A personal signature is provided to the customer with the product. It is contained in the README.html file in the /doc directory.

Each time an instance of the recognition session is created, the signature must be passed as one of the arguments. This confirms the caller's right to use the library and unlocks it.

Signature is verified offline. The library does not access any external resources.

Session settings is an object storing:

  • The list of the supported documents for recognition, grouped by internal engines. It is set in the configuration bundle with which the engine was created (read-only);
  • Advanced information about documents, including links to PRADO (read-only, used in Smart ID Engine only);
  • The list of documents submitted for recognition (* by default);
  • The list of expected fields for recognition (all by default);
  • The list of document sets (the mode parameter, set to default by default);
  • The special session options: the number of recognition threads, the expansion of the field list, the session timeout, and so on. You can find the full list of the options in our documentation.
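The mutable part of the settings object can be pictured as a plain key-value structure. The keys below are illustrative only — they are not the SDK's actual option names:

```python
# Hypothetical session settings layout; real option names differ per SDK.
session_settings = {
    "documents": ["deu.id.card"],   # documents submitted for recognition
    "fields": "all",                # expected fields (all by default)
    "mode": "default",              # document set / internal engine mode
    "options": {
        "threads": 2,               # number of recognition threads
        "session_timeout_ms": 5000, # session timeout
    },
}
```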

Internal engines are groups of documents organized in such a way that their search and recognition algorithm works as efficiently as possible.

The list of documents for recognition is generated within the configuration bundle. Inside it, documents can be grouped by internal engines. For example:

  • All the documents of one country;
  • The passports of all countries;
  • All the ID documents of the CIS countries and so on.

Bundles may contain multiple documents located in different internal engines. If such documents are specified in the session settings, the system will not be able to determine independently which internal engine to select.

To solve this problem, the mode parameter is specified for each internal engine. Create a session with this parameter set and with the list of documents to recognize.

Attention!

You can specify multiple document types in one session only if they belong to the same internal engine. In other words, one recognition session can only work with one internal engine.
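The constraint can be checked before creating a session. The engine groupings and document names below are hypothetical — the real grouping comes from the configuration bundle:

```python
# Hypothetical internal-engine layout (illustrative names only).
ENGINES = {
    "cis": {"rus.passport.national", "kaz.id.card"},
    "deu": {"deu.id.card", "deu.passport"},
}

def select_engine(requested_docs):
    """Return the single internal engine covering all requested documents.

    Raises ValueError if the documents span more than one internal engine,
    since one recognition session can only work with one internal engine.
    """
    engines = {name for doc in requested_docs
               for name, docs in ENGINES.items() if doc in docs}
    if len(engines) != 1:
        raise ValueError(
            f"documents span {len(engines)} internal engines; "
            "a session supports exactly one")
    return next(iter(engines))
```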

Information about the internal engines, their sets and modes, can be obtained from the session settings, as well as from the documentation for each specific SDK.

Recognition Result

Recognition result is an interface object that contains all the available information about the recognition result. Each parameter is output using a separate method, which allows you to use only the required data in integration. The set of the object fields depends on the type of the session in which recognition was performed, and this depends on the functionality of the product used.

Feedback

Recognition in a video stream is a complex and often lengthy process that requires user interaction. By providing information about the progress of recognition on the screen during operation, we help the user better understand the stages of the system's operation. For this purpose, machine vision technologies are often complemented by augmented reality elements, which enhances the user experience and makes interaction more intuitive.

The recognition result object contains information about the coordinates of the quadrilateral of the document template (document borders) and the coordinates of the quadrilaterals of the document field templates. For example, using these coordinates, you can draw the found document borders on the screen using your camera preview.

Confidence

Despite the fact that the structure of the result is determined by the session type, all response fields have a common property — confidence.

Confidence is an estimate of the recognition result — a float number from zero to one. The higher the confidence value, the more likely it is that the system has found the document location in the image properly (the template confidence) or recognized the characters correctly (the field and character confidence).

isAccepted

isAccepted is a simplified version of confidence; it takes the true value if the system is confident in the recognition result.
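A typical integration use of these two properties is thresholding: keep only fields whose confidence is high enough. The threshold value below is illustrative — the system's internal acceptance criterion is not stated here:

```python
def is_accepted(confidence, threshold=0.8):
    """Simplified view of isAccepted: the result is accepted when its
    confidence clears a threshold. 0.8 is an illustrative value, not the
    system's actual internal criterion."""
    return confidence >= threshold

# Hypothetical per-field confidences from a recognition result.
fields = {"surname": 0.97, "number": 0.55}
accepted = {name: c for name, c in fields.items() if is_accepted(c)}
```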

Attributes

Each recognized element has a set of attributes representing metadata. For example, for a found template, its real DPI is calculated, for barcodes — the source encoding, and so on.

The General Workflow of Document Recognition

The cycle of object processing — recognizing a document (or an object, or just text in an image), comparing faces with each other, or checking liveness — is common to all products and involves the following steps:

  • Creating a recognition engine;
  • Setting the recognition session options;
  • Creating the recognition session or a set of them;
  • Sending the image to the recognition session;
  • Evaluating the recognition result;
  • Transferring the next image if necessary;
  • Working with the final recognition result.
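The steps above can be sketched as a frame loop. The classes here are deliberately fake stand-ins — real class, method, and option names differ per product and are not taken from this document:

```python
class FakeResult:
    """Stand-in for a recognition result object."""
    def __init__(self, terminal):
        self._terminal = terminal
    def is_terminal(self):
        return self._terminal

class FakeSession:
    """Stand-in session: pretends the result becomes terminal
    after two frames have been processed."""
    def __init__(self):
        self._frames = 0
    def process(self, image):
        self._frames += 1
        return FakeResult(self._frames >= 2)

def recognize(session, frames):
    """Feed frames to a session until the result is terminal,
    then return the final result."""
    result = None
    for image in frames:
        result = session.process(image)
        if result.is_terminal():
            break  # no further frames needed
    return result
```

In a real integration the engine and session creation precede this loop, and each intermediate result can also be used for on-screen feedback (e.g. drawing the found document borders).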

Session Types

Each product supports individual session type(s).

Smart ID Engine

The following session types are supported:

  • IdSession — for recognizing identification documents;
  • IdFileAnalysisSession — for analyzing images for interference;
  • IdFaceSession — for non-biometric verification of faces and verification of liveness;
  • IdVideoAuthenticationSession — for various complex scenarios such as double-sided document scanning.

Smart Code Engine

The CodeEngineSession session type is used to recognize codified objects. The following object types can be selected via session settings:

  • Bank cards;
  • Barcodes;
  • Phone numbers;
  • Bank card and account numbers;
  • Car license plates;
  • Readings of water/electricity/gas/heat meters and so on;
  • Various combinations from the list above.

Smart Document Engine

DocSession is used for recognizing all supported rigid and flexible shapes.

Smart Text Engine

TextSession is used for recognizing texts in images. Using session settings, you can choose either to search for text strings and recognize them, or to perform the search based on geometric features (for instance, if a photo of a piece of paper with text is submitted).