A Ukrainian startup has created a neural network that looks for faces without masks in a crowd.

The Ukrainian IT company Fulcrum has developed a neural network that can recognize people without medical masks in a crowd.

The company posted the project code on GitHub and also described the creation process and the technology behind it. According to backend developer Sergei Kalashnikov, who worked on the project, the idea was to check whether faces without masks can be found using nothing but webcam footage. It is a non-profit project; according to the developer, the team was driven mostly by curiosity.

It took two weeks to train the neural network. The developers described the process of its creation on their blog. In short, it looked like this:

    The final version of the neural network used TensorFlow 2 Nightly, OpenCV 2, Keras, and Yolov3. OpenCV is used for image processing and for drawing the "squares" (bounding boxes) around masks; Yolov3 is the "brain" of the neural network.
    We started with a simpler task: training the network to find masks in images, and only then moved on to video processing. In the process, we created two applications. The first is written in Node.js and is used to create labels. It helps compile the data sets and converts the object coordinates in a picture from Labelbox JSON into Yolov3 XML format.
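The original converter is a Node.js application targeting a Yolov3 XML layout; purely as an illustration of the same idea, here is a minimal Python sketch that turns a Labelbox-style JSON export into plain-text Yolov3 annotations (class, normalized center, width, height). The field names (Label, objects, bbox, External ID) follow one common Labelbox export layout and are assumptions, not necessarily the team's exact schema.

    # Hypothetical sketch: convert a Labelbox-style JSON export into
    # YOLO-style text annotations. Field names are assumptions.
    import json
    from pathlib import Path

    def labelbox_to_yolo(export_path, out_dir, img_w, img_h):
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        records = json.loads(Path(export_path).read_text())

        for rec in records:
            lines = []
            # Assumed layout: each record lists labeled objects with a
            # pixel-space bounding box {"top", "left", "height", "width"}.
            for obj in rec.get("Label", {}).get("objects", []):
                box = obj["bbox"]
                x_center = (box["left"] + box["width"] / 2) / img_w
                y_center = (box["top"] + box["height"] / 2) / img_h
                w = box["width"] / img_w
                h = box["height"] / img_h
                # Class 0 = mask; coordinates are normalized to the image size.
                lines.append(f"0 {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}")

            # One annotation file per image, named after the source image.
            stem = Path(rec["External ID"]).stem
            (out / f"{stem}.txt").write_text("\n".join(lines))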


To begin with, it was necessary to determine the exact position of the mask (or any other object). For this, the team used the Labelbox website. It is convenient because it generates a file with the necessary data: the location of the mask, the size of the image, the time spent labeling the image, and so on. These files are later fed into one of the programs mentioned above.

For Labelbox, the team wrote code that parses the exported data. The data is then split across files in the format the neural network needs to work with. The program also computes anchors from this data; they define the typical height and width of a mask and how it should be scaled. The result is a final data set of images and their annotations.
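The post does not show how the anchors are derived. A common approach for Yolov3-style detectors, sketched below only as an illustration (not the team's code), is to cluster the labeled box widths and heights with k-means and use the cluster centers as anchors.

    # Illustrative sketch: derive Yolov3-style anchors by clustering labeled
    # box sizes. Assumes YOLO-format annotation files with normalized
    # "class x_center y_center width height" lines.
    import glob
    import numpy as np

    def load_box_sizes(labels_dir, input_size=288):
        sizes = []
        for path in glob.glob(f"{labels_dir}/*.txt"):
            with open(path) as f:
                for line in f:
                    parts = line.split()
                    if len(parts) != 5:
                        continue
                    _, _, _, w, h = parts
                    # Scale normalized sizes to the network input resolution.
                    sizes.append([float(w) * input_size, float(h) * input_size])
        return np.array(sizes)

    def kmeans_anchors(sizes, k=9, iterations=100):
        # Plain k-means on (width, height); Yolov3 uses 9 anchors by default.
        rng = np.random.default_rng(0)
        centers = sizes[rng.choice(len(sizes), k, replace=False)]
        for _ in range(iterations):
            dists = np.linalg.norm(sizes[:, None, :] - centers[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            for i in range(k):
                if np.any(assign == i):
                    centers[i] = sizes[assign == i].mean(axis=0)
        # Sort by area, from the smallest anchor to the largest.
        return centers[np.argsort(centers.prod(axis=1))]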
    The second program is written in Python; it includes Yolov3 and trains the neural network. Using this application, the developers created their own model for recognizing objects in an image.

The maximum size of the recognized image fragment was set to 288 px. The value could be larger; a small one was chosen to increase processing speed.
    num.epoch sets the number of training epochs. 30 epochs took 12 hours (with an image size of 288 px).
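The post gives only these two knobs, the 288 px input size and the number of epochs, not the training code itself. The sketch below is a generic tf.keras skeleton showing where those values would typically plug in; the tiny convolutional model and the random data are stand-ins for the real Yolov3 network and the labeled mask data set.

    # Generic tf.keras skeleton, not the team's trainer. It only illustrates
    # where the 288 px input size and the epoch count enter the training loop.
    import tensorflow as tf

    IMG_SIZE = 288      # maximum size of the recognized image fragment
    NUM_EPOCHS = 30     # the post's num.epoch; 30 epochs took ~12 hours

    # Placeholder data: random 288x288 RGB images with binary labels.
    # In the real project this would be the labeled mask data set.
    images = tf.random.uniform((64, IMG_SIZE, IMG_SIZE, 3))
    labels = tf.cast(tf.random.uniform((64, 1), maxval=2, dtype=tf.int32), tf.float32)
    dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(8)

    # Stand-in model; the real project builds a Yolov3 detector instead.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(IMG_SIZE, IMG_SIZE, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # A larger IMG_SIZE captures more detail but slows every step down;
    # more epochs means proportionally longer training time.
    model.fit(dataset, epochs=NUM_EPOCHS)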


A separate script had to be written for video, but it works on the same principles used for image analysis and is likewise based on Yolov3. The team also set up OpenCV to load the video and sample frames at a specific frequency. The program works like this: a video file is placed in a specific folder, and the script processes the video frame by frame.
    Webcams usually record short clips of 10-15 minutes. These clips could be sent to a server where similar software would process them. This can be useful if a company or organization, for example, wants to make sure that all of its employees are wearing masks.
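As a rough illustration of that frame-by-frame loop (not the team's actual script), the OpenCV sketch below reads a video file, samples roughly one frame per second, and hands each sampled frame to a detector. The detect_unmasked_faces function is a hypothetical placeholder for the trained Yolov3 model.

    # Illustrative frame-by-frame loop with OpenCV, not the team's script.
    # detect_unmasked_faces() is a hypothetical stand-in for the Yolov3 model:
    # it should return a list of (x, y, w, h) boxes for faces without masks.
    import os
    import cv2

    def process_video(path, detect_unmasked_faces, sample_fps=1):
        os.makedirs("frames", exist_ok=True)          # where annotated frames are saved
        cap = cv2.VideoCapture(path)
        video_fps = cap.get(cv2.CAP_PROP_FPS) or 25   # fall back if FPS is unknown
        step = max(1, int(video_fps // sample_fps))   # analyze ~sample_fps frames per second
        frame_idx = 0

        while True:
            ok, frame = cap.read()
            if not ok:
                break                                 # end of the video file
            if frame_idx % step == 0:
                for (x, y, w, h) in detect_unmasked_faces(frame):
                    # Draw a red box around each detected face without a mask.
                    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
                cv2.imwrite(f"frames/frame_{frame_idx:06d}.jpg", frame)
            frame_idx += 1

        cap.release()

    # Example: process a clip dropped into the watched folder.
    # process_video("uploads/cam01.mp4", detect_unmasked_faces=my_yolov3_detector)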

The results of the network can be seen in the team's demo video.