Facial Identification With Savant, YOLOV5-Face, AdaFace and HNSWLIB

Facial re-identification is a commodity task in computer vision: academically, at least, it is a well-understood problem. The commercial efficiency of such a solution, however, is still a concern for customers. This article presents a high-performance pipeline developed with the Savant framework, which can be used in doorbell security or video content annotation systems.

Many people have ethical concerns about the use of facial recognition systems in public spaces. We believe that governments and commercial organizations must steer clear of private life and must not use such systems for illegal activities. Despite the potential for unethical use, facial recognition software is still a crucial component of modern security systems in enterprises, homes, and other spaces with higher security requirements, such as airports and bus and train stations.

The Demo

The pipeline focuses on two tasks:

  • Building the ReID database using a photo archive as the source;
  • Matching people in the video against the database for identification.

Both tasks are built on the same technology stack:

  • Savant Framework;
  • YOLOV5-Face with landmarks for facial detection;
  • OpenCV CUDA for facial alignment based on the landmarks;
  • AdaFace for ReID generation;
  • HNSWlib for ReID storage and lookups.
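
To illustrate the alignment step, here is a minimal sketch in plain Python. It is not the pipeline's implementation: the demo uses the YOLOV5-Face landmarks with OpenCV CUDA, while this sketch uses only the two eye landmarks and computes a similarity transform (rotation, uniform scale, translation) that moves them to fixed positions in the output crop. The function names and the target eye coordinates are hypothetical and chosen for illustration only.

```python
def eye_alignment_matrix(left_eye, right_eye,
                         dst_left=(38.0, 52.0), dst_right=(74.0, 52.0)):
    """Return a 2x3 affine matrix mapping the detected eyes onto dst_left/dst_right.

    The destination coordinates are an assumed template, not values
    taken from the Savant sample.
    """
    # Treat 2D points as complex numbers: a similarity transform is z -> a*z + b.
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*dst_left), complex(*dst_right)
    if p1 == p2:
        raise ValueError("eye landmarks must be distinct")
    a = (q2 - q1) / (p2 - p1)   # rotation and uniform scale
    b = q1 - a * p1             # translation
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# A tilted face: the right eye is higher than the left one.
matrix = eye_alignment_matrix((100.0, 120.0), (140.0, 100.0))
aligned_left = apply_affine(matrix, (100.0, 120.0))
aligned_right = apply_affine(matrix, (140.0, 100.0))
```

After the transform, both eyes lie on the same horizontal line of the crop. A matrix of this 2x3 shape is what an image warping routine such as OpenCV's warpAffine expects, so the same idea extends to warping the whole face region.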

Warning: Savant uses DeepStream and TensorRT for efficient inference, so the first launch of the pipeline requires compiling the ONNX models into TRT format. Models of the YOLO family are known to take an enormous amount of time to compile. Specifically, YOLOV5-Face takes up to 40 minutes to compile 🐒. Be patient and don't worry: CPU load indication is your friend. When the model is built, the CPU load will slump.

What is Savant

Savant is an open-source, high-level framework for building real-time, streaming, highly efficient multimedia AI applications on the Nvidia stack. It helps develop dynamic, fault-tolerant inference pipelines that utilize the best Nvidia data center and edge technologies.

Savant is built on DeepStream and provides a high-level abstraction layer for building inference pipelines. It is designed to be easy to use, flexible, and scalable. It is an excellent choice for building advanced CV and video analytics applications for cities, retail, manufacturing, and more.

Facial Database Construction

The database is built with the following index builder command:

# if x86
docker compose -f docker-compose.x86.yml --profile index up

# if Jetson
# currently not supported
docker compose -f docker-compose.l4t.yml --profile index up

# Ctrl+C to stop running the compose bundle

The assets used for the index are located in the assets/gallery directory.

A person can have multiple entries to improve identification quality; follow the naming notation of the files already in that directory.
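
To show why several entries per person help, here is a minimal sketch of the matching step in plain Python. It is a brute-force stand-in, not the pipeline's code: the demo stores AdaFace embeddings in HNSWlib, which performs approximate nearest-neighbor search, while this sketch compares the query against every gallery entry. The function names, the toy 3-dimensional vectors, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query, gallery, threshold=0.5):
    """Return (name, score) of the best gallery match, or (None, score) below threshold.

    gallery is a list of (person_name, embedding) pairs; a person may
    appear several times, once per reference photo.
    """
    best_name, best_score = None, -1.0
    for name, embedding in gallery:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Toy gallery: "alice" has two entries that point in different directions,
# standing in for two photos taken under different conditions.
gallery = [
    ("alice", [1.0, 0.0, 0.0]),
    ("alice", [0.0, 1.0, 0.0]),
    ("bob", [0.0, 0.0, 1.0]),
]

name, score = identify([0.1, 0.9, 0.0], gallery)
```

The query here is close to the second "alice" entry and far from the first, so the match succeeds only because the extra entry exists. Raising the threshold trades missed identifications for fewer false matches; a suitable value depends on the embedding model and has to be tuned on real data.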

Warning: When the index builder is launched for the first time, it compiles the models into TRT format. Models of the YOLO family are known to take an enormous amount of time to compile. Specifically, YOLOV5-Face takes up to 40 minutes to compile 🐒. The long compilation may cause the image source adapter to crash; if that happens, just start the index builder again.

Facial Identification

The demo can be launched with the following command:

docker compose -f docker-compose.x86.yml --profile demo up

Then open http://127.0.0.1:888/stream in your browser or rtsp://127.0.0.1:554/stream in your favorite video player to see the resulting video.

Source Code

The demo is published in the Savant repository on GitHub: https://github.com/insight-platform/Savant/tree/develop/samples/yolov8_seg

Along with the demo, you can find other samples covering how to use various models and build pipelines with Savant:

https://github.com/insight-platform/Savant/tree/develop/samples

Consider visiting our GitHub, and don’t forget to subscribe to our Twitter (X) to receive updates on Savant. Also, we have Discord, where we help users to onboard.