Facial re-identification is a commodity task in computer vision: academically, there is no rocket science in it. However, the commercial efficiency of such a solution remains a concern for customers. This article presents a high-performance pipeline built with the Savant framework that can be used in doorbell security or video content annotation systems.
Many people have ethical concerns about the use of facial recognition systems in public spaces. We believe that governments and commercial organizations must steer clear of private life and avoid using such systems for illegal activities. Despite those unethical use cases, facial recognition software remains a crucial component of modern security systems used in enterprises, homes, and other spaces with higher security requirements, such as airports and bus and train stations.
The pipeline focuses on two tasks:
- Building the ReID database using a photo archive as the source;
- Matching people in the video against the database for identification.
Both tasks are built on the same technology stack:
- Savant Framework;
- YOLOV5-Face with landmarks for facial detection;
- OpenCV CUDA for facial alignment based on the landmarks;
- AdaFace for ReID generation;
- HNSWlib for ReID storage and lookups.
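To make the landmark-based alignment step more concrete, here is a minimal sketch in plain Python. It is not the pipeline's code: it uses only the two eye landmarks (a full implementation would typically fit all five landmarks), and the target eye positions are an assumption taken from a commonly used 112x112 face template. The function names are hypothetical. The result is a 2x3 affine matrix of the kind an image warping routine expects.

```python
def eye_alignment_matrix(left_eye, right_eye,
                         dst_left=(38.2946, 51.6963),
                         dst_right=(73.5318, 51.5014)):
    """Return a 2x3 similarity transform (rotation, uniform scale, shift)
    that maps the detected eye landmarks onto the template positions.

    The default template coordinates are an assumption (a widely used
    112x112 layout), not values confirmed by the article.
    """
    # Treat 2D points as complex numbers: a similarity transform is z -> a*z + b.
    p1, p2 = complex(*left_eye), complex(*right_eye)
    q1, q2 = complex(*dst_left), complex(*dst_right)
    if p1 == p2:
        raise ValueError("eye landmarks coincide, cannot align")
    a = (q2 - q1) / (p2 - p1)   # rotation and scale
    b = q1 - a * p1             # translation
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

By construction, applying the returned matrix to the detected eye positions yields the template positions, so every aligned face crop has the eyes in the same place before the embedding model sees it.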
Warning: Savant uses DeepStream and TensorRT for efficient inference, so the first launch of the pipeline requires compiling the ONNX models into TensorRT (TRT) format. Models of the YOLO family are known to take an enormous amount of time to compile. Specifically, YOLOV5-Face takes up to 40 minutes to compile 🐢. Be patient and don't worry: the CPU load indicator is your friend.
When the model is built, the CPU load will slump.
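If you prefer not to watch the load indicator by hand, the same observation can be scripted. The sketch below is an illustration only: the function names, the load threshold, and the number of quiet samples are assumptions, and it relies on `os.getloadavg()`, which is available on Unix-like systems.

```python
import os
import time


def load_has_slumped(samples, busy_threshold=1.0, quiet_samples=3):
    """Return True once the load was at or above busy_threshold and then
    stayed below it for quiet_samples consecutive readings.

    The threshold values are assumptions; tune them for your machine.
    """
    seen_busy = False
    quiet = 0
    for load in samples:
        if load >= busy_threshold:
            seen_busy = True
            quiet = 0
        elif seen_busy:
            quiet += 1
            if quiet >= quiet_samples:
                return True
    return False


def wait_for_slump(poll_seconds=10.0, busy_threshold=1.0, quiet_samples=3):
    """Poll the 1-minute load average until the busy phase is over."""
    samples = []
    while not load_has_slumped(samples, busy_threshold, quiet_samples):
        samples.append(os.getloadavg()[0])
        time.sleep(poll_seconds)

# Usage (blocks until the load drops): wait_for_slump()
```

A drop in CPU load is only a hint that compilation has finished; the pipeline logs remain the authoritative signal.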
What is Savant
Savant is an open-source, high-level framework for building real-time, streaming, highly efficient multimedia AI applications on the Nvidia stack. It helps develop dynamic, fault-tolerant inference pipelines that utilize the best Nvidia data center and edge technologies.
Savant is built on DeepStream and provides a high-level abstraction layer for building inference pipelines. It is designed to be easy to use, flexible, and scalable. It is an excellent choice for building advanced CV and video analytics applications for cities, retail, manufacturing, and more.
Facial Database Construction
The database is built with the following index builder command:
# if x86
docker compose -f docker-compose.x86.yml --profile index up
# if Jetson
# currently not supported
docker compose -f docker-compose.l4t.yml --profile index up
# Ctrl+C to stop running the compose bundle
The assets used for the index are located in the directory
A person can have multiple entries to improve identification quality; follow the above-mentioned notation.
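The effect of storing several entries per person can be illustrated with a small gallery lookup. This is a sketch under stated assumptions, not the pipeline's implementation: it uses exact brute-force cosine similarity in plain Python as a stand-in for the approximate nearest-neighbor index, and the `Gallery` class, its methods, and the `min_similarity` threshold are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [v / norm for v in vec]


class Gallery:
    """Toy ReID gallery: each person may have any number of reference embeddings."""

    def __init__(self):
        self._entries = []  # list of (person, unit-length embedding)

    def add(self, person, embedding):
        self._entries.append((person, normalize(embedding)))

    def identify(self, embedding, min_similarity=0.5):
        """Return (person, score) for the closest entry, or (None, score)
        when the best score is below min_similarity (an assumed threshold)."""
        query = normalize(embedding)
        best_person, best_score = None, -1.0
        for person, ref in self._entries:
            score = sum(q * r for q, r in zip(query, ref))
            if score > best_score:
                best_person, best_score = person, score
        if best_score < min_similarity:
            return None, best_score
        return best_person, best_score
```

Because the query is compared against every entry and the best score wins, a second photo of the same person taken from a different angle can rescue a match that a single frontal photo would miss.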
Warning: When launched for the first time, the index builder compiles the models into TRT format. Models of the YOLO family are known to take an enormous amount of time to compile; YOLOV5-Face in particular takes up to 40 minutes 🐢. The long compilation may cause the image source adapter to crash; if that happens, just start the index builder again. The following
The demo can be launched with the following command:
docker compose -f docker-compose.x86.yml --profile demo up
Open http://127.0.0.1:888/stream in your browser, or open rtsp://127.0.0.1:554/stream in your favorite video player. You should see the resulting video:
The demo is published in the Savant repository on GitHub: https://github.com/insight-platform/Savant/tree/develop/samples/yolov8_seg
Along with the demo, you can find other samples covering how to use various models and build pipelines with Savant: