This short article discusses how to simulate an MJPEG USB camera in Linux with FFmpeg and a V4L2 loopback device. USB and CSI cameras, alongside GigE Vision cameras, are the main sources of visual data in robotics and other industrial applications.
They have significant advantages over RTSP cameras:
Everybody loves benchmarking, and so do we! We always claim that Savant is fast and highly optimized for Nvidia hardware because it uses TensorRT inference under the hood. However, without numbers and benchmarks, the claim may sound unfounded. Thus, we decided to publish a benchmark demonstrating the inference performance for three technologies:
PyTorch on CUDA + video processing with OpenCV;
PyTorch on CUDA + hardware-accelerated (NVDEC) video processing with Torchaudio (weirdly, the video processing primitives live in the Torchaudio library);
The first is the de facto approach most developers use. The second is rarely used because it requires a custom build, and developers often underestimate hardware-accelerated video decoding/encoding as the critical enabling factor for CUDA-based processing.
Savant 0.2.7 was released on February 7, 2024. The release includes several bug fixes, four new demos, and other enhancements, including documentation and benchmarking.
Savant crossed the 400-star mark on GitHub, and our Discord is now a must-join place. The work on the release took three months. In the following sections, we will cover essential parts of the release in detail.
IMPORTANT: Savant 0.2.7 is the last feature release in the 0.2.X branch. The following releases in the 0.2.X branch will be maintenance and bugfix releases. The feature development switches to the 0.3.X branch based on DeepStream 6.4 and WILL NOT support the Jetson Xavier family because Nvidia does not support them with DS 6.4.
Savant gives developers highly efficient inference based on TensorRT, which you usually must use when developing efficient pipelines. However, because of a particular need, you may have to integrate the Savant pipeline with another inference technology. In the article, we show how Savant integrates with GPU-accelerated PyTorch inference.
You can also use the approach if you are PyTorch-centric and happy with it but need efficient infrastructure for video processing: transfer, decoding, and encoding.
When deep neural networks are evaluated with the CUDA runtime, the input of the model and its output are allocated in the GPU memory. The next step is to extract high-level data like bounding boxes, attributes, or masks from raw GPU-allocated tensors.
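The extraction step can be sketched in plain Python. This is a CPU-side illustration only, not Savant's implementation: the row layout `(cx, cy, w, h, score, class_id)`, the `Box` type, and the `decode_rows` helper are assumptions made for the example. On the GPU, the same filtering and conversion would be expressed with an array library operating on the device tensor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    left: float
    top: float
    width: float
    height: float
    score: float
    class_id: int


def decode_rows(rows: Sequence[Sequence[float]], min_score: float = 0.5) -> List[Box]:
    """Convert raw rows (cx, cy, w, h, score, class_id) into boxes.

    The row layout is hypothetical; real models define their own output format.
    Rows with a score below min_score are dropped.
    """
    boxes = []
    for cx, cy, w, h, score, class_id in rows:
        if score < min_score:
            continue
        # Convert center-based coordinates to a top-left corner.
        boxes.append(Box(cx - w / 2, cy - h / 2, w, h, score, int(class_id)))
    return boxes


raw = [
    (50.0, 40.0, 20.0, 10.0, 0.9, 1.0),  # confident detection, kept
    (10.0, 10.0, 4.0, 4.0, 0.2, 0.0),    # low score, filtered out
]
boxes = decode_rows(raw)
```

Doing this filtering before copying anything to the host means only the surviving boxes need to leave GPU memory, not the whole raw tensor.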
Transformer models are gradually becoming more popular in computer vision. Just a couple of years ago, transformers were not widely used for computer vision. However, they have significantly changed the landscape of sophisticated deep learning solutions, primarily in natural language processing and generative AI.
Savant uses the TensorRT engine for model serving. TensorRT is an optimization ecosystem compiling models into highly efficient engines optimized for particular hardware, batch size, and precision. After that procedure, the model’s inference speed increases significantly, but the process is resource- and time-consuming.
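Because an engine is tied to a particular hardware, batch size, and precision, and building it is expensive, a common way to avoid rebuilding is to cache engines under a key derived from those parameters. The sketch below illustrates that idea only; the `engine_cache_path` helper, the naming scheme, and the example values are hypothetical and are not Savant's or TensorRT's actual cache layout.

```python
import hashlib
from pathlib import Path


def engine_cache_path(cache_dir: str, model_file: str, gpu_name: str,
                      batch_size: int, precision: str) -> Path:
    """Return a cache file path unique to (model, GPU, batch size, precision).

    Hypothetical naming scheme for illustration: changing any of the
    parameters yields a different file, so a stale engine is never reused.
    """
    key = f"{model_file}|{gpu_name}|{batch_size}|{precision}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{Path(model_file).stem}_{digest}.engine"


# Example values are made up for the sketch.
fp16_path = engine_cache_path("/cache", "detector.onnx", "RTX 4090", 16, "fp16")
int8_path = engine_cache_path("/cache", "detector.onnx", "RTX 4090", 16, "int8")
```

A pipeline would check whether the file exists at startup and run the slow build only when it does not.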
Monitoring as a part of software observability is crucial for understanding the state and health of the system. Video analytics and computer vision pipelines also benefit from monitoring, allowing SREs to understand and predict system operation and reason about problems based on anomalies and deviations.
Computer vision pipelines represent complex software working in a wild environment, requiring continuous observation to understand trends and correlations between internal and external factors.
TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It’s specifically designed for production environments and optimized for NVIDIA GPUs. The primary goal of TensorRT is to accelerate deep learning inference, which is the process of using a trained neural network model to make predictions based on new data.
NVIDIA DeepStream SDK is a game-changing technology for deep neural network inference in computer vision served with NVIDIA hardware. The optimized architecture accounts for the specifics of NVIDIA accelerators and edge devices, making pipelines work blazingly fast. The core of the technology is TensorRT, which consists of two major parts: the model optimizer, transforming the model into an “engine” heavily optimized for particular hardware, and the inference library, which runs that engine with very low latency.
Another killer feature of DeepStream is connected with the CUDA data processing model: computations are carried out with SIMD operations over data held in separate GPU memory. The advantage is that the GPU memory is heavily optimized for such operations, but you pay for it by uploading the data to the GPU and downloading the results. This can be a costly process involving delays, PCI-E bus saturation, and CPU and GPU idling. In the ideal situation, you upload a moderate amount of data to the GPU, handle it intensively, and download a moderate amount of results from the GPU at the end of the processing. DeepStream is optimized for this pattern and provides developers with tools for implementing such processing efficiently.
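A rough back-of-the-envelope estimate shows why the transfers matter. The sketch below assumes an effective PCI-E bandwidth of 12 GB/s and uncompressed 1080p RGBA frames at 30 FPS; both figures are illustrative assumptions, not measurements, and the helper names are made up for the example.

```python
def frame_bytes(width: int, height: int, channels: int = 4) -> int:
    """Size of one uncompressed frame with 8 bits per channel."""
    return width * height * channels


def transfer_ms(num_bytes: int, bandwidth_gb_s: float) -> float:
    """Time to move num_bytes over a link with the given bandwidth (GB/s)."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1e3


ASSUMED_PCIE_GB_S = 12.0  # assumption for illustration, not a measured value
ASSUMED_FPS = 30          # assumption for illustration

raw_frame = frame_bytes(1920, 1080)  # 1080p RGBA
per_frame_ms = transfer_ms(raw_frame, ASSUMED_PCIE_GB_S)

# Bus time spent per second of video if every frame is uploaded raw
# and downloaded again after processing.
round_trip_ms_per_second = 2 * ASSUMED_FPS * per_frame_ms
```

Under these assumptions a single stream already spends tens of milliseconds of bus time per second on raw frame copies alone, and the cost grows linearly with the number of streams. Keeping decoded frames in GPU memory and downloading only compact results avoids most of it.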
So why do developers hesitate to use DeepStream in their computer vision endeavors? There are reasons for that. In the following sections, we will discuss them and find out how to overcome them.