NVIDIA DeepStream SDK is a game-changer technology for deep neural network inference in computer vision on NVIDIA hardware. The architecture is optimized for the specifics of NVIDIA accelerators and edge devices, making pipelines blazingly fast. The core of the technology is TensorRT, which consists of two major parts: the model optimizer, which transforms a model into an “engine” heavily optimized for particular hardware, and the inference library, which enables very fast inference.
Another killer feature of DeepStream comes from the CUDA data processing model: computations are carried out as SIMD operations over data in separate GPU memory. The advantage is that GPU memory is heavily optimized for such operations, but you pay for it by uploading the data to the GPU and downloading the results. That can be a costly process involving delays, PCI-E bus saturation, and CPU and GPU idling. Ideally, you upload a moderate amount of data to the GPU, process it intensively, and download a moderate amount of results at the end. DeepStream is optimized for this and provides developers with tools for implementing such processing efficiently.
So why do developers hesitate to use DeepStream in their computer vision endeavors? There are reasons for that, which we will discuss in the following sections, along with ways to overcome them.
All the obstacles can be divided into two groups: technical and organizational. Let us begin with the first group because it is a real blocker preventing broad adoption.
Complex GStreamer Foundation

You cannot just write a linear declarative program in DeepStream because it is essentially a set of plugins for GStreamer: you need to learn and investigate the cumbersome GStreamer API to create a non-trivial pipeline. A GStreamer pipeline is a multi-threaded, asynchronous program controlled by events and messages passing through a graph of elements. All those elements must negotiate their inputs and outputs and behave predictably.
There are plenty of difficulties related to GStreamer programming:
- Complex Architecture: GStreamer has a complex architecture involving elements, pads, pipelines, and buses. Understanding how these components interact requires a steep learning curve, especially for those new to multimedia processing.
- Multimedia Concepts: GStreamer is used for processing audio and video data, which involves intricate knowledge of codecs, containers, streaming protocols, and other multimedia concepts. This can be daunting for developers without a background in multimedia technologies.
- Documentation and Examples: While GStreamer has documentation, it can be overwhelming and sometimes lacks clear, practical examples. This can make it difficult for beginners to get started or for experienced developers to solve specific problems.
- Debugging Challenges: Debugging GStreamer applications can be complex due to the asynchronous and real-time nature of multimedia processing. Identifying and fixing issues in a pipeline can be time-consuming and requires a deep understanding of both GStreamer and multimedia processing.
- API Complexity: GStreamer provides a comprehensive API to cater to various multimedia needs. However, this breadth can make the API seem complex and intimidating, especially for those only interested in a subset of its functionality.
- Performance Optimization: Achieving optimal performance with GStreamer requires a good understanding of the framework and the underlying system. This includes knowledge of threading, buffering, and hardware acceleration, which can be challenging to master.
- Cross-Platform Issues: While GStreamer is cross-platform, developing applications that work seamlessly across different operating systems and environments can introduce additional complexities.
- Community Support: Although there is a community around GStreamer, it may not be as large or as active as other more popular frameworks, sometimes making finding help or resources more challenging.
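To make the boilerplate concrete, here is a sketch of a typical DeepStream detection pipeline in GStreamer's launch syntax. The snippet only assembles the description string (so it runs without GStreamer installed); in real Gst-python code, each segment becomes a `Gst.ElementFactory.make()` call plus manual pad linking, a bus message handler, and a `GLib.MainLoop`. The element names are standard GStreamer/DeepStream plugins, but the property values and config file path are illustrative.

```python
# Assemble a gst-launch-style description of a simple DeepStream
# detection pipeline. Element names (uridecodebin, nvstreammux,
# nvinfer, nvdsosd) are standard GStreamer/DeepStream plugins;
# property values and the config file path are illustrative.
pipeline_description = " ".join([
    "uridecodebin uri=file:///data/video.mp4",
    # nvstreammux exposes request pads, so the source must be linked
    # to a named pad (mux.sink_0) -- one of many details newcomers
    # have to learn before anything works at all.
    "! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720",
    "! nvinfer config-file-path=detector_config.txt",  # TensorRT inference
    "! nvvideoconvert ! nvdsosd",  # draw detections on frames
    "! fakesink sync=false",
])
print(pipeline_description)
```

And this launch string is only the declarative part: dynamic pads, caps negotiation failures, and state-change errors still have to be handled in code.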
So, imagine that someone with a Python and PyTorch background is required to dive into an additional framework like GStreamer. It is a natural blocker for many, if not all, developers. They just do not have enough time and motivation for it.
Low-level C++ API
How often do you meet a deep learning engineer familiar with C++? Knowledgeable people may object that the DeepStream SDK provides Python bindings, but let us be clear – they merely mirror and extend GStreamer’s low-level bindings. Thus, it is possible to implement pipelines with Gst-python, but nothing more; you still need to understand the low-level GStreamer programming model in depth.
Lack of Application-Level APIs
NVIDIA is a chipmaker. However, understanding that developers use hardware to create something meaningful, NVIDIA develops various software exposing hardware functions for gaming, science, and artificial intelligence. That software focuses primarily on fundamental functionality critical for system software. DeepStream is such a core system: it enables efficient access to hardware capabilities, including hardware-assisted video decoding and encoding, geometry operations, and the TensorRT API. NVIDIA assumes that skilled engineers who know GStreamer and code in C or C++ are given enough tools to craft the missing pieces of the puzzle. This is a fair point, and we cannot blame them for that, but it limits the adoption of DeepStream significantly.
People in the industry need APIs that support real-world scenarios without the extra hassle of low-level programming. In computer vision and deep learning, engineers wish to use familiar technologies, not to become game developers or system programmers. DeepStream offers no design for integrating with external systems (it is an SDK, not a framework) and almost ignores developer experience. Such an approach limits its use to cases where the interested party sees no option other than switching to DeepStream because other technologies cannot meet their performance requirements.
This situation feeds the rumor that deep learning is slow and expensive – that it requires a lot of hardware and an enormous optimization effort in both money and time. Deep learning engineers use inefficient technologies for inference because they do not have an affordable option. As a result, businesses overpay, particular computer vision solutions never get a chance to become commercially viable, and the whole industry democratizes and develops more slowly than it could.
Immature Developer Ecosystem
DeepStream does not give developers tools to build pipelines efficiently. Let us show you what we mean…

Imagine you stand 15 feet (or 5 meters) away from a light switch and try to turn it on with a long stick: this is what developing with DeepStream feels like.

DeepStream development mainly consists of a write-compile-launch-fail-analyze cycle, with time wasted on launching the pipeline, waiting for it to become ready, processing data, and analyzing console logs. NVIDIA provides neither tools for comfortable development nor a methodology and guidelines to help engineers follow best practices.
In such conditions, the technology cannot become mainstream. Now, we are ready to move to organizational problems.
Weak Customer Support

NVIDIA is traditionally weak in customer support. Without hesitation, they direct partners’ principal engineers to the developer forum whenever the latter have questions and problems. With such a complex technology, this is an additional blocker, making the learning curve very steep, especially for average developers.

Additionally, they do not actively participate in user code analysis because the native DeepStream API requires C++, and end-to-end code analysis is complex.
Bug Report Management
Bugs reported on the developer forum are often ignored or declined. There is no publicly available bug tracker where reporters and users can track the dynamics of bug fixes and monitor release efforts.
We have reported bugs multiple times, and often the approach is ridiculous: even if we provide a fully reproducible scenario with logs and a turnkey environment, the first reaction is denial.
Check the ticket at the following link to learn how it works.
No Transparent Roadmap And Feature Promotion Process
The community does not have an established process to influence the development roadmap. Nobody knows what will be in the next release or why a particular feature lands.
Summary And Our Solution
DeepStream is a highly efficient, optimized, game-changer technology, but the development environment is low-level, and support is a weak point. As a result, DeepStream adoption is lower than it could be because the SDK is not approachable for the average deep learning engineer in computer vision.
Savant strives to utilize the best of DeepStream’s advantages and overcome the difficulties to help it spread across the industry and help deep learning engineers craft highly efficient computer vision pipelines without hassle.
Our solution is to build a higher-level framework on top of DeepStream that looks familiar to deep learning engineers, is programmable with an object-oriented Python SDK, provides ready-to-use design patterns and external integrations, and is cloud-native. Let us discuss the distinctive features of Savant that make it the weapon of choice for a computer vision specialist.
How Savant Fights DeepStream Weaknesses
The advantages of Savant are described in detail in the separate article “Ten Reasons To Consider Savant For Your Computer Vision Project,” but let us briefly enumerate its distinctive features in the context of this discussion.
Performance. Savant utilizes DeepStream under the hood but hides the excessive details behind a straightforward API: we use every piece required to support a highly efficient processing model within a clearly defined architecture, allowing users to craft pipelines easily by filling that architecture with processing blocks that solve particular problems.
Developer-friendly solution. Savant is a Python-first framework. Developers can use all the familiar tools like PyTorch, NumPy, CuPy, OpenCV, and OpenCV CUDA, and implement custom functionality with familiar OOP concepts by deriving from API classes. The pipeline is a YAML configuration file where developers specify the stages using a high-level syntax. Developers do not need to care about low-level DeepStream or GStreamer concepts related to signal-based multimedia processing.
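For illustration, here is a schematic sketch of such a YAML module definition. The overall shape (a `pipeline` of `elements`, a detector stage, a `pyfunc` stage pointing to a user class) follows Savant's conventions, but the specific keys, names, and paths below are assumptions and will vary between versions and models.

```yaml
# module.yml -- schematic sketch of a Savant module definition
# (names, keys, and paths are illustrative, not copy-paste-ready)
name: people-detector
parameters:
  frame:
    width: 1280
    height: 720
pipeline:
  elements:
    # a TensorRT detector stage (wraps DeepStream's nvinfer)
    - element: nvinfer@detector
      name: peoplenet
    # a custom Python stage: a user class derived from a Savant API class
    - element: pyfunc
      module: samples.people_detector.overlay
      class_name: Overlay
```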
We also provide developers with an advanced profiling solution based on OpenTelemetry: instrument the processing of every frame, implement custom tracing spans, add attributes, and log records.
We also provide a Python-based Client SDK to interact with Savant pipelines (ingest and receive data). It enables simple integration with 3rd-party services. The Client SDK is integrated with OpenTelemetry, providing programmatic access to pipeline traces and logs.
API and architecture. Savant predefines the pipeline architecture: developers only need to fill it with their logic, and even an empty pipeline works successfully. This is because Savant is a framework, not a library or an SDK. It is like a model-serving server optimized for computer vision.
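As a sketch of the “fill the architecture with blocks” idea: a processing stage is essentially a class overriding a per-frame hook that the framework calls for you. The base class below is a stand-in stub (in Savant, the real base class comes from the Python SDK), and the dict-shaped frame metadata is purely hypothetical, used only to keep the example self-contained and runnable.

```python
# Stand-in for a Savant processing-stage base class; in the real
# framework you would derive from a class provided by the Savant SDK.
class PyFuncStub:
    def process_frame(self, buffer, frame_meta):
        """Called by the framework for every frame; override in subclasses."""
        raise NotImplementedError


class PersonCounter(PyFuncStub):
    """A hypothetical stage that counts 'person' objects in frame metadata."""

    def process_frame(self, buffer, frame_meta):
        persons = [o for o in frame_meta["objects"] if o["label"] == "person"]
        frame_meta["person_count"] = len(persons)
        return frame_meta


# The framework, not the developer, drives the loop; here we call the
# hook once with mock metadata just to show the contract.
meta = PersonCounter().process_frame(
    None, {"objects": [{"label": "person"}, {"label": "car"}]}
)
print(meta["person_count"])  # -> 1
```

The point is the inversion of control: the developer writes only the body of `process_frame`, while decoding, batching, and inference stay inside the framework.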
Its external API provides an isolated and abstract mechanism for data access. A pipeline communicates with the outer world through interchangeable adapters, so it is easy to develop using file-based data sources and then switch to RTSP or even Kafka-based video processing. Depending on your needs, you can mix and match adapters, modules, and bridges to build scalable, compute-efficient, or low-latency architectures reflecting technical limitations and regulatory requirements like GDPR.
Integration capabilities. With adapters opening excellent integration capabilities, Savant allows developers to craft arbitrary integrations with 3rd-party systems using the Python Client SDK, an extendable data protocol, and Etcd support.
Cloud-native. Adapter-based, API-first architecture and Dockerized deployment make it easy to roll out Savant on the edge and in the core and tie pieces together into a distributed pipeline either with plain Docker, Compose, or K8s.
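As a sketch of such a deployment, a Compose file can wire a source adapter container to a module container over a shared ZeroMQ IPC socket volume. The image names below follow Savant's published container images, but the environment variables, socket URL, and mounts are assumptions, not a verified configuration.

```yaml
# docker-compose.yml -- schematic deployment sketch (environment
# variables, socket URLs, and mounts are illustrative)
services:
  video-source:          # adapter: reads a file and publishes frames
    image: ghcr.io/insight-platform/savant-adapters-gstreamer:latest
    environment:
      SOURCE_ID: cam-1
      LOCATION: /data/video.mp4
      ZMQ_ENDPOINT: pub+connect:ipc:///tmp/zmq-sockets/input.ipc
    volumes:
      - zmq_sockets:/tmp/zmq-sockets
      - ./data:/data:ro
  module:                # the Savant pipeline itself
    image: ghcr.io/insight-platform/savant-deepstream:latest
    runtime: nvidia
    volumes:
      - zmq_sockets:/tmp/zmq-sockets
      - .:/opt/savant/module
volumes:
  zmq_sockets:
```

Swapping the file source for an RTSP or Kafka adapter changes only the `video-source` service, not the module.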
OpenTelemetry tracing can be configured for end-to-end tracing for every frame or in sampled mode. With it, DevOps and SRE teams can observe pipeline health and scale hardware to react to operational changes.
Open-source development process. We develop Savant as open source under the Apache 2.0 license on GitHub. Developers can contribute, investigate the code, and track how issues are handled and merged upstream.
Samples and design solutions. We provide 22+ easy-to-use samples covering multiple areas of the industry, helping users develop their pipelines quickly and address various questions by sharing our vision.
Friendly communication. We maintain a Discord server where everybody is welcome to ask questions. Also, we publish information about new Savant features upstream as soon as they are merged.