Serving PyTorch Models Efficiently and Asynchronously With Savant

Savant gives developers a highly efficient inference based on TensorRT, which you usually must use when developing efficient pipelines. However, because of the particular need, you may need to integrate the Savant pipeline with another inference technology. In the article, we show how Savant integrates with GPU-accelerated PyTorch inference.

You can also use the approach if you are PyTorch-centric and happy with it but need efficient infrastructure for video processing: transfer, decoding, and encoding.

Continue reading Serving PyTorch Models Efficiently and Asynchronously With Savant