Savant natively uses models in the NVIDIA TensorRT format, which is optimized for a particular hardware platform. However, you do not need to convert models manually: we encourage supplying them in the ONNX format, from which Savant builds TensorRT engines internally.
Once built, the engines are cached and load quickly. They are rebuilt by Nvinfer, which Savant uses internally, only when the cache is moved to a GPU of a different family (e.g., from Turing to Ampere) or when the batch size changes. You may also want to rebuild a model for a specific GPU, even within the same family: depending on the GPU's properties, TensorRT can produce a better-optimized engine for that device, especially if you allow it to use more memory. It is worth trying if you want to maximize performance. In this manual, we walk through exporting an Ultralytics model to the ONNX format for use in Savant.
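For reference, a minimal export sketch using the Ultralytics Python API might look like the following. The checkpoint name (`yolov8m.pt`), input size, batch size, and opset below are placeholders to adjust for your pipeline, and the exact set of export arguments varies between Ultralytics releases.

```python
# Export an Ultralytics YOLO model to ONNX so Savant can build a TensorRT
# engine from it internally. Requires the `ultralytics` package.
from ultralytics import YOLO

# Load pretrained weights; "yolov8m.pt" is an example checkpoint --
# substitute the model you actually want to deploy.
model = YOLO("yolov8m.pt")

# Export to ONNX. The options are illustrative: pick the input size,
# batch size, and opset that match your Savant pipeline configuration.
model.export(
    format="onnx",   # produce an .onnx file next to the weights
    imgsz=640,       # input resolution baked into the exported graph
    batch=1,         # batch size baked into the exported graph
    opset=17,        # ONNX opset version
    simplify=True,   # simplify the exported graph
)
```

The resulting `.onnx` file is what you reference in the Savant module configuration; Savant then builds and caches the TensorRT engine on first run.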


