New Feature: Neural Network Compilation Mode

Savant uses the TensorRT engine for model serving. TensorRT optimizes models by compiling them into highly efficient engines tuned for specific hardware, batch size, and precision. The compiled engine runs inference significantly faster, but the compilation itself is resource- and time-consuming.

Savant builds the engines specified in a pipeline during the first launch; subsequent launches reuse the cached engines and start much faster. However, the build can take many minutes for modern models such as YOLOv8, so users may be left wondering whether the compilation is still running or the pipeline is stuck.

To address this, we implemented a feature that optionally separates the build and run phases: you can launch the build phase on its own, and the module shuts down once compilation completes.

To do so, use the helper script and pass the module configuration as an argument:

./scripts/run_module.py --build-engines samples/animegan/module.yml
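
For example, a deployment script can run the build phase first and start the module only after compilation succeeds. The sketch below is a minimal illustration, not part of Savant itself: it assumes the helper script above can also launch the module when called without the --build-engines flag, and it uses the animegan sample configuration purely as a placeholder.

#!/usr/bin/env python3
# Minimal sketch: build the TensorRT engines first, then run the module.
# Assumes scripts/run_module.py can both build (--build-engines) and run
# a module; the configuration path is a placeholder from the samples.
import subprocess
import sys

MODULE_CONFIG = "samples/animegan/module.yml"

def main() -> int:
    # Phase 1: compile the engines; with --build-engines the module exits
    # right after the build, so this call returns once compilation is done.
    build = subprocess.run(
        ["./scripts/run_module.py", "--build-engines", MODULE_CONFIG]
    )
    if build.returncode != 0:
        print("Engine build failed; not starting the module.", file=sys.stderr)
        return build.returncode

    # Phase 2: run the module; it picks up the cached engines and starts
    # serving without the long compilation delay.
    return subprocess.run(["./scripts/run_module.py", MODULE_CONFIG]).returncode

if __name__ == "__main__":
    sys.exit(main())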

We have also updated the Savant samples so that engine building can be performed as an optional preparation step.