# Benchmark
Here we present benchmark results for several selected neural networks. Our benchmark is a simple open benchmark that measures the inference time of ONNX models on ENOT-Lite backends versus native PyTorch inference time and converts it to an FPS metric (frames per second; higher is better). All values in the tables below are given in FPS. For natural language processing networks, FPS = QPS (queries per second).
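As a rough illustration of how such a metric can be obtained, the sketch below times a batched forward pass and converts the mean latency to FPS. This is a hypothetical helper, not part of the ENOT-Lite API; the function and parameter names are illustrative.

```python
# Hedged sketch: converting inference-time measurements to FPS.
# `run_inference` is any zero-argument callable that executes one
# batched forward pass (illustrative, not an ENOT-Lite API).
import time

def fps_from_latency(run_inference, batch_size: int,
                     warmup: int = 10, iters: int = 100) -> float:
    """Return frames (or queries) per second for a batched inference callable."""
    for _ in range(warmup):           # warm-up runs are excluded from timing
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    elapsed = time.perf_counter() - start
    mean_latency = elapsed / iters    # seconds per batch
    return batch_size / mean_latency  # frames per second
```

For an NLP model with batch size 1, the same computation yields QPS directly.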
## ResNet-50
input: (batch_size, 3, 224, 224)
| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 473.4 | 1479.7 | 1849.6 | 340.6 | 209.1 |
| RTX 2080 Ti | 514.1 | 1142.7 | 1463.7 | 317.4 | 215.7 |
| GTX 1080 Ti | 439.5 | 438.9 | 882.7 | 282.5 | 231.2 |

| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 1179.1 | 3454.8 | 6381.4 | 973.8 | 624.3 |
| RTX 2080 Ti | 1198.6 | 3220.8 | 4713.9 | 842.3 | 770.4 |
| GTX 1080 Ti | 994.4 | 970.4 | 2620.5 | 935.1 | 595.6 |
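Reading the tables as relative speedups over native Torch inference can be done with a one-line helper; the example below uses the ResNet-50 numbers from the RTX 3060 Ti row above.

```python
# Relative speedup of a backend over native Torch CUDA inference,
# computed from the FPS values reported in the tables.
def speedup(backend_fps: float, torch_fps: float) -> float:
    return backend_fps / torch_fps

# ResNet-50 on RTX 3060 Ti: TensorRT Int8 (1849.6 FPS) vs Torch CUDA (209.1 FPS)
print(round(speedup(1849.6, 209.1), 1))  # → 8.8
```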
## MobileNetV2
input: (batch_size, 3, 224, 224)
| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 1188.4 | 2524.5 | 2934.6 | 779.1 | 254.0 |
| RTX 2080 Ti | 1134.6 | 1952.4 | 2287.7 | 658.3 | 203.0 |
| GTX 1080 Ti | 1122.2 | 1123.9 | 1647.9 | 649.3 | 343.3 |

| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 3567.6 | 7117.5 | 11275.7 | 1855.5 | 1746.2 |
| RTX 2080 Ti | 3171.4 | 5239.8 | 6434.3 | 2038.3 | 1855.3 |
| GTX 1080 Ti | 3120.8 | 3113.4 | 6305.3 | 1411.6 | 1344.6 |
## MobileNetV2-SSD
input: (batch_size, 3, 224, 224)
| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 411.1 | 597.4 | 622.7 | 105.7 | 119.3 |
| RTX 2080 Ti | 369.6 | 451.4 | 445.8 | 107.3 | 79.1 |
| GTX 1080 Ti | 421.2 | 419.3 | 483.5 | 159.6 | 126.2 |

| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 1520.1 | 2045.1 | 2419.5 | 211.9 | 230.6 |
| RTX 2080 Ti | 1111.8 | 1349.5 | 1411.0 | 238.0 | 222.9 |
| GTX 1080 Ti | 1485.7 | 1482.8 | 2128.6 | 275.3 | 256.7 |
## YOLOv5s
input: (batch_size, 3, 640, 640)
| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 236.5 | 514.9 | 601.3 | 158.8 | 148.5 |
| RTX 2080 Ti | 255.1 | 412.5 | 441.4 | 172.0 | 84.5 |
| GTX 1080 Ti | 201.8 | 201.4 | 281.9 | 127.3 | 111.6 |

| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 316.4 | 637.9 | 777.9 | 196.0 | 120.7 |
| RTX 2080 Ti | 331.6 | 573.8 | 649.4 | 243.6 | 126.5 |
| GTX 1080 Ti | 268.9 | 268.1 | 440.4 | 170.0 | 138.4 |
## ViT
Vision Transformer (ViT), patch = 16, resolution = 224.
input: (batch_size, 3, 224, 224)
| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 175.2 | 318.1 | 310.6 | 172.9 | 175.7 |
| RTX 2080 Ti | 159.6 | 373.2 | 374.9 | 175.3 | 132.6 |
| GTX 1080 Ti | 123.1 | 122.5 | 122.3 | 135.5 | 108.3 |

| Device / Backend | TensorRT Float | TensorRT Float16 | TensorRT Int8 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|---|
| RTX 3060 Ti | 283.6 | 592.8 | 595.3 | 291.3 | 279.7 |
| RTX 2080 Ti | 127.4 | 435.8 | 410.3 | 153.7 | 123.7 |
| GTX 1080 Ti | 169.6 | 168.4 | 166.4 | 182.7 | 166.0 |
## BERT
input length: 1941 characters
| Device / Backend | TensorRT Float | TensorRT Float16 | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3060 Ti | 630.2 | 846.9 | 94.0 | 90.5 |
| RTX 2080 Ti | 100.0 | 257.0 | 94.7 | 73.4 |
| GTX 1080 Ti | 43.9 | 39.4 | 21.8 | 25.1 |