Benchmark
Here we present benchmark results for a selection of neural networks.
Our benchmark is a simple, open benchmark that measures the inference time of ONNX models on the ENOT Lite backend, compares it with native PyTorch inference, and converts the results to an FPS metric (frames per second; the higher, the better).
All values in the tables below are given in FPS. For natural language processing networks, FPS is equivalent to QPS (queries per second).
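As a rough illustration of how such a figure is obtained (this is not the actual benchmark harness; the model choice, warm-up count, and iteration count below are assumptions made for the sketch), the Torch CUDA number for ResNet-50 could be measured like this:

```python
# Minimal sketch of the FPS metric: FPS = (batch_size * iterations) / elapsed.
# Everything below (model, warm-up and iteration counts) is illustrative,
# not the benchmark's actual configuration.
import time

import torch
import torchvision

model = torchvision.models.resnet50().eval().cuda()
batch_size = 1
x = torch.randn(batch_size, 3, 224, 224, device="cuda")

with torch.no_grad():
    # Warm up so one-time CUDA initialization does not skew the timing.
    for _ in range(10):
        model(x)
    torch.cuda.synchronize()

    iterations = 100
    start = time.perf_counter()
    for _ in range(iterations):
        model(x)
    torch.cuda.synchronize()  # wait for all queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

fps = batch_size * iterations / elapsed  # frames per second; higher is better
print(f"Torch CUDA: {fps:.1f} FPS")
```

The other columns are measured the same way, with the PyTorch call replaced by inference through the corresponding backend (ENOT Lite, TensorRT, or ONNX Runtime CUDA) on the exported ONNX model.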
ResNet-50
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 2120.8 | 657.4 | 426.3 | 226.0 |
| RTX 3080 Ti | 2025.7 | 639.9 | 424.2 | 208.9 |
| RTX 2080 Ti | 1346.2 | 501.4 | 318.2 | 136.1 |
| GTX 1080 Ti | 823.8 | 446.0 | 278.8 | 245.0 |

| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 8256.6 | 2058.4 | 1726.3 | 1155.3 |
| RTX 3080 Ti | 7027.6 | 2033.0 | 1667.3 | 1138.8 |
| RTX 2080 Ti | 4183.2 | 1216.4 | 839.3 | 803.0 |
| GTX 1080 Ti | 2248.1 | 963.6 | 899.6 | 564.8 |
MobileNetV2
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 3191.4 | 1505.9 | 932.6 | 294.2 |
| RTX 3080 Ti | 2854.8 | 1414.7 | 888.5 | 275.7 |
| RTX 2080 Ti | 2181.9 | 1148.0 | 695.1 | 186.8 |
| GTX 1080 Ti | 1838.6 | 1099.1 | 630.6 | 393.5 |

| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 15456.1 | 6109.1 | 3333.2 | 3204.6 |
| RTX 3080 Ti | 11295.7 | 5547.5 | 3129.5 | 3038.6 |
| RTX 2080 Ti | 6476.5 | 3485.7 | 2077.3 | 1928.9 |
| GTX 1080 Ti | 5316.1 | 2780.7 | 1318.7 | 1238.7 |
MobileNetV2-SSD
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3060 Ti | 622.7 | 411.1 | 105.7 | 119.3 |
| RTX 2080 Ti | 451.4 | 369.6 | 107.3 | 79.1 |
| GTX 1080 Ti | 483.5 | 421.2 | 159.6 | 126.2 |

| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3060 Ti | 2419.5 | 1520.1 | 211.9 | 230.6 |
| RTX 2080 Ti | 1411.0 | 1111.8 | 238.0 | 222.9 |
| GTX 1080 Ti | 2128.6 | 1485.7 | 275.3 | 256.7 |
YOLOv5s
input: (batch_size, 3, 640, 640)
| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 739.5 | 270.2 | 188.4 | 167.3 |
| RTX 3080 Ti | 648.0 | 245.9 | 178.8 | 148.0 |
| RTX 2080 Ti | 392.3 | 172.9 | 131.4 | 78.8 |
| GTX 1080 Ti | 284.7 | 163.3 | 107.3 | 111.9 |

| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 1212.1 | 331.4 | 251.9 | 159.1 |
| RTX 3080 Ti | 1096.2 | 295.8 | 229.7 | 144.7 |
| RTX 2080 Ti | 718.1 | 177.6 | 147.5 | 107.2 |
| GTX 1080 Ti | 458.0 | 173.8 | 125.1 | 130.2 |
ViT
Vision Transformer (ViT), patch = 16, resolution = 224.
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 371.1 | 371.9 | 292.2 | 244.1 |
| RTX 3080 Ti | 363.6 | 370.6 | 288.3 | 220.7 |
| RTX 2080 Ti | 305.2 | 205.2 | 187.2 | 156.5 |
| GTX 1080 Ti | 142.5 | 133.1 | 128.6 | 104.2 |

| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 1301.9 | 646.0 | 563.4 | 516.6 |
| RTX 3080 Ti | 1283.9 | 639.2 | 561.3 | 507.8 |
| RTX 2080 Ti | 906.2 | 245.6 | 224.9 | 204.0 |
| GTX 1080 Ti | 195.8 | 194.9 | 178.4 | 167.7 |
BERT
input length: 1941 characters
| Device / Backend | ENOT Lite | TensorRT | ONNX CUDA | Torch CUDA |
|---|---|---|---|---|
| RTX 3090 | 401.2 | 199.8 | 158.9 | 148.5 |
| RTX 3080 Ti | 399.1 | 200.5 | 156.0 | 146.8 |
| RTX 2080 Ti | 298.0 | 110.5 | 98.0 | 82.8 |
| GTX 1080 Ti | 66.8 | 67.5 | 64.4 | 53.7 |
ResNet-50 CPU
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 268.4 | 101.5 | 46.2 |

| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 254.2 | 100.4 | 50.0 |
MobileNetV2 CPU
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 1535.7 | 842.2 | 135.5 |

| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 2176.9 | 453.0 | 139.8 |
YOLOv5s CPU
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 82.8 | 33.2 | 22.6 |

| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 45.1 | 22.1 | 18.8 |
ViT CPU
Vision Transformer (ViT), patch = 16, resolution = 224.
input: (batch_size, 3, 224, 224)
| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 32.8 | 15.5 | 14.9 |

| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 29.0 | 17.4 | 16.6 |
BERT CPU
input length: 1941 characters
| Device / Backend | ENOT Lite | ONNX CPU | Torch CPU |
|---|---|---|---|
| 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz | 10.6 | 10.8 | 7.8 |