Benchmark

This page presents benchmark results for selected neural networks.

Our benchmark is open and simple: it measures the inference time of ONNX models on the ENOT Lite backend, compares it with the other backends listed in the tables (TensorRT, ONNX, and native PyTorch), and converts the result to FPS (frames per second; higher is better).

All values in the tables below are given in FPS. For natural language processing networks, FPS is equivalent to QPS (queries per second).
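The conversion from measured inference time to FPS can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the function name `measure_fps`, its parameters, and the warm-up and iteration counts are hypothetical, and it assumes FPS counts individual samples, so one call on a batch contributes `batch_size` frames. For GPU backends, the callable is assumed to block until the results are ready, since otherwise asynchronous execution would make the timing meaningless.

```python
import time
from typing import Callable


def measure_fps(
    run_batch: Callable[[], object],
    batch_size: int,
    warmup: int = 10,
    iterations: int = 100,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Return throughput in samples per second for one backend.

    `run_batch` is a hypothetical callable that runs inference on one
    batch and returns only after the result is available.
    """
    if batch_size <= 0 or iterations <= 0:
        raise ValueError("batch_size and iterations must be positive")

    # Warm-up calls are not timed (caches, lazy initialisation, autotuning).
    for _ in range(warmup):
        run_batch()

    start = timer()
    for _ in range(iterations):
        run_batch()
    elapsed = timer() - start

    # Each timed call processes `batch_size` samples.
    return batch_size * iterations / elapsed
```

Passing the same input and the same `batch_size` to each backend's callable keeps the resulting numbers comparable across columns.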

Benchmarks:
GPU benchmarks come first, followed by CPU benchmarks (sections marked "CPU").

ResNet-50

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090               2120.8       657.4        426.3         226.0
RTX 3080 Ti            2025.7       639.9        424.2         208.9
RTX 2080 Ti            1346.2       501.4        318.2         136.1
GTX 1080 Ti             823.8       446.0        278.8         245.0

batch_size = 16

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090               8256.6      2058.4       1726.3        1155.3
RTX 3080 Ti            7027.6      2033.0       1667.3        1138.8
RTX 2080 Ti            4183.2      1216.4        839.3         803.0
GTX 1080 Ti            2248.1       963.6        899.6         564.8
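To read a table row as relative speedups, divide the ENOT Lite figure by each baseline. The sketch below does this for the RTX 3090 row of the ResNet-50, batch_size = 16 table above; the helper name `speedups` is hypothetical and only illustrates the arithmetic.

```python
from typing import Dict


def speedups(fps_by_backend: Dict[str, float], reference: str) -> Dict[str, float]:
    """Return how many times faster `reference` is than each other backend."""
    ref_fps = fps_by_backend[reference]
    return {
        name: ref_fps / fps
        for name, fps in fps_by_backend.items()
        if name != reference
    }


# RTX 3090, ResNet-50, batch_size = 16 (values copied from the table above).
row = {
    "ENOT Lite": 8256.6,
    "TensorRT": 2058.4,
    "ONNX CUDA": 1726.3,
    "Torch CUDA": 1155.3,
}

for backend, factor in speedups(row, "ENOT Lite").items():
    print(f"ENOT Lite vs {backend}: {factor:.2f}x")
```

Because FPS is a rate, the ratio of two FPS values equals the inverse ratio of the corresponding per-frame latencies.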

MobileNetV2

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090               3191.4      1505.9        932.6         294.2
RTX 3080 Ti            2854.8      1414.7        888.5         275.7
RTX 2080 Ti            2181.9      1148.0        695.1         186.8
GTX 1080 Ti            1838.6      1099.1        630.6         393.5

batch_size = 16

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090              15456.1      6109.1       3333.2        3204.6
RTX 3080 Ti           11295.7      5547.5       3129.5        3038.6
RTX 2080 Ti            6476.5      3485.7       2077.3        1928.9
GTX 1080 Ti            5316.1      2780.7       1318.7        1238.7

MobileNetV2-SSD

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3060 Ti             622.7       411.1        105.7         119.3
RTX 2080 Ti             451.4       369.6        107.3          79.1
GTX 1080 Ti             483.5       421.2        159.6         126.2

batch_size = 16

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3060 Ti            2419.5      1520.1        211.9         230.6
RTX 2080 Ti            1411.0      1111.8        238.0         222.9
GTX 1080 Ti            2128.6      1485.7        275.3         256.7

YOLOv5s

input: (batch_size, 3, 640, 640)
batch_size = 1

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090                739.5       270.2        188.4         167.3
RTX 3080 Ti             648.0       245.9        178.8         148.0
RTX 2080 Ti             392.3       172.9        131.4          78.8
GTX 1080 Ti             284.7       163.3        107.3         111.9

batch_size = 16

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090               1212.1       331.4        251.9         159.1
RTX 3080 Ti            1096.2       295.8        229.7         144.7
RTX 2080 Ti             718.1       177.6        147.5         107.2
GTX 1080 Ti             458.0       173.8        125.1         130.2

ViT

Vision Transformer (ViT), patch = 16, resolution = 224.

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090                371.1       371.9        292.2         244.1
RTX 3080 Ti             363.6       370.6        288.3         220.7
RTX 2080 Ti             305.2       205.2        187.2         156.5
GTX 1080 Ti             142.5       133.1        128.6         104.2

batch_size = 16

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090               1301.9       646.0        563.4         516.6
RTX 3080 Ti            1283.9       639.2        561.3         507.8
RTX 2080 Ti             906.2       245.6        224.9         204.0
GTX 1080 Ti             195.8       194.9        178.4         167.7

BERT

input length: 1941 characters

Device / Backend    ENOT Lite    TensorRT    ONNX CUDA    Torch CUDA
RTX 3090                401.2       199.8        158.9         148.5
RTX 3080 Ti             399.1       200.5        156.0         146.8
RTX 2080 Ti             298.0       110.5         98.0          82.8
GTX 1080 Ti              66.8        67.5         64.4          53.7

ResNet-50 CPU

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz        268.4       101.5         46.2

batch_size = 8

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz        254.2       100.4         50.0

MobileNetV2 CPU

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz       1535.7       842.2        135.5

batch_size = 8

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz       2176.9       453.0        139.8

YOLOv5s CPU

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz         82.8        33.2         22.6

batch_size = 8

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz         45.1        22.1         18.8

ViT CPU

Vision Transformer (ViT), patch = 16, resolution = 224.

input: (batch_size, 3, 224, 224)
batch_size = 1

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz         32.8        15.5         14.9

batch_size = 8

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz         29.0        17.4         16.6

BERT CPU

input length: 1941 characters

Device / Backend                                  ENOT Lite    ONNX CPU    Torch CPU
11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz         10.6        10.8          7.8