Here I show how to load and run an ONNX model using Python in the ONNX Runtime.
In this article in our series about using portable neural networks in 2020, you’ll learn how to install ONNX Runtime on an x64 architecture and use it in Python.
Microsoft co-developed ONNX with Facebook and AWS. Both the ONNX format and ONNX Runtime have industry support to make sure that all the important frameworks can export their graphs to ONNX and that these models can run on a wide range of hardware configurations.
The ONNX Runtime is an engine for running machine learning models that have been converted to the ONNX format. Both traditional machine learning models and deep learning models (neural networks) can be exported to the ONNX format. The runtime can run on Linux, Windows, and Mac, and on a variety of chip architectures. It can also take advantage of hardware accelerators such as GPUs and TPUs. However, there is no install package for every combination of OS, chip architecture, and accelerator, so you may need to build the runtime from source if you are not using one of the common combinations. Check the ONNX Runtime website for installation instructions for the combination you need. This article shows how to install ONNX Runtime on an x64 architecture with a default CPU and on an x64 architecture with a GPU.
In addition to being able to run on many hardware configurations, the runtime can be called from most popular programming languages. The purpose of this article is to show how to use ONNX Runtime in Python. I’ll show how to install the onnxruntime package. Once ONNX Runtime is installed, I’ll load a previously exported MNIST model into ONNX Runtime and use it to make predictions.
Installing and Importing the ONNX Runtime
Before using the ONNX Runtime, you will need to install the onnxruntime package. The following command will install the runtime on an x64 architecture with a default CPU:
pip install onnxruntime
To install the runtime on an x64 architecture with a GPU, use the command below:
pip install onnxruntime-gpu
Once installed, it can be imported into your modules with the following import statement:
import onnxruntime
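After importing the package, it can be useful to confirm which execution providers (the CPU or GPU back ends that ONNX Runtime dispatches work to) your installed build exposes. The sketch below is a minimal example under stated assumptions: choose_providers is a hypothetical helper, not part of ONNX Runtime, that keeps a preferred order and falls back to the CPU. onnxruntime.get_available_providers() is part of the ONNX Runtime Python API.

```python
def choose_providers(available,
                     preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the providers from `preferred` that appear in `available`.

    Hypothetical helper: `available` is normally the list returned by
    onnxruntime.get_available_providers(). The result keeps the order
    given in `preferred`, so the GPU is tried before the CPU.
    """
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise ValueError(f"none of {preferred} found in {available}")
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime
    except ImportError:
        # The package has not been installed with pip yet.
        print("onnxruntime is not installed")
    else:
        print(choose_providers(onnxruntime.get_available_providers()))
```

The result can be passed to the providers argument of onnxruntime.InferenceSession, for example onnxruntime.InferenceSession("mnist.onnx", providers=providers), where mnist.onnx is an assumed file name. Starting with version 1.9, GPU builds of ONNX Runtime expect providers to be set explicitly when several are available and can raise an error otherwise, so choosing them up front is a reasonable habit.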
Loading ONNX Models
Loading an ONNX model into ONNX Runtime is as straightforward as the conversion: the actual loading takes just one line of code. The function below wraps that call in some basic error handling.
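from onnxruntime.capi.onnxruntime_pybind11_state import InvalidGraph

def load_onnx_model(onnx_model_file):
    # Create an inference session; this parses and validates the model file.
    try:
        session = onnxruntime.InferenceSession(onnx_model_file)
    except (InvalidGraph, TypeError, RuntimeError) as e:
        # Report the problem, then re-raise so the caller can handle it.
        print(e)
        raise
    return session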
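Once load_onnx_model has returned a session, you can ask it what it expects before calling run(). The sketch below assumes only that the session exposes get_inputs() and get_outputs(), whose entries carry name, shape, and type attributes, as ONNX Runtime's NodeArg objects do. describe_io is a hypothetical helper written for illustration.

```python
def describe_io(session):
    """Summarize a session's inputs and outputs as (name, shape, type) tuples.

    Hypothetical helper; works with any object exposing get_inputs() and
    get_outputs(), such as an onnxruntime.InferenceSession.
    """
    def summarize(args):
        # Each entry describes one named tensor the model consumes or produces.
        return [(arg.name, arg.shape, arg.type) for arg in args]

    return {
        "inputs": summarize(session.get_inputs()),
        "outputs": summarize(session.get_outputs()),
    }
```

For example, print(describe_io(load_onnx_model("mnist.onnx"))), with mnist.onnx as an assumed file name, lists each input's name, its shape (a dynamic dimension such as the batch size may appear as a string or None), and its element type, such as 'tensor(float)'.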
Using the ONNX Runtime for Predictions
The function below shows how to use the session that was created when we loaded our ONNX model. There are a few things worth noting here. First, you need to query the session for its inputs using the session’s get_inputs() method. The name of each input is used as a key in the dictionary of inputs passed to the session’s run() method; passing None as the first argument to run() tells it to return all of the model’s outputs. Our MNIST model has only one input, a batch of images, which is supplied through the image_samples parameter. If your model has more than one input, get_inputs() will return an entry for each one.
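import numpy as np

def onnx_infer(onnx_session, image_samples):
    # Look up the name of the model's (single) input.
    input_name = onnx_session.get_inputs()[0].name
    # Passing None as the output names returns every model output.
    result = onnx_session.run(None, {input_name: image_samples})
    # The first output holds one row of class probabilities per image.
    probabilities = np.array(result[0])
    print(type(probabilities))
    print(probabilities)
    # The predicted digit is the index of the highest probability in each row.
    predictions = np.argmax(probabilities, axis=1)
    return predictions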
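The onnx_infer function handles a single input. For a model with several inputs, one way to build the dictionary passed to run() is to pair each entry from get_inputs() with an array in the same order. The sketch below is hypothetical (build_feed is not part of ONNX Runtime) and assumes every input expects 32-bit floats. ONNX Runtime checks input element types and rejects, for example, a float64 array passed to a float input, so casting up front avoids that error.

```python
import numpy as np


def build_feed(session, arrays, dtype=np.float32):
    """Map each model input name to the matching array, cast to `dtype`.

    Hypothetical helper: assumes `arrays` is in the same order as
    session.get_inputs() and that every input uses the same element type.
    """
    inputs = session.get_inputs()
    if len(inputs) != len(arrays):
        raise ValueError(f"model expects {len(inputs)} inputs, got {len(arrays)}")
    return {
        node.name: np.asarray(array, dtype=dtype)
        for node, array in zip(inputs, arrays)
    }
```

With the single-input MNIST model, this would be used as onnx_session.run(None, build_feed(onnx_session, [image_samples])).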
Most neural networks do not return a prediction directly. Instead, they return a probability for each of the output classes. In the case of our MNIST model, the return value for each image is a list of 10 probabilities, one per digit, and the entry with the highest probability is the prediction. An interesting test is to compare the probabilities the ONNX model returns with those returned by the original model when it runs within the framework that created it. Ideally, changing the model format and runtime should not change the probabilities beyond tiny floating-point differences. This comparison makes a good unit test to run every time the model changes.
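The unit test suggested above could look like the following sketch. It assumes you have already collected, for the same images, the probabilities produced by the original framework (reference) and by ONNX Runtime (candidate) as arrays of shape (number of images, number of classes). Because the comparison allows for floating-point differences, it uses a tolerance rather than exact equality, and it also checks that the predicted classes agree. The function name and tolerance values are illustrative assumptions, not recommendations.

```python
import numpy as np


def check_onnx_matches_reference(reference, candidate, rtol=1e-5, atol=1e-6):
    """Assert that ONNX Runtime outputs agree with the original framework's.

    reference, candidate: arrays of shape (num_images, num_classes).
    The default tolerances are illustrative; tune them for your model.
    Returns the predicted class for each image.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    # Raises AssertionError with a detailed report if any value drifts too far.
    np.testing.assert_allclose(candidate, reference, rtol=rtol, atol=atol)
    # The predicted class (argmax of each row) must match for every image.
    ref_pred = np.argmax(reference, axis=1)
    cand_pred = np.argmax(candidate, axis=1)
    assert np.array_equal(ref_pred, cand_pred), "predicted classes differ"
    return cand_pred
```

For example, with probs = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]), check_onnx_matches_reference(probs, probs + 1e-7) passes and returns the predictions [1, 0].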
Summary and Next Steps
In this article, I provided a brief overview of the ONNX Runtime and the ONNX format. I then showed how to load and run an ONNX model using Python in the ONNX Runtime.
The code sample for this article contains a working console application that demonstrates all the techniques shown here. This code sample is part of a GitHub repository that explores the use of neural networks for predicting the digits found in the MNIST dataset. Specifically, there are samples that show how to create neural networks in Keras, PyTorch, TensorFlow 1.0, and TensorFlow 2.0.
If you want to learn more about exporting to the ONNX format and using ONNX Runtime, check out the other articles in this series.
Keith is a sojourner in the software industry. He has over 30 years of experience building and bringing applications to market. He has worked for startups and large enterprises in roles ranging from tech lead to business development manager. He is currently a senior engineer on BNY Mellon's Distribution Analytics team, where he is building data pipelines from on-premises data sources to the cloud.