Here I show how to load and run an ONNX model from Java using the ONNX Runtime.
In this article in our series about using portable neural networks in 2020, you’ll learn how to install ONNX Runtime on an x64 architecture and use it in Java.
Microsoft co-developed ONNX with Facebook and AWS. Both the ONNX format and ONNX Runtime have industry support to make sure that all the important frameworks are capable of exporting their graphs to ONNX and that these models can run on any hardware configuration.
The ONNX Runtime is an engine for running machine learning models that have been converted to the ONNX format. Both traditional machine learning models and deep learning models (neural networks) can be exported to the ONNX format. The runtime can run on Linux, Windows, and Mac, and can run on a variety of chip architectures. It can also take advantage of hardware accelerators such as GPUs and TPUs. However, there is not an install package for every combination of OS, chip architecture, and accelerator, so you may need to build the runtime from source if you are not using one of the common combinations. Check the ONNX Runtime website to get installation instructions for the combination you need. This article will show how to install ONNX Runtime on an x64 architecture with a default CPU and an x64 architecture with a GPU.
In addition to being able to run on many hardware configurations, the runtime can be called from most popular programming languages. The purpose of this article is to show how to use ONNX Runtime in Java. I’ll show how to install the onnxruntime package. Once ONNX Runtime is installed, I’ll load a previously exported MNIST model into ONNX Runtime and use it to make predictions.
Installing and Importing the ONNX Runtime
Before using the ONNX Runtime, you will need to add the proper dependency to your build tool. The Maven repository is a good source for setting up the ONNX Runtime for a variety of tools including Maven and Gradle. To use the runtime on an x64 architecture with a default CPU, refer to the link below.
https://mvnrepository.com/artifact/org.bytedeco/onnxruntime-platform
To use the runtime on an x64 architecture with a GPU, use the following link.
https://mvnrepository.com/artifact/org.bytedeco/onnxruntime-platform-gpu
Once the runtime has been installed, it can be imported into your Java code files with the import statements shown below. The import statements that pull in the TensorProto tools help create inputs for ONNX models and interpret the output (prediction) of an ONNX model.
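// Core runtime types used below (OrtEnvironment, OrtSession, OnnxTensor, NodeInfo, TensorInfo, OrtUtil)
import ai.onnxruntime.*;
import ai.onnxruntime.OnnxMl.TensorProto;
import ai.onnxruntime.OnnxMl.TensorProto.DataType;
import ai.onnxruntime.OrtSession.Result;
import ai.onnxruntime.OrtSession.SessionOptions;
import ai.onnxruntime.OrtSession.SessionOptions.ExecutionMode;
import ai.onnxruntime.OrtSession.SessionOptions.OptLevel;
import java.util.HashMap;
import java.util.Map;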
Loading ONNX Models
The snippet below shows how to load an ONNX model into ONNX Runtime running in Java. This code creates a session object that can be used to make predictions. The model being used here is the ONNX model that was exported from PyTorch.
There are a few things worth noting here. First, you need to query the session to get its inputs. This is done using the session’s getInputInfo method. Our MNIST model only has one input parameter: an array of 784 floats that represents one image from the MNIST dataset. If your model has more than one input parameter, the map returned by getInputInfo will have an entry for each parameter.
Utilities.LoadTensorData();
String modelPath = "pytorch_mnist.onnx";
// Shared runtime environment and default session options
OrtEnvironment env = OrtEnvironment.getEnvironment();
OrtSession.SessionOptions options = new OrtSession.SessionOptions();
try (OrtSession session = env.createSession(modelPath, options)) {
    // Query the session for its inputs; this model has exactly one
    Map<String, NodeInfo> inputMetaMap = session.getInputInfo();
    Map<String, OnnxTensor> container = new HashMap<>();
    NodeInfo inputMeta = inputMetaMap.values().iterator().next();
    float[] inputData = Utilities.ImageData[imageIndex];
    String label = Utilities.ImageLabels[imageIndex];
    System.out.println("Selected image is the number: " + label);
    // this is the data for only one input tensor for this model
    Object tensorData =
        OrtUtil.reshape(inputData, ((TensorInfo) inputMeta.getInfo()).getShape());
    OnnxTensor inputTensor = OnnxTensor.createTensor(env, tensorData);
    container.put(inputMeta.getName(), inputTensor);
    // Run code omitted for brevity.
}
Not shown in the code above are the utilities that read the raw MNIST images and convert each image to an array of 784 floats. The label for each image is also read in from the MNIST dataset so that the accuracy of predictions can be determined. This code is standard Java code, but you are still encouraged to check it out and use it. It will save you time if you need to read in images that are similar to the MNIST dataset.
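To give a feel for what such utilities involve, here is a minimal sketch of parsing the IDX file format used by the original MNIST distribution. It is not the code from the sample; the class name MnistIdxSketch and its methods are hypothetical, the input is assumed to be the already-decompressed file contents, and scaling pixels to the range 0 to 1 is an assumption that may not match the preprocessing your model was trained with.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: parses decompressed MNIST IDX files into Java arrays. */
final class MnistIdxSketch {
    private static final int IMAGE_MAGIC = 0x00000803; // IDX header for image files
    private static final int LABEL_MAGIC = 0x00000801; // IDX header for label files

    /** Returns one flat float array per image (784 values for 28 x 28 MNIST images). */
    static float[][] parseImages(byte[] idx) {
        ByteBuffer buf = ByteBuffer.wrap(idx); // ByteBuffer is big-endian by default, like IDX
        if (buf.getInt() != IMAGE_MAGIC) {
            throw new IllegalArgumentException("Not an IDX image file");
        }
        int count = buf.getInt();
        int rows = buf.getInt();
        int cols = buf.getInt();
        int pixels = rows * cols;
        float[][] images = new float[count][pixels];
        for (int i = 0; i < count; i++) {
            for (int p = 0; p < pixels; p++) {
                // Pixels are unsigned bytes; scaling to [0, 1] is an assumption
                images[i][p] = (buf.get() & 0xFF) / 255.0f;
            }
        }
        return images;
    }

    /** Returns one integer label (0 to 9 for MNIST) per image. */
    static int[] parseLabels(byte[] idx) {
        ByteBuffer buf = ByteBuffer.wrap(idx);
        if (buf.getInt() != LABEL_MAGIC) {
            throw new IllegalArgumentException("Not an IDX label file");
        }
        int count = buf.getInt();
        int[] labels = new int[count];
        for (int i = 0; i < count; i++) {
            labels[i] = buf.get() & 0xFF; // labels are stored as unsigned bytes
        }
        return labels;
    }
}
```

Each row returned by parseImages could then play the role of inputData in the loading snippet, with the matching entry from parseLabels used to check the prediction.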
Using the ONNX Runtime for Predictions
The snippet below shows how to use the ONNX session that was created when we loaded our ONNX model.
try (OrtSession session = env.createSession(modelPath, options)) {
    // Load code not shown for brevity.
    // Run the inference
    try (OrtSession.Result results = session.run(container)) {
        // Only iterates once
        for (Map.Entry<String, OnnxValue> r : results) {
            OnnxValue resultValue = r.getValue();
            OnnxTensor resultTensor = (OnnxTensor) resultValue;
            System.out.println("Output Name: " + r.getKey());
            // MaxProbability (helper not shown) picks the class with the highest score
            int prediction = MaxProbability(resultTensor);
            System.out.println("Prediction: " + prediction);
        }
    }
}
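The MaxProbability helper called above is not shown. As a rough idea of what it has to do, here is a minimal sketch that returns the index of the largest value in an array of scores. The class name PredictionSketch and the method argMax are hypothetical, and the sketch assumes the model's output row has already been extracted from the result tensor into a plain float array.

```java
/** Hypothetical stand-in for the step that turns class scores into a prediction. */
final class PredictionSketch {
    /** Returns the index of the largest score; ties resolve to the lowest index. */
    static int argMax(float[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

For MNIST the array would hold 10 scores, so the returned index is the predicted digit.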
Most neural networks do not return a prediction directly. They return a list of probabilities for each of the output classes. In the case of our MNIST model, the return value for each image will be a list of 10 probabilities. The entry with the highest probability is the prediction. An interesting test that you can do is to compare the probabilities the ONNX model returns to the probabilities returned from the original model when it is run within the framework that created the model. Ideally, the change in model format and runtime should not change any of the probabilities produced. This would make a good unit test that is run every time a change occurs to the model.
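The comparison described above could be sketched as follows. Floating-point results rarely match bit for bit across runtimes, so this sketch compares with a tolerance rather than exact equality. The class ProbabilityComparison and its methods are hypothetical, and the right tolerance is an assumption you would need to tune for your own model.

```java
/** Hypothetical helper for checking that two runtimes agree on a model's output. */
final class ProbabilityComparison {
    /** Largest absolute difference between corresponding entries. */
    static double maxAbsDifference(float[] expected, float[] actual) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException(
                "Length mismatch: " + expected.length + " vs " + actual.length);
        }
        double max = 0.0;
        for (int i = 0; i < expected.length; i++) {
            max = Math.max(max, Math.abs((double) expected[i] - (double) actual[i]));
        }
        return max;
    }

    /** True when every entry differs by no more than the given tolerance. */
    static boolean matches(float[] expected, float[] actual, double tolerance) {
        return maxAbsDifference(expected, actual) <= tolerance;
    }
}
```

In a unit test, expected would hold the probabilities saved from the original framework for a fixed image, and actual would hold the ONNX Runtime output for the same image.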
Summary and Next Steps
In this article, I provided a brief overview of the ONNX Runtime and the ONNX format. I then showed how to load and run an ONNX model from Java using the ONNX Runtime.
The code sample for this article contains a working console application that demonstrates all the techniques shown here. This code sample is part of a GitHub repository that explores the use of neural networks for predicting the numbers found in the MNIST dataset. Specifically, there are samples that show how to create neural networks in Keras, PyTorch, TensorFlow 1.0, and TensorFlow 2.0.
If you want to learn more about exporting to the ONNX format and using ONNX Runtime, check out the other articles in this series.
Keith is a sojourner in the software industry. He has over 30 years of experience building and bringing applications to market. He has worked for startups and large enterprises in roles ranging from tech lead to business development manager. He is currently a senior engineer on BNY Mellon's Distribution Analytics team where he is building data pipelines from on-premise data sources to the cloud.