Let’s start by finding out what ONNX is. It stands for Open Neural Network eXchange, and it’s described as an open-source AI (artificial intelligence) ecosystem that aims to establish open standards for representing machine learning algorithms and software tools, in order to promote innovation and collaboration. You can get it from GitHub.
To put that another way, you can create and train AI models on any platform you like, using any framework you like (eg PyTorch, TensorFlow, Caffe2, Scikit-learn, etc), and ‘translate’ the result into a standard format that can then be run on any other platform – and the platform we’re interested in is the mainframe.
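To make that a bit more concrete, here’s roughly what the ‘translate’ step looks like in Python for a PyTorch model. This is just a sketch: the model, shapes, and file name are invented for illustration, not taken from any real project.

```python
# Sketch: exporting a trained PyTorch model to the ONNX interchange format.
# The model, input shape, and file name below are illustrative assumptions.
import torch
import torch.nn as nn

# Stand-in for whatever model you have actually trained.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

# The exporter traces the model with an example input of the right shape.
dummy_input = torch.randn(1, 32)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                  # the portable artefact you move to another platform
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # variable batch size
)
```

The resulting model.onnx file is what travels between platforms.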
ONNX was originally called Toffee and was developed by a
team from Facebook, but was renamed in 2017. It’s supported by IBM, Microsoft, Huawei,
Intel, AMD, Arm, Qualcomm, and others.
Developers may want to use different frameworks for a
project because particular frameworks may be better suited to specific phases
of the development process, such as fast training, network architecture
flexibility, or inferencing on mobile devices. ONNX then facilitates the
seamless exchange and sharing of models across many different deep learning
frameworks. Another advantage of using ONNX is that it allows hardware vendors and others to improve the performance of artificial neural networks across multiple frameworks at once by targeting the ONNX representation.
ONNX provides definitions of an extensible computation graph
model, built-in operators and standard data types, focused on inferencing
(evaluation). Each computation dataflow graph is a list of nodes that form an
acyclic graph. Nodes have inputs and outputs. Each node is a call to an
operator. Metadata documents the graph. Built-in operators are expected to be available on every ONNX-supporting framework. (Thanks to Wikipedia for that summary.)
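If that description sounds abstract, here’s a minimal sketch using the onnx Python package that builds such a graph by hand: two nodes, each a call to a built-in operator, plus typed inputs and outputs and a little metadata. The names and shapes are invented for illustration.

```python
# Sketch: an ONNX computation graph is an acyclic list of nodes, each node a
# call to a built-in operator. All names and shapes here are illustrative.
import numpy as np
import onnx
from onnx import TensorProto, helper

# Typed graph input and output (standard data types).
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [None, 4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [None, 2])

# Constant weights stored in the graph as an initializer.
W = helper.make_tensor("W", TensorProto.FLOAT, [4, 2],
                       np.ones((4, 2), dtype=np.float32).flatten().tolist())

# Two nodes: each has inputs and outputs and calls a built-in operator.
matmul = helper.make_node("MatMul", ["X", "W"], ["XW"])
relu = helper.make_node("Relu", ["XW"], ["Y"])

graph = helper.make_graph([matmul, relu], "tiny-graph", [X], [Y], initializer=[W])
model = helper.make_model(graph, producer_name="example")  # metadata documents the graph

onnx.checker.check_model(model)   # validate against the ONNX standard
onnx.save(model, "tiny.onnx")
```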
So, we saw in that list of vendors that IBM is involved in
the project. How is ONNX used on a mainframe? I know part of the answer to that
because I watched a fascinating presentation by Megan E Hampton, an Advisory Software Engineer at IBM, at the excellent GSE UK conference at the start of the
month. Here’s what she told her audience.
Currently, on the mainframe, there aren’t many tools available for the optimization of AI
models. That’s where ONNX comes in. It is
an open format for representing AI models. ONNX defines a computation graph
model, as well as definitions of built-in operators and standard data types.
ONNX provides a standard format for representing machine learning (ML) and deep learning (DL) models. ONNX models are generated by supported DL and ML frameworks, or converted from other formats by conversion tools. ONNX models can be imported into
multiple frameworks and runtime engines and executed/accelerated by
heterogeneous hardware and execution environments.
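Because the format is standard, the same .onnx file can be executed by any compliant runtime engine. Here’s a small sketch using the general-purpose onnxruntime package, assuming the model and input names from the export sketch earlier:

```python
# Sketch: executing an ONNX model with a generic runtime engine.
# File name and input shape are assumptions carried over from the export sketch.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")

input_name = session.get_inputs()[0].name           # eg "input"
batch = np.random.randn(8, 32).astype(np.float32)   # shape must match the model

outputs = session.run(None, {input_name: batch})    # None means return every output
print(outputs[0].shape)
```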
Among the benefits of using ONNX on a mainframe are
that it:
- Allows clients to use popular tools and frameworks to build and train models.
- Makes assets portable to multiple Z operating systems.
- Optimizes and enables seamless use of IBM Z hardware and software acceleration investments.
But
what’s the next stage? How do you get from an AI model to something useful that
can run on a mainframe? That’s where the IBM Z Deep Learning Compiler (zDLC) comes
in. It uses open source ONNX-MLIR to compile .onnx deep learning AI models into
shared libraries. The resulting shared libraries can then be integrated into C,
C++, Java, or Python applications.
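I don’t have the exact zDLC invocation to hand, but because it builds on open-source ONNX-MLIR, the compile step looks roughly like the sketch below. The onnx-mlir command, its --EmitLib option, and the file names are assumptions based on the open-source project rather than a documented zDLC recipe (IBM ships zDLC as a container image, so the details will differ).

```python
# Sketch only: driving an ONNX-MLIR style compile from a Python script.
# The 'onnx-mlir' command and '--EmitLib' option are assumptions taken from the
# open-source project; the zDLC container packaging may invoke it differently.
import subprocess

# Compile the .onnx model into a deployable shared library (typically model.so).
subprocess.run(["onnx-mlir", "--EmitLib", "model.onnx"], check=True)
```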
zDLC takes the ONNX model as input and generates a single binary. It handles
static and dynamic shapes as well as multiple data representations. And it exploits
parallelism via OpenMP. OpenMP (Open Multi-Processing) is an application
programming interface (API) that supports multi-platform shared-memory
multiprocessing programming in C, C++, and Fortran. It consists of a set of
compiler directives, library routines, and environment variables that influence
run-time behaviour.
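In practice, that mostly surfaces as run-time tuning knobs. OMP_NUM_THREADS, for example, is a standard OpenMP environment variable; whether a particular zDLC build honours it in exactly this way is an assumption on my part.

```python
# Sketch: OpenMP behaviour is largely controlled through environment variables.
# Treating OMP_NUM_THREADS as the way to cap threads for a zDLC-compiled model
# is an assumption, not a documented guarantee.
import os

# Must be set before the compiled model (and its OpenMP runtime) is loaded.
os.environ["OMP_NUM_THREADS"] = "4"
```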
Multi-level intermediate representation (MLIR) significantly reduces the cost of building domain-specific compilers. It connects existing compilers together through a shared infrastructure, and it’s part of the LLVM project, following LLVM governance. LLVM and MLIR are new and powerful ways of writing compilers that are modular and generic. MLIR is flexible, and introduced the concept of ‘dialects’.
Think of it like this: ONNX (the AI model) plus MLIR (the compiler infrastructure) gives you ONNX-MLIR, the basis of the IBM Z Deep Learning Compiler (ie it compiles the AI models).
So, just
to explain these further, MLIR is a unifying software framework for compiler
development. It is a sub-project of the LLVM Compiler Infrastructure project.
LLVM is a
set of compiler and toolchain technologies that can be used to develop a
frontend for any programming language and a backend for any instruction set
architecture. LLVM is designed around a language-independent intermediate
representation (IR) that serves as a portable, high-level assembly language
that can be optimized with a variety of transformations over multiple passes.
Interestingly, LLVM isn't an acronym, although, originally, it stood for Low
Level Virtual Machine.
Let’s go back to the mainframe. We can build and train a model in any popular framework (PyTorch, TensorFlow, etc) on any platform, which allows the maximum flexibility possible. For the mainframe, we then use ONNX: models are
converted to the ONNX interchange format. We can then leverage z/OS
Container Extensions (zCX) if we want to run the application inside a Docker
container on z/OS as part of a z/OS workload. We can also run the applications
on zIIP engines, which won’t impact the 4-hour rolling average cost of general
processors. The IBM zDLC (Deep
Learning Compiler) enables existing models to quickly and easily take advantage
of the IBM z16 Telum processor's Integrated Accelerator for AI.
Looking
at the Deep Learning Compiler Flow: the ONNX model (dialect) is lowered and
transformed through multiple phases of intermediate representation (IR) to a
dialect that can be processed by an LLVM compiler. The output of the LLVM
compilation and build is a shared library object that can be deployed.
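To close the loop, deploying that shared library from a Python application looks roughly like this. The PyRuntime module and OMExecutionSession class are the names used by open-source ONNX-MLIR; I’m assuming, rather than confirming, that the zDLC packaging exposes the same interface.

```python
# Sketch: running a model that ONNX-MLIR / zDLC compiled into a shared library.
# 'PyRuntime' and 'OMExecutionSession' are the open-source ONNX-MLIR names;
# whether your zDLC level exposes exactly these is an assumption here.
import numpy as np
from PyRuntime import OMExecutionSession

session = OMExecutionSession("model.so")          # the deployed shared library object

x = np.random.randn(1, 32).astype(np.float32)     # shape assumed from the earlier sketches
outputs = session.run([x])                        # inputs and outputs are lists of numpy arrays
print(outputs[0])
```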
It all
seems so simple when it’s explained. I expect we’re going to hear a lot more
about all this.