Saturday afternoon (Nov. 16) at Supercomputing 2019, Intel launched a new programming model called oneAPI. Intel describes the necessity of tightly coupling middleware and frameworks directly to specific hardware as one of the largest pain points of AI/Machine Learning development. The oneAPI model is intended to abstract that tight coupling away, allowing developers to focus on their actual project and re-use the same code when the underlying hardware changes.
This sort of "write once, run anywhere" mantra is reminiscent of Sun's early pitches for the Java language. However, Bill Savage, general manager of compute performance for Intel, told Ars that's not an accurate characterization. Although each approach addresses the same basic problem—tight coupling to machine hardware making developers' lives more difficult and getting in the way of code re-use—the approaches are very different.
When a developer writes Java code, the source is compiled to bytecode, and a Java Virtual Machine tailored to the local hardware executes that bytecode. Although many optimizations have improved Java's performance in the 20+ years since it was introduced, it's still significantly slower than C++ code in most applications—typically, anywhere from half to one-tenth as fast. By contrast, oneAPI is intended to produce direct object code with no or negligible performance penalties.
When we questioned Savage about oneAPI's design and performance expectations, he distanced it firmly from Java, pointing out that there is no bytecode involved. Instead, oneAPI is a set of libraries that tie hardware-agnostic API calls directly to heavily optimized, low-level code that drives the actual hardware available in the local environment. So instead of "Java for Artificial Intelligence," the high-level takeaway is more along the lines of "OpenGL/DirectX for Artificial Intelligence."
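To make the OpenGL/DirectX comparison concrete, here is a minimal C++ sketch of the general pattern Savage describes: one hardware-agnostic call that a library routes to a hardware-specific routine. It is not oneAPI code. Every name in it (`MathLibrary`, `axpy_cpu`, `axpy_gpu_stub`, the "cpu" and "gpu" backend strings) is a hypothetical stand-in invented for illustration, and the "GPU" routine simply runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration only: none of these names come from oneAPI.
using AxpyKernel =
    std::function<void(float, const std::vector<float>&, std::vector<float>&)>;

// Stand-in for a routine tuned for CPUs: computes y = a*x + y.
void axpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Stand-in for a routine tuned for GPUs. In this sketch it runs on the host;
// a real library would launch device code here.
void axpy_gpu_stub(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + y[i];
}

class MathLibrary {
public:
    // The backend is chosen once, when the library is set up.
    explicit MathLibrary(const std::string& backend) {
        static const std::map<std::string, AxpyKernel> table = {
            {"cpu", axpy_cpu},
            {"gpu", axpy_gpu_stub},
        };
        auto it = table.find(backend);
        if (it == table.end()) {
            throw std::invalid_argument("unknown backend: " + backend);
        }
        kernel_ = it->second;
    }

    // The hardware-agnostic call: application code looks the same
    // no matter which backend was selected.
    void axpy(float a, const std::vector<float>& x, std::vector<float>& y) const {
        if (x.size() != y.size()) throw std::invalid_argument("size mismatch");
        kernel_(a, x, y);
    }

private:
    AxpyKernel kernel_;
};
```

Application code would construct `MathLibrary` once and then call `axpy` without knowing which routine sits behind it. There is no bytecode or virtual machine in this picture; the selected routine is ordinary compiled code.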
For higher-performance coding inside tight loops, oneAPI also introduces a new language variant called "Data Parallel C++," which allows even very low-level optimized code to target multiple architectures. Data Parallel C++ builds on and extends SYCL, a "single source" abstraction layer for OpenCL programming.
In its current version, a oneAPI developer still needs to target the basic hardware type he or she is coding for, such as CPUs, GPUs, or FPGAs. Beyond that basic targeting, oneAPI keeps the code optimized for any supported hardware variant. This would, for example, allow users of a oneAPI-developed project to run the same code on either Nvidia's Tesla V100 or Intel's own newly announced Ponte Vecchio GPU.
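The split between choosing a hardware type and leaving the specific model to the library can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the oneAPI interface: `DeviceType`, `Device`, `pick_device`, and `describe_target` are hypothetical names, and the device list is supplied by hand rather than discovered from real hardware.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of oneAPI.
enum class DeviceType { CPU, GPU, FPGA };

struct Device {
    DeviceType type;
    std::string name;
};

// Return the first installed device of the requested type, or nullptr if
// none is present. The pointer is valid only while `installed` is alive.
const Device* pick_device(const std::vector<Device>& installed, DeviceType wanted) {
    auto it = std::find_if(installed.begin(), installed.end(),
                           [wanted](const Device& d) { return d.type == wanted; });
    return it == installed.end() ? nullptr : &*it;
}

// Application code names only the device type (GPU here). Which GPU model
// is actually used depends on what the machine has installed.
std::string describe_target(const std::vector<Device>& installed) {
    const Device* gpu = pick_device(installed, DeviceType::GPU);
    return gpu ? "running on " + gpu->name : std::string("no GPU available");
}
```

Calling `describe_target` with a list containing a Tesla V100 and again with a list containing a Ponte Vecchio exercises the same application code in both cases; only the selected device differs.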