The latest machine learning research proposes the FP8 binary interchange format: a natural progression for accelerating deep learning training and inference

To meet the growing compute demands of neural networks, AI processing requires coordinated innovation across hardware and software platforms. Using lower-precision number formats to increase computational throughput, reduce memory usage, and improve interconnect bandwidth is one of the most effective levers for efficiency.

The researchers believe that a standard interchange format will promote rapid progress and interoperability across software and hardware platforms. To reap these advantages, the industry has progressively moved from 32-bit precision to 16-bit, and now to 8-bit precision formats. 8-bit floating-point precision is particularly attractive for large networks and is one of the most important recent advances in efficient AI computation. In this context, NVIDIA, Intel, and Arm have published a joint paper outlining an 8-bit floating-point (FP8) specification. It offers a common format that aims to accelerate AI development through reduced memory footprint, and it works in both the training and inference stages. The FP8 specification comes in two variants, E5M2 and E4M3.

The two encoding names, E4M3 and E5M2, specify the number of exponent (E) and mantissa (M) bits, following the convention of the IEEE 754 standard. When using FP8, E4M3 is recommended for weight and activation tensors and E5M2 for gradient tensors. While some networks can be trained with only E4M3 or only E5M2, others require both types (or need to keep some tensors out of FP8 altogether). Inference and the forward pass of training are typically performed in E4M3, while the gradients of the backward pass use E5M2. The FP8 formats were designed with the guiding principle of adhering to IEEE-754 conventions and deviating only where doing so significantly improves accuracy for deep learning applications. Accordingly, E5M2 is essentially IEEE half precision with fewer mantissa bits: it follows the IEEE 754 rules for exponents and special values, which makes conversion between IEEE FP16 and E5M2 straightforward. E4M3, in contrast, extends its dynamic range by reclaiming most of the bit patterns that would otherwise be spent on special values: it has no infinities and reserves only a single mantissa bit pattern for NaN.
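
To make these bit-level rules concrete, here is a minimal Python sketch (our illustration, not code from the paper) that decodes an 8-bit pattern under either encoding as described above: E5M2 keeps the IEEE treatment of the all-ones exponent, while E4M3 reserves only a single pattern per sign for NaN and has no infinities.

```python
def decode_fp8(bits: int, fmt: str = "E4M3") -> float:
    """Decode an 8-bit integer (0..255) to the real value it encodes."""
    assert 0 <= bits <= 0xFF
    if fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    elif fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    else:
        raise ValueError(fmt)

    sign = -1.0 if bits >> 7 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1

    if fmt == "E5M2" and exp == exp_max:
        # E5M2 follows IEEE 754: an all-ones exponent encodes Inf or NaN.
        return sign * float("inf") if man == 0 else float("nan")
    if fmt == "E4M3" and exp == exp_max and man == (1 << man_bits) - 1:
        # E4M3 reclaims the all-ones patterns: only S.1111.111 is NaN,
        # there are no infinities, and S.1111.110 encodes the maximum.
        return float("nan")
    if exp == 0:
        # Subnormal: implicit leading 0, exponent fixed at 1 - bias.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normal: implicit leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# Largest finite magnitudes implied by the two encodings:
assert decode_fp8(0b0_1111_110, "E4M3") == 448.0     # E4M3 max normal
assert decode_fp8(0b0_11110_11, "E5M2") == 57344.0   # E5M2 max normal
```

The asserts at the end show the practical consequence of E4M3's reclaimed special values: its maximum finite magnitude is 448, well beyond what a strict IEEE-style 4-bit exponent would allow, while E5M2 tops out at 57344 with IEEE semantics intact.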

To validate the effectiveness of the proposed formats, the authors conducted an empirical study covering both the training and inference phases, comparing the results against baselines trained in either FP16 or bfloat16. Across vision and language tasks, including translation, FP8 training matches the accuracy of the 16-bit training runs.

The paper introduces FP8, a new binary interchange format with two encodings, E4M3 and E5M2. By departing only minimally from the IEEE-754 conventions for binary encoding of floating-point values, the authors ensure that software can continue to rely on IEEE floating-point properties, such as the ability to compare and sort values using integer operations. Their study shows that, using the same model, optimizer, and hyperparameters as the 16-bit baselines, a wide variety of neural network models for image and language tasks can be trained in FP8 to the same model accuracy achieved with 16-bit training. And because the same data types serve for training and inference, FP8 not only speeds up training and reduces the resources it requires, but also simplifies 8-bit inference deployment.
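
The "compare and sort with integer operations" property can be checked directly: because the encodings keep IEEE-style sign-magnitude ordering, the bit patterns of non-negative finite values are monotone in the values they encode. A small self-contained check (again our illustration; e4m3_value mirrors the decoder sketched earlier):

```python
import math

def e4m3_value(bits: int) -> float:
    """Decode a non-negative E4M3 pattern (sign bit assumed 0)."""
    exp, man = (bits >> 3) & 0xF, bits & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")            # the single NaN pattern per sign
    if exp == 0:
        return (man / 8) * 2.0 ** -6   # subnormal range
    return (1 + man / 8) * 2.0 ** (exp - 7)

# Because the encoding is monotone in its bit pattern (as in IEEE formats),
# sorting non-negative, non-NaN patterns as plain integers sorts the values.
patterns = [b for b in range(0x80) if not math.isnan(e4m3_value(b))]
values = [e4m3_value(b) for b in sorted(patterns)]
assert values == sorted(values)
```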

This article is written as a research summary by Marktechpost staff based on the research paper 'FP8 Formats for Deep Learning'. All credit for this research goes to the researchers on this project. Check out the paper.



Mahmoud is a PhD researcher in machine learning. He also holds a Bachelor's degree in physical sciences and a Master's degree in communication systems and networks. His current research interests include computer vision, stock market forecasting, and deep learning. He has produced many scholarly articles on person re-identification and on the robustness and stability of deep networks.

