Samsung Electronics Implements On-Device AI on Own NPU Core
The NPU, embedded in Samsung’s Exynos 9820 processor that powers the Galaxy S10 smartphone, is the first zero-skipping NPU.

Attempts to implement artificial intelligence (AI) algorithms in a wide array of devices are creating new demand for a novel breed of chip architectures called neural processing units (NPUs). These promise to dramatically change the legacy von Neumann computing architecture, in which CPUs and DRAM continuously shuttle data back and forth to process and store it.

The move to implement AI on devices is driven by strong requirements for real-time, human-like object recognition and data processing. There are many compelling reasons why AI algorithms must sit on the devices themselves, an approach called on-chip AI or on-device AI.

What Makes On-Device AI Important?

Latency is one challenge. Real-time applications such as voice recognition, image recognition, and self-driving cars require hardware systems that respond fast enough to process data on the spot with as little latency as possible.

Running the AI algorithm on the device itself lets it detect objects and process data faster than AI in the cloud can. On-device AI also keeps working through network breakdowns, whereas AI in the cloud is vulnerable to hacking and network hiccups. It is more power efficient, too, as it does not need to shuttle data to and from the cloud.

Data privacy also helps explain why AI algorithms should be implemented on devices. To make AI algorithms as intelligent as human beings, vast sets of data must be constantly fed back to the cloud, because the algorithms analyze that data and draw inferences from it to figure out patterns in how users use their devices.

Photo 1: Shim Eun Soo, Senior Vice President at Samsung Electronics

“The more data an AI algorithm works with, the more it behaves like a human, but uploading data to the cloud risks divulging crucial personal information such as the names of frequently visited sites, email addresses, and your habits and preferences, infringing on users’ privacy. This is another reason why AI algorithms must be on devices,” said Shim Eun Soo, Senior Vice President and head of the AI & Software Center at Samsung Electronics Co., Ltd.

Shim said that on-device AI is a must-have for getting self-driving cars on the road, as it is the only way for cars to fully understand the situation around them, process data from a variety of sensors such as cameras and LiDAR in real time, and then respond.

“Awash in dozens of sensors, self-driving cars must detect and understand everything around them, but precise, real-time object detection is still a tough challenge, because camera sensors still find it difficult to handle backlight, which compromises object detection. Meanwhile, snow and fog disturb the proper operation of LiDAR, making on-device AI algorithms more important, as they promise to calculate many mathematical variations to perfectly detect and perceive objects,” he added.

Moreover, Shim said that self-driving cars have to process tons of sensing and mapping data, as well as traffic control data, on the spot to understand everything around them and act. Shuttling these tons of data to and from the cloud and processing them there cannot guarantee real-time detection and control of the situation.

Embedding AI Algorithm on NPUs

The surest way to implement AI algorithms on devices is to embed them in NPUs.

Legacy CPUs and GPUs, and even FPGAs, are now competing head to head to become the de facto mainstream computing architecture for AI algorithms, but NPUs are emerging as the best candidate in terms of system performance, power efficiency, and fast, real-time data processing.

On-Device AI Development Road Map

Samsung’s experimentation with on-device AI started two years ago, when it tested its own object detection and machine control AI algorithms on an in-vehicle computer for self-driving cars. Since then, Samsung has been building a perception algorithm for object detection around deep learning, while building its machine control AI algorithm around a combination of machine learning and heuristic methods.

The first commercial products to use such an AI algorithm were the Galaxy S8 smartphones, which detect, identify, and authenticate users’ faces to lock and unlock the screen. The AI algorithm is fast and precise enough to match a fingerprint detection AI algorithm in performance. It is also power efficient and compact, as it runs on just one ARM core.

Photo 2: Samsung Exynos 9820 incorporates a dedicated NPU, powering its newly released Galaxy S10 smartphone.

Samsung is also working on an anti-spoofing AI algorithm to prevent attackers from using counterfeit photos of users’ faces to unlock their smartphone screens. Next on its technology roadmap is a live inspection AI algorithm that can discern human faces from those of other creatures.

Samsung has also been working on speech and voice recognition AI algorithms, as the company tries to revolutionize the user interface, the way that people interact with machines.

The company has built Bixby, its text-to-speech machine translation algorithm, around a speech recognition AI algorithm based on an encoder-decoder architecture. The text-to-speech AI algorithm performs as well on devices as its counterpart in the cloud, even though it incorporates the same amount of data, because its model works on compressed data sets that are reduced by a factor of 20 to 30.

All combined, Samsung has embedded these object detection and speech recognition AI algorithms on its homegrown NPU core to put them on devices like smartphones.

According to Shim, Samsung’s NPUs are programmable and flexible enough to support a variety of communications network standards. They are also scalable and power efficient, as they are quantized to an 8-bit fixed-point operation architecture rather than 32-bit floating point. They also use a pruning technology to enable a zero-skipping function, which skips zero values to reduce the calculation load on the multiply-accumulate (MAC) blocks.
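
The article does not describe how Samsung’s NPU implements these techniques, so the following is only a minimal software sketch of the general ideas: pruning small weights to zero, mapping values to 8-bit integers, and skipping multiply-accumulate steps whose operands are zero. The function names, the pruning threshold, and the sample values are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def prune(values: List[float], threshold: float) -> List[float]:
    """Magnitude pruning: set every value with |v| < threshold to exactly zero."""
    return [0.0 if abs(v) < threshold else v for v in values]


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127].

    Returns the integer values and the scale needed to map them back.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0] * len(values), 1.0
    scale = max_abs / 127.0
    return [int(round(v / scale)) for v in values], scale


def zero_skipping_dot(weights: List[int], activations: List[int]) -> Tuple[int, int]:
    """Integer dot product that skips any pair with a zero operand.

    Returns the accumulated sum and the number of MACs actually performed.
    """
    acc = 0
    macs = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # a zero operand contributes nothing, so skip the MAC
        acc += w * a
        macs += 1
    return acc, macs


# Illustrative values only (assumption, not taken from the article).
weights = [0.50, -0.02, 0.01, -0.75, 0.03, 0.25]
activations = [1.0, 0.5, 0.0, 0.25, 0.75, 0.0]

q_w, w_scale = quantize_int8(prune(weights, threshold=0.05))
q_a, a_scale = quantize_int8(activations)

acc, macs = zero_skipping_dot(q_w, q_a)
result = acc * w_scale * a_scale  # rescale the integer sum back to a float

print(f"MACs performed: {macs} of {len(q_w)}")
print(f"approximate dot product: {result:.4f}")
```

With these sample values, pruning zeroes three of the six weights and two activations are already zero, so only two of the six multiply-accumulates are executed. The rescaled result stays close to the floating-point dot product of the pruned weights (0.3125). A hardware NPU would do the skipping in its datapath rather than in a loop, but the saving comes from the same observation.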

The NPU is embedded in the Exynos 9820 processor that powers Samsung’s newly released flagship smartphone, the Galaxy S10, as the first zero-skipping NPU. Yet the NPU still has a long way to go before it works as smartly as humans and performs many tasks without compromising power efficiency. One of the biggest challenges for the NPU is how to improve memory bandwidth, either by developing new algorithms that use less memory storage space or by developing and implementing an in-memory logic architecture, in which logic functions work inside the memory chips.