How strong is the DPU's ability to "reduce costs and increase efficiency"? In a blue-ocean market, domestic DPUs must still overcome the pain point of "high energy consumption"

Categories:News
Time of issue:2022-08-12


For a long time, the CPU dominated the data center, and virtualizing its computing power was the focus of cloud computing.

Then, with the explosive growth of artificial intelligence applications, the GPU's architecture proved better suited to large-scale AI model training and inference, and the GPU became the second major computing chip in the data center.

As bottlenecks in data center infrastructure become increasingly difficult to overcome, a new generation of DPUs has emerged.

01 What is a DPU?

A DPU is a data processing unit oriented toward the infrastructure layer. The infrastructure layer is a logical layer, distinct from the application layer, that provides physical or virtualized resources, and even basic services, to applications. Existing computing systems are conventionally divided into an infrastructure layer (IaaS), a platform layer (PaaS), and a software layer (SaaS), with the application layer on top. For this reason, Intel calls its own DPU an "IPU" (infrastructure processing unit).

From the perspective of optimization, the lower the level of a component, the more its optimization prioritizes performance and the more machine-dependent it becomes. Optimization at the upper levels is oriented more toward productivity, and layer-by-layer encapsulation hides the differences at the bottom so that they are transparent to users.

Why is there a DPU for the infrastructure layer? Does it mean that the CPUs, GPUs, routers, and switches in existing data centers can no longer serve as the "data processing units" of the infrastructure layer?

The study of computing systems is, to a large extent, the study of optimization. Existing infrastructure is not incapable of the job; it is insufficiently optimized for it. Without the invention and introduction of new technologies, the gap between demand and supply will become increasingly prominent.

The first problem the DPU emerged to address is network packet processing. As core and aggregation networks move toward 100G and 200G, and access networks reach 50G and 100G, the CPU can no longer supply enough computing power to process packets. Moreover, network bandwidth keeps growing, driven by the proliferation of applications, the expansion of data centers, and the advance of digitization, while CPU performance growth is slowing along with Moore's Law, which further increases the computational burden on the CPUs of server nodes.
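To give a rough sense of why packet processing outruns the CPU, the sketch below estimates the worst-case packet rate of an Ethernet link and the cycle budget one core has per packet. The 64-byte minimum frame and the 20 bytes of preamble and inter-frame gap are standard Ethernet figures; the 3 GHz clock and the function names are assumptions for illustration, not figures from this article.

```python
# Back-of-envelope packet-rate estimate. Function names and the 3 GHz
# clock are illustrative assumptions, not figures from the article.

PREAMBLE_BYTES = 8          # Ethernet preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle gap between frames


def line_rate_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Maximum packets per second on a link for a given frame size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def cycles_per_packet(link_bps: float, core_hz: float = 3e9,
                      frame_bytes: int = 64) -> float:
    """Cycles a single core can spend per packet to keep up with line rate."""
    return core_hz / line_rate_pps(link_bps, frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 100, 200):
        pps = line_rate_pps(gbps * 1e9)
        budget = cycles_per_packet(gbps * 1e9)
        print(f"{gbps:>3}G: {pps / 1e6:7.2f} Mpps, {budget:6.1f} cycles/packet")
```

Under these assumptions, a 100G link carrying minimum-size frames delivers roughly 149 million packets per second, which leaves a single 3 GHz core about 20 cycles per packet.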

Another example is forwarding data between virtual machines in cloud computing scenarios, typically handled by a virtual switch such as OVS (Open vSwitch). Typically, 20 virtual machines consume the computing power of roughly 5 cores of a multi-core processor, which is a significant overhead and another reason DPUs exist.

In addition, current system architectures were not designed around network data processing, and they are inefficient under high-bandwidth networking, random access, and highly concurrent sending and receiving. Existing techniques already use polling instead of interrupts to handle I/O, but such tinkering with the existing system is only a stopgap, essentially an adaptation of classical technology to new scenarios.
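The polling approach mentioned above can be sketched as a toy model: instead of waiting for a per-packet interrupt, a loop repeatedly checks a receive ring and drains packets in bursts. The ring, the burst size, and every name here are hypothetical; this is not the API of any real packet framework.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def poll_burst(rx_ring: Deque, burst_size: int = 32) -> List:
    """Take up to burst_size packets from the ring without blocking."""
    burst = []
    while rx_ring and len(burst) < burst_size:
        burst.append(rx_ring.popleft())
    return burst


def run_poll_loop(rx_ring: Deque, handle: Callable,
                  burst_size: int = 32,
                  max_idle_polls: int = 3) -> Tuple[int, int]:
    """Poll the ring until it has been empty max_idle_polls times in a row.

    A real poll-mode loop would typically spin forever on a dedicated
    core; the idle limit exists only so that this example terminates.
    Returns (packets processed, polls performed).
    """
    processed = polls = idle = 0
    while idle < max_idle_polls:
        polls += 1
        burst = poll_burst(rx_ring, burst_size)
        if not burst:
            idle += 1
            continue
        idle = 0
        for packet in burst:
            handle(packet)
            processed += 1
    return processed, polls


if __name__ == "__main__":
    ring = deque(range(100))   # stand-in for 100 received packets
    seen = []
    print(run_poll_loop(ring, seen.append))
```

The trade-off this models is the one the paragraph hints at: polling amortizes per-packet overhead across a burst, but the core keeps spinning even when the ring is empty.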

Some people understand the DPU simply as a way to reduce the burden on the CPU: a variant of the network card, a mere hardware carrier for algorithms, "all brawn and no brains". But if we re-examine how system functions are distributed across hardware, we see that the DPU is not just an accelerator but a key component that cooperates with the CPU in every respect.

Hosts were once responsible for all management, control, and data functions. As these functions were gradually offloaded, heterogeneous computing, smart NICs, and DPUs emerged in turn. The value of the DPU has become increasingly evident, and it is even possible to build computing systems centered on DPUs. Not long ago, Alibaba Cloud announced the CIPU, which it claims will replace the CPU as the core hardware of the next generation of cloud computing, arguably pushing the DPU to center stage. Although this remains controversial, it may well be the direction in which DPUs develop.

02 The relationship between DPU, CPU, and GPU

Moving from SmartNIC to DPU is not simply a change of name. For data center applications to run at full efficiency, functions such as transport offload, a programmable data plane, and hardware offload for virtual switching are important parts of a SmartNIC, but they are only the most basic requirements of a DPU.

To elevate a SmartNIC to a DPU, it must support more functions, such as running the control plane and offering C programming in a Linux environment.

The DPU is a specialized processor for data centers that adds acceleration for AI, security, storage, and networking, and it will become an important new class of computing chip. It can accelerate performance-sensitive, general-purpose infrastructure tasks, better support the upper-layer services that run on CPUs and GPUs, and become the central node of the entire network.

NVIDIA CEO Jensen Huang has summarized three characteristics of the DPU: offloading, acceleration, and isolation. They map onto the DPU's three main application scenarios of network, storage, and security:

Offloading: data center network services, such as virtual switching and virtual routing; data center storage services, such as RDMA and NVMe (which can be understood as remote storage technologies); and data center security services, such as firewalls and encryption and decryption.

Acceleration: the services above are usually implemented in software and run on the CPU. A DPU can implement and run them in hardware, which can be several orders of magnitude faster than software. This is the "hardware acceleration" we often hear about.

Isolation: because the services above run on the DPU while user applications run on the CPU, the two are separated, which brings many security and performance benefits.

A basic analysis of how each chip is positioned:

A standalone DPU is positioned as an infrastructure processor, mainly for hardware acceleration.

A standalone GPU is mainly used for elastic compute acceleration at the application layer.

The CPU is mainly responsible for application-layer work with low computational density and high value density.

As shown in the following figure: the CPU has 60 area units, all of them CPU cores (60 in total); the GPU has 60 area units, all of them GPU cores (60 in total, each roughly corresponding to a streaming multiprocessor, or SM); the DPU consists of 10 CPU cores, 10 GPU cores, and 40 other acceleration-engine cores.

The CPU defines the entire IT ecosystem. Whether x86 on the server side or ARM on the mobile side, each has built a stable ecosystem that forms not only a technology ecosystem but also a closed value chain.

The GPU is the main chip for regular computation such as graphics rendering. Following NVIDIA's promotion of the general-purpose GPU (GPGPU) and the CUDA programming framework, the GPU has become the main computing engine for data-parallel tasks such as graphics and image processing, deep learning, and matrix operations, and the most important auxiliary computing unit in high-performance computing. Of the top 10 supercomputers announced in June 2021, six (ranked 2nd, 3rd, 5th, 6th, 8th, and 9th) deploy NVIDIA GPUs.

The emergence of the DPU is a milestone in heterogeneous computing. Like the GPU before it, the DPU is a typical case of application-driven architecture design; unlike the GPU, it targets lower-level workloads. As the DPU offloads the data center's infrastructure operations from the CPU, the data center will be built on a trinity of DPU, GPU, and CPU.

The DPU serves first as an offload engine whose direct effect is to reduce the burden on the CPU. Some DPU functions can be traced back to the early TOE (TCP/IP Offload Engine). As the name suggests, a TOE offloads the CPU's TCP protocol processing to the network card.

Although traditional software TCP processing has a clear layered structure, it has gradually become a bottleneck for network bandwidth and latency. Software processing consumes CPU cycles and so degrades the CPU's performance on other applications. TOE technology hands TCP and IP protocol processing to the network interface controller, which significantly reduces the CPU's protocol-processing load while using hardware acceleration to improve network latency and bandwidth.
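As one small, concrete example of the per-packet protocol work that software TCP/IP processing spends CPU cycles on, the sketch below computes the 16-bit Internet checksum (RFC 1071) carried in IPv4, TCP, and UDP headers. It only illustrates the kind of computation a network card can take over; it is not a model of a full TOE, and the function name is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data, as in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    # A 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))
```

A receiver validates a header by running the same sum over all of its bytes, checksum field included; a result of zero means the header is intact. In software this loop runs for every packet, which is why even this simple step is a common candidate for hardware offload.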

Analysis of network data processing structure:

03 How strong is the DPU at reducing costs and increasing efficiency?

To seize the dividends of the DPU market, more and more startups have emerged in China. With ambitious architectural concepts and independent research and development, many local DPU startups are making their mark in this $10 billion market.

At present, mainstream players in the domestic DPU market include startups such as Xinqiyuan, Zhongke Yushu, Yunbao Intelligent, Dayu Intelligent, Edge Intelligent, Xingyun Intelligent, and Yunmai Intelligent.

But each company has chosen a slightly different DPU technology route. As the author understands it, current mainstream DPU designs fall into three architectures: the first is based on Arm or MIPS multi-core processors; the second is an FPGA-based SmartNIC architecture; the third is a heterogeneous core array.

Architectures based on Arm or MIPS multi-core arrays can offload clearly defined tasks, such as standardized security and storage protocols, but because they rely on software-programmable processors with limited parallelism, they are slower at network processing. Meanwhile, the fixed-function engines in multi-core SmartNIC ASICs cannot be extended to handle new encryption or security algorithms: they lack sufficient programmability and can adapt only to minor algorithm changes.

DPUs based on an FPGA SmartNIC architecture offer high flexibility and programmability. In development, they can be programmed much like a CPU, and new functions can be developed as quickly as in SoC solutions. They also save some effort on interfaces, but breakthroughs are still missing in many important areas. In addition, FPGAs are well known to be expensive, so DPU solutions built on them are relatively costly.

In contrast, the heterogeneous core array is currently the architecture most favored by startups, mainly because it is more flexible and can process data more efficiently. It has weaknesses too: a company must develop its own architecture, and the R&D investment is high. The KPU architecture, for example, organizes four types of heterogeneous cores, which handle network protocols, OLAP/OLTP processing, machine learning, and security and encryption.

However, for the large-scale computing power network applications under "East Data, West Computing", the ultimate consideration is still the DPU's cost-effectiveness. With the launch of the "East Data, West Computing" project, China's cloud service providers, led by the telecom operators, continue to increase their investment and attention in this field. The capital expenditure plans disclosed by the three major operators show that all of them are investing heavily in the project.

In terms of specific figures, China Mobile plans capital expenditure of 48 billion yuan on its computing power network in 2022, with approximately 450,000 IDC racks available for external use and cumulative deployment of more than 660,000 cloud servers. China Telecom expects to raise the share of industrial digitization in its capital expenditure from 15.6% in 2020 to 30% in 2022. Within that, it plans to invest 6.5 billion yuan in IDC (adding 45,000 racks) and 14 billion yuan in computing power (adding 160,000 cloud servers) in 2022, reaching a computing power scale of 3.8 EFLOPS, up more than 80% year on year.

Meanwhile, China Unicom will optimize and expand its "5+4+31+X" resource layout around the eight national computing hubs of "East Data, West Computing". For example, China Unicom has launched a national hub node construction project in Tianjin that will accommodate approximately 25,000 cabinets when complete. China Unicom's Gui'an cloud data center project has a total investment of about 6 billion yuan and a planned total of 32,000 racks, which can accommodate 600,000 servers.

Industry insiders have told the author: "A server may not have a GPU, but it must have one or more DPUs, just as every server must have a network card." Take China Mobile, which has the largest server investment of the three major operators: more than 660,000 new cloud servers are being added in 2022. At around 10,000 yuan per DPU and a minimum configuration of one DPU per server, that means additional costs of 6.6 billion yuan for the operator.
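The arithmetic behind the 6.6 billion yuan figure can be written out as a minimal sketch. The server count and unit price come from the paragraph above; the function name and the dpus_per_server parameter are assumptions added for illustration.

```python
def dpu_capex_yuan(servers: int, price_per_dpu_yuan: int,
                   dpus_per_server: int = 1) -> int:
    """Total DPU purchase cost in yuan for a fleet of servers."""
    return servers * dpus_per_server * price_per_dpu_yuan


if __name__ == "__main__":
    # Minimum configuration: one DPU in each of 660,000 servers.
    minimum = dpu_capex_yuan(660_000, 10_000)
    print(f"{minimum / 1e9:.1f} billion yuan")
```

Because the quoted insider speaks of "one or more" DPUs per server, the figure is a floor: raising dpus_per_server scales the total linearly.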

Although this additional cost is high for data center projects in their early stages, the value the DPU brings in optimizing resources across the whole data center is long-term and sustainable. According to information the author obtained from DPU chip developers, large-scale adoption of DPUs can reduce a data center's overall operating costs by about 15% to 30%, depending on the type of cloud service scenario. That is a remarkable saving for a project like "East Data, West Computing", which can reach the scale of millions of racks.

04 2025 will be the period of large-scale deployment

Because "East Data, West Computing" involves information security, this national-level project is destined to be dominated by domestic DPU brands. At present, apart from a few giants with the capital and the ability to ship products, such as Alibaba (whose DPUs are currently mostly for its own use) and Huawei, most domestic DPU startups are still in the preparation stage of the race.

Industry insiders told the author that, starting from 2022, it will take at least two to three years for DPUs to truly play a role in markets such as "East Data, West Computing". The DPU market is expected to mature around 2025; only once hardware and software have matured together will large-scale adoption truly take off across application scenarios.

Because many companies began designing and developing their DPUs only this year, it will take at least two to three years to get from design to a working hardware system. During this period, there is no strong competitive relationship among startups. The core of the competition remains technology iteration cycles, time to market, and the ability to ship in small volumes.

But even domestic DPU companies face many barriers to truly entering "East Data, West Computing". The volume of data center construction involved is large enough to accommodate many local startups. However, because of the particular nature of government projects, companies are tested on many fronts: project experience, the overall technical capability of their DPU products, product performance and functionality, supporting capabilities, and the overall strength of the team.

In addition, energy consumption has gradually become the main pain point for DPU companies. Energy consumption is a significant constraint for any chip, and the DPU is no exception. Regulation has tightened since the second half of 2021, especially in the eastern coastal region, where governments are managing data center projects ever more strictly and through a growing number of channels and methods.

As key energy-consuming units, data center projects require approval from local governments and must submit energy-saving reports. The indicators are set and managed by the regional development and reform departments and the economic and information technology departments. The indicators for data centers focus on controlling energy consumption, including PUE (Power Usage Effectiveness), the share of renewable energy, and carbon emissions. Of these, PUE is the core policy lever.

PUE is the ratio of a data center's total energy consumption to the energy consumption of its IT equipment, with a benchmark of 2; the closer the value is to 1, the higher the energy efficiency. In July 2021, the Ministry of Industry and Information Technology released the "Three-Year Action Plan for the Development of New Data Centers (2021-2023)", which states that the PUE of newly built data centers of large scale and above should be reduced to below 1.3, and to below 1.25 where possible in severe cold and cold regions. Regionally, in 2021 Shanghai required a PUE of no more than 1.4 for existing data centers and below 1.3 for new ones, while Beijing and Shenzhen required a PUE below 1.4.
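The PUE definition above reduces to a one-line ratio. The sketch below computes it and, conversely, the non-IT power (cooling, power distribution, lighting) a facility may draw under a given PUE limit. The 1,000 kW IT load is a made-up example value, and the function names are assumptions.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def max_overhead_kw(it_equipment_kw: float, pue_limit: float) -> float:
    """Non-IT power allowed for a given IT load under a PUE limit."""
    return it_equipment_kw * (pue_limit - 1.0)


if __name__ == "__main__":
    it_load_kw = 1_000.0  # hypothetical IT load
    for limit in (2.0, 1.4, 1.3, 1.25):
        print(f"PUE {limit}: {max_overhead_kw(it_load_kw, limit):.0f} kW overhead")
```

For 1,000 kW of IT load, a limit of 1.3 leaves 300 kW for everything else, and a limit of 1.25 leaves 250 kW.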

But since the start of 2022, in less than a year, the limits in various regions have dropped from 1.5 to 1.4, and some cold regions have even set them at 1.25. In practice, values below 1.3 are difficult for a typical data center to achieve, which also places stricter energy consumption requirements on every component used in the data center.

However, because the technology is still immature, overheating and high power consumption are significant pain points for DPUs. Even for products from overseas vendors, such as Fungible's and NVIDIA's DPUs or Intel's IPU, power consumption is a major weakness. In the past, a single network DMA chip consumed only about 5 watts, but a DPU now often consumes more than 100 watts (the Fungible F1 consumes 120 watts).

As a result, most application scenarios cannot yet tolerate network devices with such high power consumption. Especially at 100G/200G and above, where the power consumption of optical modules already exceeds that of the network equipment itself, adding a 100-watt DPU will greatly increase the network's energy consumption, making it even harder to meet the increasingly strict PUE requirements of "East Data, West Computing". Power consumption therefore remains the pain point for DPUs entering "East Data, West Computing" scenarios.
