Demand Surges 300x! How Does Huawei Cloud Handle the Massive Traffic?

Aug 27 – At the opening of the 4th 828 B2B Enterprise Festival, Huawei Cloud announced that its Tokens Service is now fully integrated with the CloudMatrix384 Super Node. According to the company, the integration uses a "hybrid advantage" to offset single-point shortcomings and deliver a leap in performance.

Huawei Cloud stated that, through architectural innovation in xDeepServe, a single chip can reach a maximum throughput of 2400 tokens per second (TPS) with a time per output token (TPOT) as low as 50 ms, figures the company describes as far exceeding industry standards.

Over the past 18 months, domestic AI computing demand has grown exponentially: daily Token consumption rose from 100 billion in early 2024 to over 30 trillion by the end of June 2025, an increase of more than 300x in a year and a half. This reflects the rapid expansion of domestic AI applications and places greater demands on computing infrastructure.
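As a quick sanity check on those figures, the sketch below recomputes the overall growth factor and the constant month-over-month rate that would produce it. The constant-rate model is an assumption made only for illustration; the article gives just the two endpoints, and the variable and function names are hypothetical.

```python
# Figures quoted in the article (daily token consumption).
START_TOKENS_PER_DAY = 100e9   # early 2024: 100 billion
END_TOKENS_PER_DAY = 30e12     # end of June 2025: 30 trillion (stated as "over")
MONTHS = 18                    # "a year and a half"


def growth_factor(start: float, end: float) -> float:
    """Overall multiple between the two endpoints."""
    return end / start


def implied_monthly_rate(start: float, end: float, months: int) -> float:
    """Constant monthly growth rate that yields the same overall multiple.

    Assumption: smooth compound growth. The article does not describe
    the shape of the curve between the endpoints.
    """
    return (end / start) ** (1 / months) - 1


factor = growth_factor(START_TOKENS_PER_DAY, END_TOKENS_PER_DAY)
monthly_rate = implied_monthly_rate(START_TOKENS_PER_DAY, END_TOKENS_PER_DAY, MONTHS)

print(f"overall growth: {factor:.0f}x")
print(f"implied average monthly growth: {monthly_rate:.1%}")
```

Under that assumption, a 300x rise over 18 months corresponds to roughly 37% compound growth per month, which is consistent with the article's "exponential" description.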

Launched in March 2025 as an addition to the existing per-GPU-hour billing model, Huawei Cloud's Tokens Service is built on its MaaS (Model as a Service) offering and comes in multiple specifications (online, on-premise, offline, premium) to meet different performance and latency needs.

Now, with CloudMatrix384 integration and the native xDeepServe framework, Tokens Service achieves another major throughput breakthrough: up from 1920 TPS at the start of 2025 to 2400 TPS, with TPOT as low as 50ms.
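To put these numbers in perspective, the following sketch derives the relative throughput gain and the per-stream decoding speed implied by a 50 ms TPOT. The concurrency estimate assumes the peak throughput and the lowest TPOT hold at the same time, which the article does not state, so treat that figure as an illustrative upper-level estimate rather than a published specification. All names are hypothetical.

```python
# Figures quoted in the article (per chip).
TPS_START_OF_2025 = 1920   # tokens per second at the start of 2025
TPS_NOW = 2400             # tokens per second after CloudMatrix384 integration
TPOT_MS = 50               # time per output token, in milliseconds

# Relative throughput improvement between the two quoted figures.
throughput_gain = TPS_NOW / TPS_START_OF_2025 - 1

# A single request that receives one token every TPOT_MS milliseconds
# decodes at 1000 / TPOT_MS tokens per second.
per_stream_tps = 1000 / TPOT_MS

# Assumption: peak throughput and minimum TPOT are achieved together.
# The article quotes them as separate best-case numbers.
implied_concurrent_streams = TPS_NOW / per_stream_tps

print(f"throughput gain: {throughput_gain:.0%}")
print(f"per-stream speed: {per_stream_tps:.0f} tokens/s")
print(f"implied concurrent streams per chip: {implied_concurrent_streams:.0f}")
```

In other words, the move from 1920 to 2400 TPS is a 25% gain, and a 50 ms TPOT means each user sees about 20 tokens per second.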

Currently, Huawei Cloud MaaS supports major large models (DeepSeek, Kimi, Qwen, Pangu, SDXL, Wan) and mainstream Agent platforms (Versatile, Dify, Coze).

ICgoodFind: We’ll track Huawei Cloud’s tech progress to deliver the latest updates for the AI computing industry.

©Copyright 2013-2025 ICGOODFIND (Shenzhen) Electronics Technology Co., Ltd.
