Chinese tech giant Huawei used its Huawei Connect 2025 conference in Shanghai to lay out a multi-year roadmap for the next generations of its Ascend processors and the cluster products that will host them. In his keynote, Eric Xu, deputy chair of the Huawei board, said that 2025 had been a “memorable year,” and pointed to the January debut of DeepSeek-R1 as a turning point for the firm. He acknowledged that China is likely to lag behind in semiconductor manufacturing process nodes “for a relatively long time.”
The company framed its response to trade restrictions around boosting infrastructure design and moving more software into the public domain. Huawei said it will open-source large portions of its stack, naming the openPangu foundation AI models and the Mind series SDKs among the releases planned.
Huawei confirmed three new Ascend lines: the 950, 960, and 970. The initial family will include Ascend 950PR and Ascend 950DT, which Huawei says are built from the same die and add support for lower-precision formats such as FP8. The Ascend 950 is rated at 1 PFLOP at FP8 precision and 2 PFLOPs at FP4. A PFLOP is one thousand trillion (10^15) floating point calculations per second.
On-chip vector units will receive upgrades and the memory subsystem will offer finer-grained access, dropping from 512-byte blocks to 128-byte chunks. Huawei expects these changes to improve both throughput and latency for large model inference and training workloads.
Interconnect bandwidth on Ascend 950 parts is set at 2 TB/s, about 2.5 times the figure for the current Ascend 910C. The company said the 950PR will arrive in Q1 2026, with the Ascend 950DT slated for Q4 2026.
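The two interconnect figures above imply a rough number for the current part. The following is a minimal sketch of that arithmetic in Python; the function and constant names are illustrative, and because the article only says "about 2.5 times," the result is an approximation rather than a published specification.

```python
# Figures quoted in the article; names are illustrative, not Huawei's.
ASCEND_950_INTERCONNECT_TBPS = 2.0  # TB/s, stated for Ascend 950 parts
RATIO_VS_910C = 2.5                 # "about 2.5 times" the Ascend 910C


def implied_910c_bandwidth_tbps(
    new_tbps: float = ASCEND_950_INTERCONNECT_TBPS,
    ratio: float = RATIO_VS_910C,
) -> float:
    """Back out the older chip's bandwidth from the new figure and the ratio."""
    return new_tbps / ratio


if __name__ == "__main__":
    # The ratio is approximate, so treat the result as a ballpark only.
    print(f"Implied Ascend 910C interconnect: ~{implied_910c_bandwidth_tbps():.1f} TB/s")
```

On those numbers, the Ascend 910C's interconnect bandwidth works out to roughly 0.8 TB/s.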
A follow-up silicon family, the Ascend 960, is due in Q4 2027. Huawei claims the 960 will double the 950 in compute, memory access bandwidth, memory capacity and interconnect port count. The 960 will support Huawei’s proprietary HiF4 data format, which the vendor says offers greater precision than other FP4 technologies.
The top-end Ascend 970 is scheduled for Q4 2028. Xu said, “We’re still working on some of its specs, but our general goal is to push all of its specs much higher.” Preliminary targets include a 4 TB/s interconnect bandwidth, 8 PFLOPs of FP4 performance and expanded on-package memory capacity.
Huawei’s commercial approach pairs the new NPUs with large-scale compute pods it calls SuperPoDs. The first entry will be the Atlas 950 SuperPoD, due to appear in Q4 2026 and built around Ascend 950DT chips.
NVIDIA plans its NVL144 system, seen as an analogue to a SuperPoD, for mid- to late-2026. Huawei presented internal comparisons showing the Atlas 950 SuperPoD containing 56.8 times as many NPUs as an NVL144 configuration has GPUs, and offering nearly seven times the processing power. NVIDIA’s NVL576 is slated for 2027; Huawei told attendees it expects the Atlas 950 SuperPoD to remain the stronger performer.
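Huawei's NPU ratio can be turned into an approximate chip count. The sketch below rests on two assumptions that the article does not confirm: that an NVL144 configuration is counted as 144 GPUs, taking the product name at face value, and that the 56.8 figure means "56.8 times as many." All names in the code are illustrative.

```python
# Assumption: the NVL144 is counted as 144 GPUs (taken from the product name).
ASSUMED_NVL144_GPUS = 144
# Ratio from Huawei's internal comparison, as reported.
NPU_TO_GPU_RATIO = 56.8


def implied_npu_count(gpus: int = ASSUMED_NVL144_GPUS,
                      ratio: float = NPU_TO_GPU_RATIO) -> int:
    """Estimate the NPU count implied by a rounded ratio and a GPU count."""
    return round(gpus * ratio)


if __name__ == "__main__":
    # The ratio is rounded to one decimal place, so this is only an estimate.
    print(f"Implied NPUs per Atlas 950 SuperPoD: ~{implied_npu_count():,}")
```

Under those assumptions, the Atlas 950 SuperPoD would hold on the order of 8,000 NPUs; the exact figure depends on how the ratio was rounded.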
On the CPU side, Huawei said it will ship two variants of the Kunpeng 950 in Q1 2026: a 96-core, 192-thread model and a higher-end 192-core, 384-thread variant. Xu described what he called “the world’s first general-purpose computing SuperPoD,” the Kunpeng 950-based TaiShan 950 SuperPoD, which Huawei plans to make available in Q1 2026.
Both NPU and general-purpose SuperPoDs will use UnifiedBus 2.0, the next version of the interconnect that underpins Huawei’s SuperPoD designs. UnifiedBus 1.0 is already the fabric for the Atlas 900 A3 SuperPoD, which entered service in March this year and has more than 300 installations to date. Huawei said UnifiedBus 2.0 will be published as an open protocol with technical specifications released immediately to the developer community.
The vendor intends for UnifiedBus 2.0 to connect SuperPoDs into larger groupings called SuperClusters. The first such product, the Atlas 950 SuperCluster, is claimed to offer 2.5 times more NPUs and 1.3 times more compute than xAI’s Colossus, which Huawei described as the world’s most powerful computing cluster today.
Looking further ahead, Huawei plans an Atlas 960 SuperCluster in the last quarter of 2027 that will assemble more than a million NPUs and deliver 4 ZFLOPS in FP4. A ZFLOP represents 10^21 floating point operations per second. “SuperPoDs and SuperClusters powered by UnifiedBus are our answer to surging demand for computing, both today and tomorrow,” Xu said.
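To put the cluster figure in the same units as the per-chip ratings, the conversion can be sketched in a few lines of Python. This assumes the 4 ZFLOPS number is an aggregate peak across all NPUs; the helper names are illustrative. Because the article gives the NPU count only as "more than a million," the per-NPU result is an upper bound rather than a specification.

```python
PFLOP = 10**15  # one thousand trillion operations per second
ZFLOP = 10**21  # one sextillion operations per second


def zflops_to_pflops(zflops: float) -> float:
    """Convert a ZFLOPS figure to PFLOPs."""
    return zflops * ZFLOP / PFLOP


def per_npu_ceiling_pflops(cluster_zflops: float, min_npu_count: int) -> float:
    """Upper bound on per-NPU throughput, given a lower bound on NPU count."""
    return zflops_to_pflops(cluster_zflops) / min_npu_count


if __name__ == "__main__":
    total = zflops_to_pflops(4)
    ceiling = per_npu_ceiling_pflops(4, 1_000_000)
    print(f"4 ZFLOPS = {total:,.0f} PFLOPs")
    print(f"Per-NPU ceiling at one million NPUs: {ceiling:.1f} PFLOPs (FP4)")
```

Spread across a million NPUs, 4 ZFLOPS comes to at most 4 PFLOPs of FP4 per chip, and less if the NPU count is well above one million.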

