How do agents work?

Alibaba just introduced the XuanTie C950, a server-class processor built specifically for running AI agents at scale. The chip, announced at the company’s annual ecosystem conference in Shanghai, runs on a 5-nanometer process at 3.2 GHz and delivers over three times the performance of its predecessor. For organizations planning AI agent deployments, this signals a shift in how the infrastructure behind those agents is being designed, priced, and controlled.
Agentic AI refers to systems that go beyond generating text or answering questions. These are AI systems that autonomously carry out multi-step tasks: pulling data from one system, making a decision, updating a record in another, and coordinating with other agents to complete a workflow. A supply-chain agent might monitor inventory, renegotiate supplier terms based on real-time pricing, and trigger reorders without human input. An e-commerce operations agent might adjust pricing across marketplaces, manage product listings, and resolve disputes end to end.
These workflows put different demands on hardware than a chatbot answering isolated questions. A chatbot needs one fast response. An agent orchestrating a ten-step workflow across three enterprise systems needs sustained, low-latency compute at every step. That requires processors optimized for sequential decision-making, not just raw parallel throughput. The C950 is designed for exactly this kind of workload.
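The difference can be made concrete with a minimal sketch (all names here are hypothetical, and `call_model` stands in for a real inference API): because each step of an agent workflow consumes the previous step's output, the inference calls are strictly sequential, so latency and cost scale with the number of steps.

```python
def call_model(prompt: str) -> str:
    """Stand-in for a real inference API call (hypothetical)."""
    return f"decision for: {prompt}"

def run_workflow(steps: list[str]) -> list[str]:
    """Run a multi-step agent workflow: one inference call per step."""
    results = []
    context = ""
    for step in steps:
        # Each step depends on the previous step's output, so the
        # calls cannot be parallelized: latency compounds per step.
        output = call_model(f"{step} | context: {context}")
        context = output
        results.append(output)
    return results

# A three-step supply-chain workflow triggers three sequential calls.
outputs = run_workflow(["check inventory", "compare supplier prices", "place reorder"])
print(len(outputs))
```

A chatbot reply is one such call; a ten-step workflow is ten of them back to back, which is why sustained low-latency compute matters more than peak throughput for this pattern.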
The C950 is a CPU, not a GPU. GPUs handle the parallel calculations needed to train large AI models. CPUs handle sequential, general-purpose tasks: reading inputs, managing logic, and executing instructions in order. That makes CPUs critical for AI inference, the stage where a trained model actually processes real inputs and produces real outputs.
The technical profile: 5-nanometer fabrication, 3.2 GHz clock speed, RISC-V architecture. It uses an 8-instruction decode width and a 16-stage pipeline, which means it can fetch, decode, and execute large volumes of instructions efficiently. Alibaba claims it scored over 70 points on the SPECint2006 benchmark, a new global record for RISC-V processors.
Paired with Alibaba’s Vector Acceleration Engine and Matrix Acceleration Engine, the chip runs inference for the company’s Qwen language models and the open-source DeepSeek series. The architecture also allows customization: users can tailor instruction sets for specific inference patterns, which Alibaba says delivers over 30% performance improvement compared to mainstream alternatives when optimized for particular use cases.
The C950’s RISC-V architecture is not just a technical choice. RISC-V is an open, royalty-free instruction set architecture: anyone can implement it without licensing fees and, critically, without exposure to U.S. export controls. The rival architecture, Arm, requires royalties and is tied to Western IP. U.S. restrictions have limited Chinese access to advanced Nvidia GPUs, accelerating the push toward architectures China can develop and manufacture independently.
Alibaba launched the XuanTie series in 2018 and has iterated steadily: the C910 in 2019, the C920 in 2024, server-grade chips in 2025, and now the C950. T-Head, Alibaba’s chip design unit, has shipped over 470,000 AI chips as of February 2026 and is approaching 10 billion yuan (roughly $1.45 billion) in annual revenue. The unit is reportedly preparing for a separate listing.
The broader context is significant. Chinese open-source language models captured approximately 30% of global market share in 2026, up from 1.2% in 2024, according to OpenRouter analyst data. At every layer, from models to chips to agent platforms, China’s AI ecosystem is becoming less dependent on Western technology.
The C950 matters beyond Alibaba’s own cloud. It signals that major infrastructure providers are designing silicon specifically for agent workloads. When chip makers optimize for multi-step reasoning and orchestration rather than single-turn generation, it changes what becomes practical to run at scale and at what price point.
Consider the parallels to how organizations deploy AI agents today. An AI Email Agent that triages incoming messages, drafts responses, and routes action items runs dozens of inference calls per email thread. A Pro-Active Agent monitoring project timelines runs continuous inference loops to flag risks before they escalate. A Custom AI Agent managing department-specific workflows like invoice processing or compliance checks needs sustained compute across every step of a multi-stage pipeline.
Purpose-built inference hardware makes these workloads cheaper and faster. As more providers follow Alibaba’s lead, the cost of running agent orchestration at scale will drop, making multi-agent deployments accessible to mid-sized organizations that today find them cost-prohibitive.
Alibaba does not sell the C950 externally. Instead, it powers Alibaba Cloud services, which means enterprise customers access the silicon through cloud APIs. But the implications extend beyond one vendor.
First, inference costs are heading down. When major cloud providers design their own chips, they reduce dependence on Nvidia’s pricing and pass some savings to customers. For organizations running AI agents across multiple departments, even small per-inference cost reductions compound quickly.
Second, the hardware competition validates the agent model. When billion-dollar chip programs are built around agentic workloads, it confirms that the industry sees multi-agent systems as the dominant AI deployment pattern, not a niche experiment. Organizations that wait to build their agent strategy will find themselves further behind as infrastructure costs fall and adoption accelerates.
Third, vendor diversification matters. As Chinese and Western AI stacks diverge, organizations operating globally may need agent architectures that work across cloud providers. A context-first approach, where your Interactive Agent draws from a shared knowledge base rather than being locked to one vendor’s models, paired with structured team reskilling, protects against infrastructure shifts.
Design your AI agent workflows so they are not locked to a single cloud provider or chip architecture. Use orchestration layers that can route inference to whichever backend offers the best price-performance ratio at any given time. This protects you as the hardware market shifts.
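One way to sketch such an orchestration layer (backend names and prices below are purely illustrative, not real vendor figures): keep a registry of inference backends with their current cost and latency characteristics, and route each call to the cheapest backend that meets the workflow's latency budget.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    """An inference backend with illustrative cost/latency figures."""
    name: str
    cost_per_1k_tokens: float  # USD, hypothetical
    avg_latency_ms: float

def pick_backend(backends: list[Backend], latency_budget_ms: float) -> Backend:
    # Keep only backends fast enough for this workflow's budget,
    # then choose the cheapest; fall back to all if none qualify.
    eligible = [b for b in backends if b.avg_latency_ms <= latency_budget_ms]
    return min(eligible or backends, key=lambda b: b.cost_per_1k_tokens)

backends = [
    Backend("cloud-a-gpu", 0.60, 120.0),
    Backend("cloud-b-cpu", 0.25, 300.0),
    Backend("cloud-c-accel", 0.40, 150.0),
]

# A latency-sensitive agent step routes to the cheapest fast backend.
print(pick_backend(backends, latency_budget_ms=200.0).name)
```

Because the routing decision is made per call, repricing by any provider, such as cheaper inference on purpose-built silicon, is absorbed by updating the registry rather than rewriting agent logic.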
Most organizations do not track per-agent inference spending. Start measuring it now. Know what each agent workflow costs per transaction so you can take advantage of price drops as purpose-built chips like the C950 enter production. An Agent Strategy Scan can help identify where your highest-volume inference workloads sit.
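A minimal version of that measurement, with hypothetical agents, token counts, and a blended per-token rate, is to tag every inference call with its agent and transaction, then report spend per transaction:

```python
from collections import defaultdict

PRICE_PER_1K_TOKENS = 0.30  # USD, an assumed blended rate

# Each record: (agent, transaction_id, tokens used by one inference call).
calls = [
    ("email-triage", "t1", 800),
    ("email-triage", "t1", 1200),
    ("email-triage", "t2", 900),
    ("invoice-processing", "t3", 2500),
]

spend = defaultdict(float)        # total USD per agent
transactions = defaultdict(set)   # distinct transactions per agent
for agent, txn, tokens in calls:
    spend[agent] += tokens / 1000 * PRICE_PER_1K_TOKENS
    transactions[agent].add(txn)

for agent in spend:
    per_txn = spend[agent] / len(transactions[agent])
    print(f"{agent}: ${per_txn:.4f} per transaction")
```

With a baseline like this in place, a drop in the per-token rate translates directly into a measurable per-transaction saving for each agent.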
The biggest cost savings from cheaper inference hardware will hit high-volume, multi-step workflows first. Identify which agents in your organization handle the most transactions: email triage, customer routing, document processing. These are the workflows where infrastructure improvements translate directly to margin improvement.
Cheaper inference means more organizations will deploy agents. The differentiator will not be compute; it will be context. The organizations that win will be those whose agents understand their specific business rules, customer history, and operational patterns. Start building that context layer now, so that when costs drop, you are ready to scale.
Watch for T-Head’s potential IPO, Alibaba Cloud pricing changes, and whether competitors like Tencent and ByteDance release their own inference-optimized chips. Each development will affect agent deployment economics. Organizations that track these shifts can time their scaling decisions to coincide with cost inflection points.