FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.

Elon Musk plans to expand Colossus AI supercomputer tenfold. FT. 

Facility in Memphis expected to incorporate more than 1mn GPUs as billionaire’s xAI aims to catch up with rivals

Elon Musk’s artificial intelligence start-up xAI has pledged to expand its Colossus supercomputer tenfold to incorporate more than 1mn graphics processing units, in an effort to leap ahead of rivals such as Google, OpenAI and Anthropic.

Colossus, built in just three months earlier this year, is believed to be the largest supercomputer in the world, operating a cluster of more than 100,000 interconnected Nvidia GPUs. The chips are used to train Musk’s chatbot Grok, which is less advanced and has fewer users than market-leader ChatGPT or Google’s Gemini.

Work has already begun to increase the size of the facility in Memphis, Tennessee, according to a statement from the Greater Memphis Chamber on Wednesday. Nvidia, Dell and Supermicro Computer would also establish operations in Memphis to support the expansion, the chamber of commerce said, while it would establish an “xAI special operations team” to “provide round-the-clock concierge service to the company”.

The cost of acquiring so many GPUs would be significant. The latest generation of Nvidia GPUs typically cost tens of thousands of dollars, although older versions of the chips can be cheaper. Musk’s planned expansion of Colossus would require an investment likely to reach tens of billions of dollars — plus the high cost of building, powering and cooling the vast servers in which they would sit.

xAI has raised about $11bn in capital from investors this year. AI companies are scrambling to secure GPUs and access to data centres to supply the computing power needed to train and run their frontier large-language models. OpenAI, the maker of ChatGPT, has an almost $14bn partnership with Microsoft that includes credits for computing power.

Anthropic, the maker of the Claude chatbot, has received $8bn in investment from Amazon and will soon be given access to a new cluster of more than 100,000 of its specialised AI chips. Rather than form partnerships, Musk, the world’s richest man, has used his power and influence within the tech sector to build his own supercomputing capacity, although he is playing catch-up after founding xAI barely more than a year ago.

The trajectory has been steep — the start-up is valued at $45bn and recently raised another $5bn. Musk is in fierce competition with OpenAI, which he helped co-found with Sam Altman among others in 2015. The pair subsequently fell out and Musk is now suing OpenAI, seeking to block its transition from a non-profit to a more traditional enterprise.

An investor in xAI said the speed with which Musk had created Colossus was the “feather in the cap” of the AI company, despite it having limited commercial product offerings. “He has built the most powerful supercomputer in the world in three months.” Huang has referred to Colossus as “easily the fastest supercomputer on the planet as one cluster”, and said a data centre of this size would typically take three years to build.

The Colossus project has attracted controversy for the speed in which it was built. Some of have accused it of skirting planning permissions and criticised the demands that it places on the region’s power grid. “We’re not just leading from the front; we’re accelerating progress at an unprecedented pace while ensuring the stability of the grid utilising megapack technology,” Brent Mayo, xAI’s senior manager for site builds and infrastructure, said at an event in Memphis, according to the statement.

FOR EDUCATIONAL AND KNOWLEDGE SHARING PURPOSES ONLY. NOT-FOR-PROFIT. SEE COPYRIGHT DISCLAIMER.