
- Posted by: admin
- Artificial Intelligence
- DeepSeek
- LLMs
Are the top AI companies in 'Deep' ... Seek?
Occasionally, something explodes in the world of tech that shocks almost everybody: the dotcom bubble bursting, or the first generation of iPhones being released, followed by the excitement, the awe, the wows, the rush to get hold of one and show your friends what it could do. Much more recently, remember when ChatGPT was first released to the public, and how, once the media caught wind of it, everyone suddenly started talking about it? A few people even went a little crazy for a few weeks and couldn't stop yapping about it.
Well, a few days ago, a similar technological shift may just have happened. And some people are losing their minds over it.
Chinese company DeepSeek released a powerful new open-source artificial intelligence large language model (LLM) called R1, which has left some tech folks, not only in Silicon Valley but across the western world, panicking and scrambling to explain what is going on.

A Guardian report noted that the Nasdaq Composite index, which lists many of the big tech companies, closed down 3.1% on the day of the R1-driven sell-off, with the drop at one point wiping more than $1 trillion off its value.
(via The Guardian)
Benchmarking surprise
Initial benchmarking tests released by DeepSeek (see image above) show R1 performing just as well as, if not marginally better than, OpenAI's o1 model in some tests. What's even more surprising is the budget DeepSeek claims it used to develop some of its models.
Who has been affected?
To say that DeepSeek's R1 has blown everything out of the water is an understatement. Since its release, the share prices of several major AI-linked companies have fallen sharply:
- Nvidia (NVDA) dropped roughly 17% on the day following the DeepSeek announcement, wiping around $593 billion off its market value.
- Broadcom Inc. experienced a similar drop, with shares falling about 17%.
- Advanced Micro Devices (AMD), whose MI300-series data centre accelerators and EPYC processors (CPUs) are used for AI training, inference and AI server infrastructure, slipped by roughly 6%.
- Oracle Corporation saw its shares decline by 14%.
- Microsoft (MSFT), another significant investor in AI companies (it has put at least $13 billion into OpenAI), fell by about 3.8% on the day of the market reaction.
- Alphabet (GOOGL) shares were down by more than 4%.
- ASML, another key player in semiconductor manufacturing, saw its shares decrease by over 7%.
- SoftBank Group, which has significant AI investments, dropped 8%.
- Marvell Technology Inc (MRVL), whose AI-related products include custom ASICs (Application-Specific Integrated Circuits) and data processing units (DPUs) designed for data centres and cloud computing infrastructure, tumbled by 19.1%.
The prevailing opinion online and in newsrooms in the days following R1's release is that DeepSeek's rise shows the US has lost its advantage, or pioneering edge, in AI. But experts, including executives at leading AI model developers, suggested that a broader shift in the technological landscape is taking place. Rather than solely pursuing ever-larger models that demand vast computing power, the focus appears to be shifting towards advanced capabilities such as reasoning. That shift creates opportunities for smaller, innovative startups like DeepSeek, which have not secured massive funding and are not backed by tech behemoths like Google, Amazon, Meta or Microsoft.
But the warnings were there: back in September, DeepSeek published an image showing its model's performance on the LMSYS Chatbot Arena rankings.
Sputnik moment
Marc Andreessen, the veteran venture capitalist whose VC firm is invested in AI companies, posted on X (formerly Twitter) that "DeepSeek-R1 is AI's Sputnik moment".
But how much does it cost to build an AI?
Developers and AI enthusiasts visited DeepSeek's website and app in their droves recently to experiment with the company's newest model, posting demonstrations of its advanced abilities across social media platforms. This activity coincided with Monday's decline in US tech stocks, notably Nvidia, as investors questioned the scale of financial investment going into AI development.
DeepSeek's technology is commendable for several reasons. Not only has it been developed by a relatively small Hangzhou-based AI lab in China, but a research paper published last December claimed that DeepSeek-V3, an earlier LLM, cost only $5.6 million to train, a fraction of the $100 million that OpenAI has said some of its models cost. The implication is that companies may not, after all, need thousands upon thousands of graphics processing units for their artificial-intelligence projects.
This matters because the computational and operational expense of training large-scale transformer-based language models is enormous, at least in the West. Estimates range from $100 million to over $1 billion for architectures comparable to GPT-4, encompassing costs for high-density GPU/TPU clusters, power consumption, ML engineering expertise, dataset curation, and iterative training cycles. While precise training expenditures remain proprietary and undisclosed by major AI research organisations like OpenAI, Google, Anthropic, and Meta, cost approximations are derived from published academic literature, industry analyses, and sporadic disclosures in technical presentations. Contemporary cost modelling indicates that total expenditure scales with architectural parameters such as model dimensionality, training corpus size, compute-hours required for convergence, hardware utilisation efficiency, and implementation-specific optimisations. The investment covers both the direct computational resources required for training and the substantial infrastructure and expertise needed to develop these high-parameter neural architectures, including data centre operations, distributed training frameworks, and specialised machine learning talent.
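To give a rough sense of how such figures arise, here is a back-of-envelope sketch using the widely cited approximation that training a dense transformer takes about six floating-point operations per parameter per training token. Every number below (model size, token count, GPU throughput, utilisation, hourly price) is an illustrative assumption, not a figure reported by any of the companies mentioned in this article.

```python
# Back-of-envelope training-cost estimate using the common
# "6 * parameters * tokens" FLOP approximation for dense transformers.
# All figures below are illustrative assumptions, not vendor quotes.

def training_cost_usd(params: float, tokens: float,
                      peak_flops_per_gpu: float,
                      utilisation: float,
                      gpu_hour_price: float) -> float:
    """Rough dollar cost of a single training run."""
    total_flops = 6 * params * tokens                  # forward + backward pass heuristic
    effective_flops = peak_flops_per_gpu * utilisation  # sustained throughput per GPU
    gpu_seconds = total_flops / effective_flops
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * gpu_hour_price

# Hypothetical 70B-parameter model trained on 10 trillion tokens,
# on GPUs sustaining ~40% of a 1e15 FLOP/s peak, at $2.50 per GPU-hour.
cost = training_cost_usd(params=70e9, tokens=10e12,
                         peak_flops_per_gpu=1e15,
                         utilisation=0.40,
                         gpu_hour_price=2.50)
print(f"Estimated compute cost: ${cost / 1e6:.1f} million")
```

Even this toy calculation shows why the headline numbers diverge so much: halve the utilisation or double the token count and the bill moves by hundreds of per cent, before staff, data and infrastructure costs are even counted.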
Thus, DeepSeek's claim to have achieved benchmark results comparable to the top AI companies' LLMs at a fraction of the budget was bound to cause panic and alarm. Their R1 model appears to be so efficient that it has sparked discussions about cost reduction at major tech companies, and some engineers at the top AI labs are already wondering whether their companies should investigate DeepSeek's research and techniques to identify potential savings on AI spending.

“It’s probably too early to really have a strong opinion on what this means for the trajectory around infrastructure and Capex,” Mark Zuckerberg said. “There are a bunch of trends that are happening here all at once.”
Zuckerberg was speaking to reporters about Meta's earnings, and went on to say that his company is still studying DeepSeek and the implications of the release, and that it may eventually adopt the best aspects of what it learns, though that is unlikely to significantly lower Meta's costs.
“It’s going to be expensive for us to serve all of these people because we are serving a lot of people,” said Zuckerberg. He also said DeepSeek’s emergence is a validation of Meta’s open-source approach to AI.
As things stand, the true price of developing DeepSeek's R1 is not yet known, and many analysts and industry experts believe the real cost must have been a good deal higher. But even if it were $40 million or $50 million, what would that change? From everything that is known at the moment, it is still a game changer.
Knowledge Distillation – What is it?
US export controls over the last few years targeting Chinese companies' ability to acquire and manufacture cutting-edge chips are a subject worth exploring here, as they call into question what hardware DeepSeek used to develop its LLMs.
DeepSeek's latest models, DeepSeek R1 and R1-Zero, demonstrate reasoning capabilities comparable to the most advanced systems from OpenAI and Google. These models, like their counterparts, break down problems into smaller parts for more effective solutions, a process requiring extensive specialized training for reliable accuracy. DeepSeek's release includes details of their approach to developing the R1 models, claiming performance on par with OpenAI's groundbreaking o1 reasoning model on certain benchmarks. Their tactics include a more automated method for training problem-solving skills and a strategy for transferring knowledge from larger to smaller models.
The latter approach, employed by DeepSeek's engineers to train their LLMs, is known as knowledge distillation. It involves training a smaller, more efficient "student" model to emulate the behaviour of a larger, more capable "teacher" model, using the teacher's output distributions and logits as training targets rather than raw ground-truth data alone.
The way this works is that the student model is trained to minimize the Kullback-Leibler divergence between its output distribution and the teacher's soft labels, often incorporating temperature scaling to control how much weight is placed on the teacher's uncertainty estimates.
This approach leverages the teacher model's learned representations and generalisation capabilities to create more compact models that maintain much of the original performance while requiring significantly fewer parameters and computational resources.
The distillation process can be enhanced through techniques like intermediate layer matching, selective distillation of high-confidence predictions, and careful balancing between distilled knowledge and direct supervision from ground-truth data.
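For readers who want to see what this looks like in practice, below is a minimal, generic sketch of a distillation loss in PyTorch: the student's softened output distribution is pulled towards the teacher's soft labels via KL divergence, with a temperature parameter and a blend against ordinary cross-entropy on the ground-truth labels. This is a textbook illustration of the technique, not DeepSeek's actual training code, and the hyperparameter values are arbitrary.

```python
# Minimal knowledge-distillation loss sketch in PyTorch (illustrative only).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend a KL term against the teacher's soft labels with
    ordinary cross-entropy against the ground-truth labels."""
    # Soften both distributions with the temperature, then take KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    kl = kl * (temperature ** 2)   # standard scaling so gradient magnitudes stay comparable

    # Direct supervision from the hard labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kl + (1.0 - alpha) * ce

# Example with dummy logits: a batch of 4 examples over a 10-token vocabulary.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```

Raising the temperature spreads the teacher's probability mass over more tokens, which is precisely how the student learns from the teacher's uncertainty rather than just its top answer.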
While models like DeepSeek's will ultimately benefit those who don't want to spend too much on AI, some people still have reservations about the model's origin and whether a "Chinese model" can be trusted with sensitive data. These reservations are, so far, largely unsubstantiated; as things stand, no one has presented technical evidence that there is anything to worry about. Further, since DeepSeek R1 is an open-source model released under an MIT license, it is possible to run it offline in a closed system.
Thus, several prominent AI companies, including Perplexity and Hugging Face, have already publicly stated that they will let their users access DeepSeek's models from within their own platforms.
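Because the weights are openly published, running a model locally is straightforward in principle. The sketch below assumes the Hugging Face transformers library and one of the smaller distilled R1 checkpoints; the model identifier shown is an assumption and should be checked against the deepseek-ai organisation on Hugging Face, and the full-size R1 is far too large to run this way on ordinary hardware.

```python
# Minimal sketch of running a distilled R1 checkpoint locally with the
# Hugging Face transformers library. The model identifier below is an
# assumption; verify the exact name on the deepseek-ai Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain knowledge distillation in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Once the weights have been downloaded, generation runs entirely offline.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```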

In a June 2024 research paper, DeepSeek disclosed that its earlier model, DeepSeek-V2, was developed on clusters of Nvidia H800 chips, cut-down export-compliant GPUs with reduced capabilities compared with the hardware typically employed by leading US AI companies like OpenAI. However, some AI insiders speculate that DeepSeek may have had access to more computing power than the roughly 10,000 Nvidia A100 chips the company has said it owned when developing and training the R1 model.
The full impact of this story is yet to be determined; the narrative is still unfolding.