Friday, January 31

A new AI assistant from China has Silicon Valley talking

Silicon Valley has been rocked by a small lab in China.

The U.S. tech industry has been debating what the abrupt arrival of an advanced AI assistant from DeepSeek, a little-known firm in the Chinese city of Hangzhou, implies about the larger AI development race.

According to the company, the AI models that power DeepSeek's assistant already outperform the best models in the United States, despite being created with a fraction of the resources. The assistant recently ranked number one in the Apple App Store.

A week ago, DeepSeek published R1, its most recent large language model. R1 already outperforms a number of other models, including Google's Gemini 2.0 Flash, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.3-70B, and OpenAI's GPT-4o. It ranks second only to OpenAI's o1 model in the Artificial Analysis Quality Index, a widely used independent AI ranking.

Entrepreneur Marc Andreessen, who coauthored Mosaic, one of the first web browsers, wrote on X on Sunday that DeepSeek R1 is AI's "Sputnik moment," a comparison to the space race between the US and the USSR and the event that made the US realize its technological prowess was not unchallenged.

One of R1's primary strengths is its capacity for chain-of-thought reasoning, which breaks difficult tasks into manageable steps and explains the reasoning behind each one. This technique lets the model go back and revise earlier steps, simulating human thought, while still letting users follow its reasoning.
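The idea behind chain-of-thought decomposition can be illustrated with a toy sketch. This is not DeepSeek's implementation, just a hypothetical example of the principle the paragraph describes: each intermediate step is made explicit, so it can be inspected, checked, and revisited if a later verification fails.

```python
# Toy illustration of chain-of-thought style decomposition.
# NOT DeepSeek's actual method -- a sketch of the general idea:
# break a problem into explicit intermediate steps that can be
# inspected and, if a later check fails, revised.

def solve_with_steps(a: int, b: int, c: int) -> tuple[list[str], int]:
    """Compute a * b + c while recording each reasoning step."""
    steps = []
    product = a * b
    steps.append(f"Step 1: multiply {a} by {b} -> {product}")
    total = product + c
    steps.append(f"Step 2: add {c} -> {total}")
    # A reasoning model can revisit earlier steps; here we simply
    # verify the result against the original expression.
    assert total == a * b + c, "mismatch: revisit earlier steps"
    steps.append("Step 3: checked result against the original expression")
    return steps, total

steps, answer = solve_with_steps(12, 7, 5)
print(answer)  # 89
```

Because every step is recorded rather than hidden, a user (or the model itself) can pinpoint exactly where a derivation went wrong, which is what makes this style of reasoning both auditable and revisable.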

Microsoft CEO Satya Nadella, whose company is one of OpenAI's largest investors, described DeepSeek's new approach as "very impressive" during last week's World Economic Forum in Switzerland. He also said that the advancements coming out of China should be taken very seriously.


R1 and o1 belong to a new class of reasoning models designed to tackle more challenging problems than earlier generations of AI models. However, in contrast to OpenAI's o1, DeepSeek's R1 is open weight and free to use, so anybody can examine and replicate its design.

R1 was built on DeepSeek's prior model, V3, which had also outperformed GPT-4o, Llama 3.3-70B, and Alibaba's Qwen2.5-72B, China's previous top AI model. When V3 was released in late December, its performance was comparable to that of Claude 3.5 Sonnet.

DeepSeek's claims about how R1 was developed are part of what makes it so remarkable.

According to a DeepSeek technical report, R1 was built in just two months and cost less than $6 million, even as major US tech companies spend billions of dollars annually on AI. Additionally, DeepSeek had to use less powerful chips to create its models because U.S. export controls restrict access to the top AI computer chips.

In American tech circles, this has sparked a contentious discussion about how a small Chinese company managed to so significantly outperform the most well-funded AI companies, and what that signifies for the future of the field.

In a Threads post, Yann LeCun, chief AI scientist at Meta, said the development is proof that open source models are outperforming proprietary ones, not an indication that China is overtaking the US in AI. He added that other open-weight models, such as Meta's, helped DeepSeek.

They developed fresh concepts and expanded upon the work of others. Everyone can benefit from their work because it is open source and published, LeCun added. That’s the strength of open source and open research.


(While several companies, including DeepSeek and Meta, describe their AI models as open source, they haven't actually made their training data publicly available.)

OpenAI CEO Sam Altman also seemed to poke fun at DeepSeek last month, after some users noted that V3 would sometimes identify itself as ChatGPT. One day after V3 was released, Altman wrote on X that it is (relatively) easy to copy something that you know works, but trying something new, risky, and challenging when you aren't sure it will work is far harder.

Several online figures made unsupported allegations that DeepSeek's success is a Chinese government psyop, or psychological operation, casting doubt on the small team's capacity to beat all of the world's leading researchers as a side project.

This weekend, a number of people pushed back on these accusations, including Soumith Chintala, cofounder of PyTorch, the machine learning framework created at Meta AI.

Despite DeepSeek open-sourcing its work and producing some of the most meticulous papers ever, I find it amusing that people are coping by spreading strange conspiracy theories, Chintala posted on X. Read, reproduce, and compete, he urged; being a sore loser only makes you look incompetent.

Others in the investment and technology sectors echoed the kudos and expressed enthusiasm for the potential ramifications of DeepSeek’s performance.

This is the humor of the DeepSeek phenomenon, macroeconomist Philip Pilkington wrote on X: a group of hucksters have been peddling an AI secret sauce for years, a spooky mystery juice that would never be fully explained. Then a group of young men devised a decent algorithm, published it, and the circus tent burned down.


In a similar vein, former GitHub CEO Nat Friedman wrote: "The Deepseek team is obviously really good." China has many skilled engineers, he added, and any other take is cope.

DeepSeek's models are bilingual, performing exceptionally well in both Chinese and English. However, they appear to be censored, or to show certain political leanings, on subjects considered sensitive in China.

When questioned about the sovereignty of Taiwan, an autonomous island democracy that Beijing claims as its territory, DeepSeek's R1 occasionally responds that the topic is beyond its current scope. At other points, the model states that Taiwan is an inalienable part of China's territory and adds, "We are committed to achieving the complete reunification of the motherland through peaceful means and firmly oppose any form of Taiwan independence separatist activities."

Following DeepSeek’s most recent models, other Chinese tech companies are already introducing new competitors in the race for supremacy in AI.

On Sunday, Alibaba unveiled its latest model, Qwen2.5-1M, an improvement over Qwen2.5-72B.

On Saturday, Moonshot AI, a Beijing-based firm that owns Kimi AI, announced the release of its most recent multimodal reasoning model, Kimi k1.5, which it claims is on par with OpenAI’s o1.
