Since 2024, the artificial intelligence industry has entered a new phase.
According to statistics compiled by the Economic Observer, as of October 9, 2024, the Cyberspace Administration had approved 188 generative artificial intelligence filings, meaning 188 large models are cleared to go online and provide generative AI services. Of these, more than 30% have disclosed no further progress since passing the filing; only about 10% are still pressing ahead with model training; and nearly half have turned to developing AI applications.
This contrasts sharply with the "hundred-model war" over the past year.
This change has also rippled through the upstream computing power market. During the China Computing Power Conference held from September 27 to 29, 2024, the Economic Observer learned from computing power operators, builders, and chip suppliers that the supply of domestic computing power is no longer tight.
Since 2022, internet companies and AI enterprises had been racing to purchase computing power equipment, while state-owned enterprises, led by the telecom operators, invested heavily in building computing power centers. AI servers along the supply chain were frequently out of stock, GPUs were hard to find, and prices doubled within a few months.
Starting in 2024, the number of enterprises purchasing or renting computing power equipment has fallen. Since the second half of 2024, a share of the racks in computing power centers have sat vacant. The price of Nvidia's high-performance A100 accelerator card, once speculated up to 150,000 yuan apiece, has stopped rising, while the 4090, a graphics card with a lower performance configuration, is increasingly being bought by computing power enterprises for use as an acceleration chip.
A China Telecom official said that computing power has entered a buyer's market.
Differentiation of large model enterprises
Large model enterprises, the users of computing power, are diverging.
As of the end of August 2024, the Cyberspace Administration had approved 188 generative artificial intelligence filings. According to statistics compiled by the Economic Observer, however, 60 of these large models (32%) have disclosed no progress on scaling up parameters or landing applications since passing the filing, and another 9 (5%) have released new versions without specifying changes in parameter count or pre-training data volume. The vast majority of these come from small and medium-sized enterprises or institutions; for example, several open-source community projects from companies such as Shenyan Technology and Lingxin Intelligence have not been updated in nearly a year.

Among the 188 large models, 22 are still pushing ahead with training, releasing new versions this year and increasing both parameter counts and pre-training data. These mainly come from large internet companies, telecom operators, and AI large model companies. Only four of them have released trillion-parameter models with significantly expanded pre-training data: Tencent, China Telecom, and two large model startups, MiniMax and Step Star.
These companies have sharply increased their demand for computing power to train large models. Since 2024, Tencent and China Telecom have each built clusters of tens of thousands of cards, and in March MiniMax became one of the first tenants of China Telecom's domestic ten-thousand-card cluster in Shanghai Lingang. The other 18 models have parameter counts ranging from tens of billions to hundreds of billions, with relatively limited increases in parameters and pre-training data; they come from companies such as Baidu, Alibaba, iFLYTEK, SenseTime, and Huawei.
These manufacturers are also accelerating updates to their base models. Alibaba released Tongyi Qianwen 2.5, whose parameter count reaches the trillion level, a major update to the 2.0 version from October last year. In the first half of 2024, SenseTime pushed its "Daily New" large model to 600 billion parameters. By contrast, Baidu, which updated faster last year, has slowed down; it has not released a new version of its Wenxin 4.0 large model since October last year.
A Baidu technical staff member told the Economic Observer that Baidu's base models are still undergoing cutting-edge AI training, but the results have not yet been announced. "The big players will certainly not give up training models; otherwise they would miss out on the pie entirely."
According to statistics from the Economic Observer, nearly 50% of the large models that have passed the filing have turned to AI applications this year, embedding the models into existing products or launching new ones. For example, after the 360 browser was connected to the 360 Zhiniao large model, it gained an AI search function that can generate in-depth answers and support multi-turn follow-up questions; Kingsoft Office added AI-generated PPT and copywriting features to its WPS Office suite.
These models are now mostly running real tasks; as they move from the training stage to the inference stage, the computing power they require drops significantly. One major model manufacturer shifted its focus to industry applications after its base model reached tens of billions of parameters, choosing not to expand the parameter count further in order to keep later usage costs, and hence computing power needs, in check. Its representatives argue that bigger is not necessarily better: a larger parameter count means higher usage costs, and models with hundreds of billions or trillions of parameters are mainly aimed at topping benchmark rankings.
Wu Lianfeng, Vice President and Chief Analyst of IDC China, told the Economic Observer that since the "hundred-model war" began over a year ago, the market has split: a few models continue down the general-purpose path toward hundreds of billions or trillions of parameters, while others have shifted from base model development to application development, producing a batch of tool-style applications built on large model technology. These applications are highly homogeneous, and no widely used hit product has yet emerged.
According to September data from AI Product List, a third-party data service provider, seven of the top ten global AI applications came from the United States and two from China: Baidu's AI Intelligent Answering and 360 AI Search. ChatGPT, the US AI application, logged 3.23 billion monthly visits; Baidu's AI Intelligent Answering recorded about one-eighth of that, and 360 AI Search less than one-tenth.
Computing power becomes a buyer's market
The large model market is tightly coupled to the computing power market. According to the scaling law, training a larger model requires first increasing the parameter count or the volume of pre-training data; if a model's parameter count grows tenfold, the computing power required may grow a hundredfold or more.
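As a rough sketch of the arithmetic behind that claim (an illustration using the widely cited rule of thumb that training compute scales with the product of parameter count and training data volume, not a figure from the article): with N the number of parameters and D the number of pre-training tokens, training compute is approximately

\[
C \approx 6\,N\,D, \qquad D \propto N \;\Rightarrow\; C \propto N^{2}.
\]

If the pre-training data is scaled up roughly in proportion to the model size, a tenfold increase in parameters implies on the order of a hundredfold more training compute, which is the relationship described above.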
Currently, some large models are still in the training stage, while others have moved to application and delivery but are not yet widely used. On the demand side, companies' need for training computing power has fallen significantly, and demand for inference computing power has not yet grown explosively. On the supply side, China has built or is building more than 250 intelligent computing centers, and new computing power keeps coming online.
Building a computing power facility usually requires cooperation among investors, operators, and builders. Investors are mainly local governments and central state-owned enterprises; operators include the telecom operators as well as internet companies, Huawei, and a handful of traditional enterprises, such as real estate companies, that have crossed over into the sector; builders are usually server providers and GPU chip providers.
Super Fusion is a supplier of servers and computing power services whose customers come mainly from the finance, internet, and power sectors. In recent months it has felt the market shift. Last year, internet companies scrambled to buy servers and buyers were in a hurry: as long as stock was confirmed, orders were placed, and negotiations were quick, sometimes nonexistent. Since 2024, fewer customers have come to buy, inquiries and negotiations take longer, and buyers pay more attention to cost-performance and technical specifications.
In addition, intelligent computing centers are running with some vacancy. China Telecom has put 10 intelligent computing centers into operation across the country, and the aforementioned China Telecom official has found that many of them are not fully utilized, with numerous racks sitting empty. According to the China Academy of Information and Communications Technology, the number of racks in China's computing power facilities grew by only 2.5% in the first half of 2024, compared with 25% for the whole of 2023. Rack count indirectly reflects the actual scale of computing power.
This year's "Government Work Report" proposed to moderately advance the construction of digital infrastructure, accelerate the formation of a national integrated computing power system, and cultivate the computing power industry ecosystem. The current scale of computing power construction in many regions is planned based on the computing power demand for the next 2-3 years. At the stage when model computing has not yet exploded, it is inevitable that the utilization rate will be insufficient.
The aforementioned China Telecom official told the Economic Observer that computing power has become a buyer's market, giving users more bargaining power over prices. Investors have also grown more cautious and rational, and have begun to set return requirements and assessments for operators. Operators, in turn, are buying equipment with better cost-performance and adopting more flexible strategies, such as building computing power on demand: thousands of racks are reserved in capacity planning, but equipment is only actually purchased and put into operation once clear user needs and orders are in hand. "As operators, we can no longer invest without considering cost as we did before. To recover costs as soon as possible, we must weigh the investment against the payback period," the China Telecom official said.
The industry's procurement of computing power chips also places more emphasis on cost-effectiveness. Since 2024, domestic demand for Nvidia 4090 graphics cards has been rising, and the price of this flagship gaming graphics card has climbed from 12,000 yuan at the beginning of the year to 18,000 yuan.
An Nvidia distributor told the Economic Observer that since the second half of the year, 4090 cards have been turning over very quickly, selling out within three days of arrival. By contrast, the A100's unit price has stopped rising and holds steady at 150,000 yuan, but its turnover is slowing.
Both the 4090 and the A100 are GPUs. In Nvidia's lineup, the 4090 is a high-end gaming graphics card for consumers, while the A100 is a high-performance accelerator card sold to computing power centers. Although the 4090 is weaker than the A100 in some respects, it can still handle the inference workloads of some models, and, most importantly, its price is roughly one-tenth that of the A-series and H-series accelerator cards.
The vast majority of buyers in this wave of 4090 purchases are enterprises, mostly builders or technology providers of intelligent computing centers, who are using the affordable cards in place of the high-priced A100 or H100.
SenseTime is pushing large models onto end devices and delivering them to customers. As the models enter the commercial closed-loop stage, however, the company's demand for computing power is changing; it is using intelligent computing power scheduling and other techniques to improve efficiency. Tian Feng, dean of SenseTime's Intelligent Industry Research Institute, said that the company used to buy computing power without regard to cost, but now it pursues cost-effectiveness.