March 6, 2024

The bottleneck of liquid-cooled AI servers

As shipments of Blackwell chips ramp up, customers' willingness to adopt liquid cooling is expected to grow as well.

Industry insiders say that the supply of Universal Quick Disconnect (UQD) couplings for liquid-cooling solutions has tightened, which may become the main bottleneck for the growth of liquid cooling in AI servers.

Server ODMs point out that Nvidia's Blackwell AI chips, including the B100 and B200, will begin shipping this year, but the GB200 solution will not enter mass production until late 2024 or 2025.

Currently, most B100 and B200 customers still use air-cooled thermal designs, but ODMs report that liquid-cooling penetration continues to rise. As Blackwell shipments increase, customers' willingness to adopt liquid cooling is expected to grow accordingly.

Expanding production capacity to meet rising liquid-cooling demand

Related companies are expanding production to meet the coming liquid-cooling era. Thermal-module maker Jingchen Technology plans to increase its monthly cold-plate capacity tenfold, from 30,000 to 300,000 units.

Auras has built a new factory in Thailand in response to customers' geopolitical concerns; mass production is expected to start in the third quarter. Besides expanding local cold-plate capacity, Auras also plans to produce cooling distribution units (CDUs) and coolant distribution manifolds (CDMs) locally, with a planned monthly capacity of about 2,000-3,000 sets.

Thermal-module maker AVC noted in a recent earnings call that its Chinese and Vietnamese factories have a combined monthly cold-plate-module capacity of about 115,000 units, or about 420,000 units counted as individual cold plates. AVC plans to expand this capacity by 50% before the end of the year.

AVC also plans to expand monthly CDU capacity to 1,000 units and monthly CDM capacity to 30,000 sets, and emphasizes that these plans can be flexibly adjusted to customer orders.

Thermal technology company Gao Li Heat Treatment is expanding capacity at its Zhongkang factory in Taiwan in response to increased customer demand for liquid cooling. Its monthly CDM capacity is expected to grow from 1,000 units to 2,000 units by the end of the third quarter, and to 4,000 units by the end of the year. Its annual CDU capacity is also expected to reach 2,000 units by year-end.

These manufacturers all have high expectations for liquid-cooling demand, driven partly by computing-efficiency and data-center PUE (power usage effectiveness) requirements in China and the European Union. The most important factor, however, is that Nvidia has lifted its self-imposed restraint on chip thermal specifications.

The rapid growth of liquid cooling has led to a shortage of UQD

As the industry eagerly anticipates the liquid-cooling era, UQDs have become the biggest bottleneck to growth. Thermal-module makers note that UQD supply has recently tightened. Although liquid cooling's market share is currently only in the single digits, if it rises to double digits, UQDs may be in short supply.

UQD suppliers are mostly European and American, such as U.S. giants Parker Hannifin and CPC, Switzerland's Stäubli International, Denmark's Danfoss, and Sweden's CEJN. Taiwan's connector giant Lotes is also actively entering the market and has begun sending samples.

Anbo Technology Chairman Liang Zhijian pointed out that because avoiding leaks is liquid cooling's top priority, and UQDs are the components most likely to leak, UQD supply is the tightest among liquid-cooling components. This is not only a technical issue; incumbent manufacturers also hold patent protection, and Anbo Technology is studying how to work around these patent barriers.

Industry insiders add that while UQD makers are protected by patents, new entrants must also pass multiple layers of verification, including OCP certification and customer-side validation, both time-consuming and labor-intensive. With existing European and American manufacturers showing no intention of expanding capacity, this could become the main bottleneck for the rapid development of liquid-cooling technology.

Supermicro is one of the fastest-growing liquid cooling manufacturers. Its founder and CEO, Charles Liang, pointed out that over the past 30 years, liquid cooling has only accounted for 1% of the server market. However, he estimates that the penetration rate will soar to 30% by 2025.

Liquid cooling emerges as a potential solution as the boom in artificial intelligence puts pressure on the power grid. The rapid development of generative artificial intelligence has driven an unprecedented expansion of data centers, raising concerns about its impact on the power grid. These power-hungry facilities could lead to power outages and increased energy costs.

According to estimates by the Electric Power Research Institute, by 2030, data centers may consume 9% of the electricity in the United States, double the current amount. The electricity consumption of a large data center is equivalent to that of hundreds of thousands of households.

The increasing power demand of artificial intelligence is particularly worrying. Early generative-AI queries consume roughly ten times the electricity of a Google search, and newer chips demand even more energy. Experts warn that the future development of artificial intelligence may be limited by our ability to generate enough electricity.

Some countries face severe challenges. For example, by 2026, Ireland may use 30% of its electricity for data centers. In the United States, data-center electricity consumption is concentrated in 15 states, with Texas and Virginia consuming the most. The situation in California is critical: new data centers planned there could generate electricity demand exceeding the output of a nuclear power plant.

Data centers: huge energy consumption and the rise of liquid cooling

The computational demands of artificial intelligence are driving up server temperatures and carbon emissions, sharply increasing demand for cooling systems. Cooling accounts for about 40% of a data center's total electricity consumption, making it the second-largest power draw after the servers themselves.
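The 40% cooling share ties directly into the PUE metric mentioned earlier. As a rough illustration, here is a minimal sketch of the relationship; the facility power figures below are hypothetical assumptions, not numbers from the article:

```python
# Illustrative sketch of how cooling share relates to PUE
# (Power Usage Effectiveness = total facility power / IT equipment power).

def pue(total_kw: float, it_kw: float) -> float:
    """PUE: total facility power divided by IT power (ideal is 1.0)."""
    return total_kw / it_kw

# Assume a facility drawing 1,000 kW in total: 500 kW for IT equipment,
# 400 kW for cooling (the ~40% share cited above), 100 kW for everything else.
total_kw, it_kw, cooling_kw = 1000.0, 500.0, 400.0

print(pue(total_kw, it_kw))   # 2.0 -- half the power feeds non-IT loads
print(cooling_kw / total_kw)  # 0.4 -- cooling's share of facility power
```

Under these assumptions, cutting the cooling overhead is the single biggest lever for moving PUE toward the regulatory targets cited above.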

The global server cooling market is expected to grow from $20 billion in 2024 to $90 billion by 2027. Liquid cooling's share of data-center cooling systems is projected to rise from 1% to 22%, with its market value growing from $317 million to $7.8 billion over the next three years.

Liquid-cooling solutions that use water or other coolants to cool servers are becoming increasingly popular. New approaches include immersion cooling (submerging entire server racks in non-conductive liquid) and direct liquid cooling (circulating water directly to the servers). Although currently more expensive than air cooling, liquid cooling can reduce a data center's power consumption by 10% or more.
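To put the ~10% saving cited above in context, a quick back-of-the-envelope sketch; the 10 MW facility size and $0.08/kWh tariff are hypothetical assumptions, not figures from the article:

```python
# Illustrative sketch: annual energy and cost savings from a 10% cut in
# facility power draw. Facility size and tariff are assumed, not sourced.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def annual_savings_kwh(avg_load_kw: float, reduction: float) -> float:
    """Energy saved per year for an average load and a fractional reduction."""
    return avg_load_kw * HOURS_PER_YEAR * reduction

saved_kwh = annual_savings_kwh(avg_load_kw=10_000, reduction=0.10)  # 10 MW site
print(saved_kwh)         # 8,760,000 kWh saved per year
print(saved_kwh * 0.08)  # ~$700,800/year at the assumed tariff
```

Even at modest electricity prices, the saving scales linearly with facility size, which is why the upfront cost of liquid cooling can pay back at hyperscale.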

Research firm Global Market Insights forecasts that the global data center liquid cooling market will grow from $2.1 billion in 2022 to $12.2 billion by 2032. A survey by the Uptime Institute found that 16% of data center managers believe liquid cooling will become the primary cooling method in data centers within 1-3 years, while 41% think it will take 4-6 years. As a result, hybrid cooling methods are more likely to emerge in the short term.
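The Global Market Insights forecast above implies a steep compound annual growth rate. A minimal sketch of that arithmetic:

```python
# Illustrative sketch: compound annual growth rate (CAGR) implied by the
# forecast cited above ($2.1B in 2022 to $12.2B in 2032, i.e. 10 years).

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate over `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0

rate = cagr(2.1, 12.2, 10)
print(f"implied growth: {rate:.1%} per year")  # roughly 19% per year
```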

Upsite Technologies, a leader in data center air cooling system management, points out that while technology is constantly advancing, it is unlikely to achieve 100% liquid-cooled data centers in the short term, as liquid cooling equipment still requires air cooling for heat dissipation.

Liquid cooling is more efficient but difficult to deploy at scale and requires substantial upfront investment; air cooling is cheaper but less efficient. Hybrid cooling facilities are therefore becoming increasingly popular, combining the advantages of both.

Data Center Energy Crisis Sparks Calls for Urgent Action

Data centers are drawing increasing scrutiny for their environmental impact. Governments around the world are implementing regulations to control their energy consumption and carbon footprint; China's "Green Data Center" guidelines and similar initiatives in Germany, Singapore, and Japan exemplify this trend.

Industry experts such as Schneider Electric emphasize the need for comprehensive environmental metrics to assess data-center sustainability, covering factors beyond energy use, such as water consumption and waste generation.

The U.S. government is also pressuring large technology companies to invest in clean energy, recognizing the significant environmental impact of generative AI's growing power demand.

Finding the Right Power Source: Data Centers and the Energy Challenge

To meet ever-increasing demand, data centers require a diversified energy mix that balances reliability and sustainability.

Renewable energy sources such as solar and wind are attractive due to their low carbon footprint. However, their dependence on weather conditions can lead to unstable output, making them unsuitable as the sole power source for data centers. Building redundant facilities to compensate for this inconsistency may be necessary, but it is costly.

Nuclear power emerges as a potential solution. Traditional nuclear power plants provide reliable baseload power, generating stable electricity that is crucial for data center operations. Moreover, the global nuclear power market is expected to achieve steady growth over the next decade.

Innovations in the nuclear energy sector offer more promising possibilities. Small Modular Reactors (SMRs) are being developed as smaller, safer, and more scalable alternatives to traditional nuclear power plants. Although still in the research and development stage, SMRs have the potential to be directly deployed at data centers, providing dedicated clean energy.

However, the widespread application of SMRs faces significant barriers. Regulatory and manufacturing challenges may delay their commercial deployment for several years. The U.S. government is actively exploring solutions, including collaborating with tech giants to reduce costs and streamline processes.

Another way to reduce data-center energy demand is to optimize artificial-intelligence workloads: shifting some AI tasks from the cloud to local devices running smaller, less resource-intensive models can lower overall energy consumption.

The future development of data centers requires a multifaceted approach. It is crucial to adopt a diversified energy mix, including reliable sources such as nuclear power, while pursuing renewables and innovations like small modular reactors. Optimizing AI workloads on local devices can further promote the sustainable development of data centers.

