Large AI models have made waves throughout the world. From mainstream models like ChatGPT and Sora to industry-specific foundation models, AI can empower industry applications with speed and scale. Behind this AI boom is a skyrocketing number of model parameters, a trend AI service companies are witnessing firsthand.
This is especially true in the development phase of single-modal large models, where parameter counts reach 100 billion. Many AI companies simply cannot handle this growth with their current infrastructure. Conventional practices, such as siloed, independent clusters and unreliable external storage, are ineffective at managing and making use of huge data volumes. Moreover, splitting workloads across multiple clusters does not improve aggregate performance.