Need for speed for Artificial Intelligence, Machine Learning and Data Analytics

Faisal Hanif
6 min read · Apr 6, 2022


Artificial intelligence, machine learning, data analytics and other demanding data-driven workloads require enormous computational capability from servers and the network. Today’s screaming fast multi-core processors handle the compute side by accelerating processing, but all of that compute places a huge demand on the network infrastructure. As more mission-critical workloads such as AI/ML run on denser and faster data center infrastructure, the need for speed and efficiency from high performance networks keeps growing.

If we look at what’s driving the need for increased bandwidth, we find growing virtual machine densities within servers, which drives up both north-south and east-west traffic. In addition, a massive shift toward machine-to-machine traffic has produced a major increase in required network bandwidth.

The arrival of faster storage in the form of solid state devices such as Flash and NVMe is having a similar effect. We find the need for increased bandwidth all around us as our lives increasingly intersect with technology. Leading this charge are Artificial Intelligence (AI) workloads, whose complex computations require fast and efficient delivery of vast data sets. Lightweight protocols such as Remote Direct Memory Access (RDMA) further speed up the exchange of data between computing nodes by letting network adapters move data directly between application memory on different machines, streamlining the communication and delivery process.
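To make that concrete, below is a minimal sketch of exchanging a buffer between two endpoints over an RDMA-capable transport using the UCX-Py (ucp) library. The library choice, port number, payload size and echo pattern are illustrative assumptions, not details from the article; on hardware without RDMA support, UCX can fall back to TCP.

```
# Minimal sketch: buffer exchange over an RDMA-capable transport,
# using UCX-Py (ucp) as one example library. Port, payload size and the
# echo pattern are illustrative assumptions.
import asyncio
import numpy as np
import ucp

PORT = 13337
N_BYTES = 2 ** 20  # 1 MiB demo payload

async def handle(ep):
    # Server side: receive into a pre-allocated buffer, then echo it back.
    buf = np.empty(N_BYTES, dtype="u1")
    await ep.recv(buf)
    await ep.send(buf)
    await ep.close()

async def main():
    listener = ucp.create_listener(handle, PORT)

    # Client side: send a buffer and wait for the echo. With RDMA-capable
    # NICs the transfer moves directly between application buffers.
    ep = await ucp.create_endpoint(ucp.get_address(), PORT)
    msg = np.ones(N_BYTES, dtype="u1")
    await ep.send(msg)
    resp = np.empty_like(msg)
    await ep.recv(resp)
    await ep.close()

    listener.close()
    print("echoed bytes:", int(resp.sum()))

if __name__ == "__main__":
    asyncio.run(main())
```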

It was only a few years ago that the majority of data centers started deploying 10GbE in volume. Today we see a shift toward 25 and 100GbE, and growing adoption of 200/400GbE, to answer emerging bandwidth needs and to shorten training times. Overall, the Ethernet switch market is undergoing a transformation: previously, Ethernet switching infrastructure growth was led by 10/40Gb, but the tide is turning in favor of 25 and 100Gb.

Analysts expect 25 and 100Gb, along with emerging 400Gb Ethernet speeds, to soon surpass all other Ethernet solutions as the most widely deployed Ethernet bandwidths. This trend is driven by mounting demands for host-side bandwidth as data center densities increase and pressure grows for switching capacities to keep pace. Beyond raw bandwidth, 25 and 100GbE technology delivers better cost efficiency in capital and operating expenses than legacy 10/40Gb, along with greater reliability and lower power consumption for optimal data center efficiency and scalability.
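A quick back-of-the-envelope calculation makes the bandwidth argument concrete. The 10 TB data-set size below is an illustrative assumption, and the numbers ignore protocol overhead and congestion, but they show how much wall-clock time the network alone can add at each line rate:

```
# Time to move a hypothetical 10 TB training data set at different Ethernet
# line rates, assuming full line rate and ignoring protocol overhead.
DATASET_TB = 10
DATASET_BITS = DATASET_TB * 8e12          # terabytes -> bits

for gbe in (10, 25, 100, 400):
    seconds = DATASET_BITS / (gbe * 1e9)  # line rate in bits per second
    print(f"{gbe:>3} GbE: {seconds / 3600:5.2f} hours")

# Output:
#  10 GbE:  2.22 hours
#  25 GbE:  0.89 hours
# 100 GbE:  0.22 hours
# 400 GbE:  0.06 hours
```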

The emergence of data processing and analytics at the Edge

Edge computing allows data from devices to be analyzed at the edge before being sent to the data center. Using Intelligent Edge technology can help maximize a business’s efficiency: instead of sending raw data to a central data center, analysis is performed at the location where the data is generated. Micro data centers at the Edge, integrating storage, compute and networking, deliver the speed and agility needed to process data closer to where it is created.

According to Gartner, an estimated 75% of data will be processed outside traditional centralized data centers by 2025. Edge computing has transformed the way data is being handled, processed, and delivered from millions of devices around the world. Now, the availability of faster networking is enabling intelligent edge computing systems to accelerate the creation or support of real-time applications, such as video processing and analytics, factory automation, artificial intelligence and robotics.

The availability of faster servers and storage systems connected with high speed Ethernet networking allows for efficient, intelligent processing of this data at the edge. This is where AI, applied at the Edge and enabled by high speed networking, can transform how we collect, transport, process and analyze data with speed and agility. As data becomes more distributed, with billions of devices at the Edge generating it, real-time processing is required to analyze it effectively and draw actionable insights.

There is a paradigm shift from collect-transport-store-analyze to collect-analyze-transport-store, in real time. The Intelligent Edge brings processing power closer to where the data is created, which also reduces latency for real-time applications. Streaming analytics provides the intelligence needed at the edge, where data can be cleansed, normalized, and streamlined before being transported to the central data center or cloud for storage, post-processing, analytics, and archiving. Once data from multiple edge locations reaches the central data center, it can be combined and correlated for trend analysis, anomaly detection, projections, predictions, and insights using machine learning and AI.
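As a small illustration of that collect-analyze-transport-store flow, the sketch below cleanses and summarizes a window of sensor readings at the edge so that only a compact payload travels to the central data center. The field names, valid ranges and summary statistics are illustrative assumptions, not details from the article:

```
# Sketch of an edge pipeline: cleanse raw readings locally, reduce them to a
# summary, and transport only the summary upstream. Field names, ranges and
# statistics are hypothetical.
import json
import statistics
import time

def cleanse(readings):
    """Drop malformed or out-of-range samples before they leave the edge."""
    for r in readings:
        temp = r.get("temp_c")
        if isinstance(temp, (int, float)) and -40 <= temp <= 125:
            yield r

def summarize(readings):
    """Reduce a window of samples to the statistics the central site needs."""
    temps = [r["temp_c"] for r in readings]
    return {
        "ts": time.time(),
        "count": len(temps),
        "mean_temp_c": round(statistics.fmean(temps), 2),
        "max_temp_c": max(temps),
    }

raw_window = [{"temp_c": 21.4}, {"temp_c": 22.1}, {"temp_c": None}, {"temp_c": 400}]
summary = summarize(list(cleanse(raw_window)))
print(json.dumps(summary))  # only this small payload is transported and stored
```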

GPUs deliver the performance required for parallel processing

The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics accelerators a compelling platform for computationally demanding tasks in a wide variety of application domains. GPU-based clusters are used for compute-intensive tasks such as finite element computations and computational fluid dynamics [1]. Because GPUs provide high core counts and massive floating point throughput, high-speed networking is required to connect the GPU platforms, providing the throughput and the lowest latency that GPU-to-GPU communication requires.
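As a small illustration of that kind of data-parallel offload, the sketch below runs a large matrix multiply on a GPU using the CuPy library. CuPy and the problem size are assumed choices for the example; the article does not name a specific framework:

```
# Sketch: offload a data-parallel matrix multiply to a CUDA GPU with CuPy.
# Framework and problem size are illustrative assumptions.
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
b_cpu = np.random.rand(n, n).astype(np.float32)

# Copy the operands into GPU memory and run the multiply on thousands of cores.
a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)
c_gpu = a_gpu @ b_gpu
cp.cuda.Stream.null.synchronize()  # wait for the kernel to finish

# Bring the result back to the host only when it is actually needed.
c_cpu = cp.asnumpy(c_gpu)
print(c_cpu.shape)  # (4096, 4096)
```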

Using AI, organizations are developing, and putting into production, process and industry applications that automatically learn, discover, and make recommendations or predictions that can be used to set strategic goals and provide a competitive advantage. To accomplish these strategic goals, organizations require data scientists and analysts who are skilled in developing models and performing analytics on enterprise data. These data analysts also require specialized tools, applications, and compute resources to create complex models and analyze massive amounts of data.

GPUDirect for GPU-to-GPU communication

Figure 1. GPUDirect Remote Direct Memory Access (RDMA)

The main performance issue when deploying clusters of multi-GPU nodes is the interaction between the GPUs, or the GPU-to-GPU communication model. Prior to GPUDirect technology, any communication between GPUs had to involve the host CPU and required buffer copies. CPU involvement in GPU communications and the extra buffer copies created bottlenecks that slowed data delivery between the GPUs. GPUDirect technology lets GPUs communicate faster by removing the CPU from the communication loop and eliminating the buffer copies. The result is increased overall system performance and efficiency, reducing GPU-to-GPU communication time by 30%. This allows scaling out to tens, hundreds, or even thousands of GPUs across hundreds of nodes. Accommodating that kind of scale, and making it work flawlessly, requires a network with little to no packet loss, the lowest latency, and the best congestion management.
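For a sense of what this looks like from application code, the sketch below uses PyTorch’s NCCL backend to sum a gradient-sized tensor across GPUs; NCCL takes the GPUDirect peer-to-peer or GPUDirect RDMA path when the hardware and drivers support it. The tensor size, script name and launch command are illustrative assumptions:

```
# Sketch: multi-GPU all-reduce with PyTorch's NCCL backend. NCCL moves data
# GPU-to-GPU directly (peer-to-peer, or GPUDirect RDMA across nodes) when the
# hardware supports it, avoiding staging copies through host memory.
# Launch with, e.g.:  torchrun --nproc_per_node=4 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")      # rank/world size come from torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each GPU holds a gradient-sized tensor; all_reduce sums it across GPUs.
    grad = torch.ones(1024 * 1024, device="cuda") * (dist.get_rank() + 1)
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        # With 4 GPUs the per-element result is 1 + 2 + 3 + 4 = 10.
        print("reduced value per element:", grad[0].item())

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```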



Faisal Hanif

Product guy leading Product Management and GTM for next generation products/solutions for Data Center connectivity, orchestration & as-a-Service offerings