InfiniBand versus Ethernet
InfiniBand and Ethernet are both networking technologies used for data communication, but they have different origins, architectures, and target applications.
InfiniBand is a high-performance, low-latency interconnect standard designed for connecting servers, storage systems, and other data centre components. It was developed specifically for high-performance computing (HPC) and other data-intensive applications.
Ethernet is a widely-used, general-purpose networking technology that connects devices in local area networks (LANs) and wide area networks (WANs). It was initially designed for office environments and has evolved to support a wide range of applications and speeds.
Both InfiniBand and Ethernet are technology standards, not specific products. They define the rules and specifications for communication between devices in a network.
Various vendors develop and manufacture products (such as network adapters, switches, and cables) that adhere to these standards.
The Components of InfiniBand
Channel Adapters: These are like the on-ramps and off-ramps of the superhighway. They help computers and devices get on and off the InfiniBand network. There are two types: Host Channel Adapters (HCAs), used by servers and storage hosts to connect to the InfiniBand network, and Target Channel Adapters (TCAs), used by specialised devices, usually for storage. (A minimal sketch of how a host enumerates its HCAs follows this list.)
Switches: These are like the traffic lights of the superhighway. They make sure data goes where it needs to go quickly and efficiently.
Routers: If you want to connect multiple InfiniBand networks together (like connecting multiple superhighways), you use routers. They help data move between the different networks.
Cables and Connectors: These are like the roads of the superhighway. They physically connect everything together.
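To make the HCA idea a little more concrete, here is a minimal sketch (using the Linux libibverbs API from rdma-core, which is one common way software talks to these adapters) that simply lists the channel adapters a host can see. It is an illustration only, not part of any particular vendor's toolkit.

```c
/* Minimal sketch: enumerate RDMA channel adapters (HCAs) visible to a host
 * using libibverbs (rdma-core). Compile with: gcc list_hcas.c -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devices = ibv_get_device_list(&num_devices);
    if (!devices) {
        perror("ibv_get_device_list");
        return 1;
    }

    printf("Found %d RDMA device(s)\n", num_devices);
    for (int i = 0; i < num_devices; i++) {
        /* ibv_get_device_name() returns e.g. "mlx5_0" for a Mellanox HCA */
        printf("  device %d: %s\n", i, ibv_get_device_name(devices[i]));
    }

    ibv_free_device_list(devices);
    return 0;
}
```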
The smallest complete InfiniBand network is called a "subnet," and you can connect multiple subnets together using routers to create a huge InfiniBand network. It's like connecting multiple cities with superhighways.
What makes InfiniBand special is that it's really fast (low-latency), can move a lot of data quickly (high-bandwidth), and is easy to manage (low-management cost).
It's perfect for connecting a lot of computers together (clustering), moving data between computers (communications), storing data (storage), and managing everything (management) - all in one network.
So, in a nutshell, InfiniBand is a super-fast, efficient way for computers and devices to talk to each other, making it easier to build big, powerful computer systems.
It is argued that InfiniBand offers lower latency and higher throughput than Ethernet, making it more suitable for performance-critical applications like HPC and AI workloads.
This is why InfiniBand has historically been the go-to networking solution for HPC and AI workloads - low latency, high bandwidth, and deterministic performance characteristics.
One of the key reasons for this performance is its use of RDMA (Remote Direct Memory Access) instead of TCP, making it suited for large, performance-critical workloads.
Nonetheless, the Ultra Ethernet Consortium, led by companies like Broadcom, Cisco, and Intel, is pushing for the adoption of Ethernet in AI networking. They argue that modern Ethernet can offer similar, if not better, performance compared to InfiniBand at a lower cost.
It is true that Ethernet has made strides with technologies like RDMA over Converged Ethernet (RoCE), but it has not yet matched InfiniBand's level of performance. Studies have shown that to achieve comparable performance, Ethernet needs to operate at speeds roughly 1.3 times faster than InfiniBand.
So while Ethernet has caught up with InfiniBand in terms of raw bandwidth, with both technologies offering 400 Gbps speeds, InfiniBand still maintains an edge in terms of latency and deterministic performance.
As highlighted, InfiniBand natively supports RDMA, which is one of the reasons it has historically been preferred for HPC and AI workloads.
Ethernet, on the other hand, has traditionally relied on TCP/IP for data transport, which involves more overhead and higher latency.
However, more recent Ethernet standards, such as RoCE (RDMA over Converged Ethernet), have added support for RDMA over Ethernet networks, allowing Ethernet to achieve lower latency and higher throughput than traditional TCP/IP-based Ethernet.
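To see what RDMA means in practice, the sketch below shows the memory-registration step from the verbs API, which is shared by native InfiniBand and RoCE: the application pins a buffer once, and the adapter can then move data in and out of it directly, without the kernel copying bytes through socket buffers. This is a simplified illustration, not a complete RDMA program - it omits queue pairs, connection setup, and most error handling.

```c
/* Sketch: register a buffer for RDMA access with libibverbs.
 * Works over native InfiniBand or RoCE; the verbs API is the same.
 * Compile with: gcc reg_mr.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

#define BUF_SIZE (1024 * 1024)

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no RDMA devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);          /* protection domain */

    void *buf = malloc(BUF_SIZE);
    /* Pin and register the buffer; the adapter can now DMA into/out of it
     * directly, bypassing the operating system on the data path. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    /* A remote peer would use mr->rkey plus the buffer address to issue
     * RDMA READ/WRITE operations against this memory region. */
    printf("registered %d bytes, lkey=0x%x rkey=0x%x\n",
           BUF_SIZE, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```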
In the context of networking, reliability refers to a network's ability to consistently deliver data without errors or loss. Two key aspects of reliability are fabric behavior and flow control.
Fabric behavior describes the overall structure and performance of a network.
In a reliable fabric, data is delivered consistently and without loss, even in the presence of network congestion or device failures.
For example, InfiniBand provides a lossless fabric, ensuring that no data is lost during transmission, regardless of network conditions. This is achieved through end-to-end flow control, which prevents data loss by signalling the sender to slow down or stop sending data when the receiver is unable to process incoming data at the same rate.
On the other hand, Ethernet, in its basic form, is a best-effort delivery system, meaning that it will attempt to deliver data but does not guarantee successful delivery.
However, recent Ethernet standards, such as priority flow control (PFC), have added support for lossless behavior, allowing Ethernet to provide more reliable data delivery, similar to InfiniBand.
InfiniBand provides a lossless fabric with built-in flow control and congestion management mechanisms. It guarantees reliable data delivery and maintains a consistent level of performance, even under heavy load.
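The intuition behind this credit-based, lossless behaviour can be captured in a toy model: the receiver advertises a fixed number of buffer credits, the sender transmits only while it holds credits, and credits are returned as the receiver drains its buffer, so the buffer can never overflow and nothing is dropped. The sketch below is conceptual only; real InfiniBand manages credits per virtual lane in hardware.

```c
/* Toy model of credit-based (lossless) flow control.
 * The sender transmits only while it holds credits; credits come back as
 * the receiver drains its buffer, so no packet is ever dropped. */
#include <stdio.h>

#define RX_BUFFER_SLOTS 4   /* credits advertised by the receiver */

int main(void)
{
    int credits = RX_BUFFER_SLOTS;  /* sender's view of available credits */
    int queued = 0;                 /* packets sitting in the receiver buffer */
    int sent = 0, dropped = 0;

    for (int tick = 0; tick < 20; tick++) {
        /* Sender: try to transmit one packet per tick, but only with a credit. */
        if (credits > 0) {
            credits--;
            queued++;
            sent++;
        }
        /* else: the sender stalls (back-pressure) instead of dropping. */

        /* Receiver: drain one packet every other tick and return a credit. */
        if (tick % 2 == 1 && queued > 0) {
            queued--;
            credits++;
        }
    }

    printf("sent=%d dropped=%d (lossless: the buffer never overflows)\n",
           sent, dropped);
    return 0;
}
```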
Scalability refers to a network's ability to grow and accommodate increasing amounts of data and devices without compromising performance or reliability.
InfiniBand is designed to scale exceptionally well, thanks to its switched fabric architecture, allowing for efficient scaling of AI clusters. It supports a large number of nodes (up to 48,000) and enables the creation of high-performance, low-latency interconnects between GPUs, which is essential for distributed AI training and inference.
Features like the Subnet Manager (SM), which discovers the topology and centrally computes the forwarding paths programmed into every switch, also simplify operation at scale.
These features allow InfiniBand to support tens of thousands of nodes without the complex configuration and protocols that Ethernet has traditionally required.
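As a back-of-the-envelope illustration of how a switched fabric reaches these node counts - assuming a classic three-tier fat-tree built from radix-k switches, which is a common but by no means mandatory topology - the supported host count grows as k^3/4:

```c
/* Back-of-the-envelope: end hosts supported by a three-tier fat-tree
 * built from radix-k switches (classic k-ary fat-tree: k^3/4 hosts).
 * The topology choice is illustrative; it is not mandated by InfiniBand. */
#include <stdio.h>

int main(void)
{
    long radices[] = {32, 40, 64};
    for (int i = 0; i < 3; i++) {
        long k = radices[i];
        long hosts = k * k * k / 4;
        printf("radix-%ld switches -> ~%ld hosts at full bisection\n",
               k, hosts);
    }
    return 0;
}
```

With radix-40 switches that works out to 16,000 hosts, and radix-64 gives 65,536 - the order of magnitude at which a flat subnet and centralised routing start to matter.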
The argument is that traditional Ethernet has limited scalability because of its flat, broadcast-based Layer 2 design and its need for complex loop-prevention protocols like Spanning Tree. This can lead to performance degradation and reduced efficiency as the network grows.
However, some argue that recent advancements in Ethernet have addressed these limitations and improved its scalability.
Technologies like VXLAN (Virtual Extensible LAN) and SDN (software-defined networking) have been introduced to tackle these challenges.
VXLAN allows Ethernet networks to scale to millions of nodes by encapsulating Ethernet frames within UDP packets, while SDN separates the network control plane from the data plane, enabling more flexible and scalable network configuration and management.
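For context on what that encapsulation looks like at the packet level, the sketch below lays out the 8-byte VXLAN header that rides inside a UDP datagram (conventionally to destination port 4789); its 24-bit VNI field is what allows on the order of 16 million isolated segments. This is an illustrative struct, not a full encapsulation path.

```c
/* Sketch of the 8-byte VXLAN header carried inside a UDP datagram
 * (conventional destination port 4789). Illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define VXLAN_UDP_PORT 4789
#define VXLAN_FLAG_VNI_VALID 0x08   /* "I" flag: the VNI field is valid */

struct vxlan_header {
    uint8_t flags;         /* bit 3 set => VNI present */
    uint8_t reserved1[3];
    uint8_t vni[3];        /* 24-bit VXLAN Network Identifier */
    uint8_t reserved2;
};

static void set_vni(struct vxlan_header *h, uint32_t vni)
{
    h->flags = VXLAN_FLAG_VNI_VALID;
    h->vni[0] = (vni >> 16) & 0xff;  /* high byte first (network order) */
    h->vni[1] = (vni >> 8) & 0xff;
    h->vni[2] = vni & 0xff;
}

int main(void)
{
    struct vxlan_header h = {0};
    set_vni(&h, 5001);  /* hypothetical segment ID */
    printf("VXLAN header is %zu bytes; max VNIs = %d\n",
           sizeof(struct vxlan_header), 1 << 24);
    printf("outer packet: Ethernet | IP | UDP(dst %d) | VXLAN | inner frame\n",
           VXLAN_UDP_PORT);
    return 0;
}
```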
Nonetheless, some networking experts remain adamant that the packet-based, best-effort nature of Ethernet can lead to congestion and performance degradation as cluster size grows.
The consensus seems to be that, despite these advancements, InfiniBand still maintains an advantage in terms of scalability, particularly in large-scale, high-performance computing environments where low latency and high bandwidth are critical.
One of the challenges in replacing InfiniBand with Ethernet is the familiarity and expertise that HPC professionals have with InfiniBand. The InfiniBand ecosystem is well-established and purpose-built for HPC and AI workloads.
The following tools and libraries illustrate how tightly this ecosystem is coupled to InfiniBand:
Message Passing Interface (MPI): This is a standardized and portable message-passing system designed to function on a variety of parallel computing architectures. MPI libraries like Open MPI are often optimised for InfiniBand to enhance communication speeds and efficiency in cluster environments.
NVIDIA Collective Communications Library (NCCL): Optimised for multi-GPU and multi-node communication, NCCL leverages InfiniBand's high throughput and low latency characteristics to accelerate training in deep learning environments.
RDMA (Remote Direct Memory Access) libraries: These allow direct memory access from the memory of one computer into that of another without involving either one's operating system. This enables high-throughput, low-latency networking, which is critical for performance in large-scale computing environments.
GPUDirect: This suite of technologies from NVIDIA provides various methods for direct device-to-device communication via PCIe and InfiniBand interconnects, enhancing data transfer speeds and reducing latency.
Intel’s Performance Scaled Messaging 2 (PSM2): This protocol is designed to exploit the features of high-performance networks like InfiniBand in large-scale HPC clusters, providing reliable transport and high-bandwidth capabilities.
These tools and libraries are critical for developers working in HPC and AI, as they provide necessary functionalities that harness the full potential of InfiniBand's network capabilities.
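To ground this, here is a minimal MPI program of the kind these stacks accelerate: an all-reduce across ranks, the same collective pattern used (via MPI or NCCL) to average gradients in distributed training. Whether the bytes travel over InfiniBand verbs, RoCE, or plain TCP is decided by the MPI library's transport configuration, not by the application code.

```c
/* Minimal MPI all-reduce sketch -- the collective pattern used to average
 * gradients in distributed training. The transport (InfiniBand verbs, RoCE,
 * or TCP) is chosen by the MPI library, not by this code.
 * Build/run example: mpicc allreduce.c -o allreduce && mpirun -np 4 ./allreduce */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes its own value; every rank receives the sum. */
    double local = (double)(rank + 1);
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d of %d: global sum = %.1f\n", rank, size, global);

    MPI_Finalize();
    return 0;
}
```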
That said, Ethernet has a much larger installed base and a wider ecosystem of compatible devices and software, thanks to its long history and widespread adoption - just not in AI and HPC.
Ethernet's main advantage lies in its flexibility and ubiquity. Most data centres and cloud providers already use Ethernet extensively, and having a single networking technology across the entire infrastructure could simplify management and reduce costs. So, in theory, the 'switching cost' of moving to Ethernet should be modest if it offers the same performance as InfiniBand.
In summary, while InfiniBand offers superior performance and reliability for AI and HPC workloads, Ethernet's widespread adoption, extensive ecosystem, and lower costs make it the primary choice for most general-purpose networking applications.
In mid-2023, Ram Velaga of Broadcom argued that Ethernet is the technology of choice for GPU clusters - in his view, it is the only networking technology needed, even for building large-scale clusters of 1,000 or more GPUs.
Why?
Ethernet has proven its ability to adapt and meet new requirements over multiple decades.
Recent requirements for clustered GPUs include low latencies, controlled tail latency, and the ability to build large topologies without causing idle time on GPUs.
Ethernet has evolved to provide capabilities such as losslessness and congestion management, making it suitable for large-scale GPU clusters.
Ethernet is ubiquitous, with 600 million ports shipped and sold annually, leading to rapid innovation and economies of scale that are not available with alternative technologies like InfiniBand.
The scale of Ethernet and the presence of multiple players in the market drive innovation and provide economic advantages.
Ethernet offers both the technical capabilities and the economics needed to build large-scale AI/ML clusters.
In summary, it is argued Ethernet's adaptability, recent advancements, ubiquity, and economic advantages make it the best choice for building large-scale GPU clusters, and that alternative technologies like InfiniBand are not necessary.
While Ethernet is making inroads, InfiniBand's entrenched position and purpose-built design make it a formidable incumbent.
The ultimate outcome will depend on factors such as the rate of Ethernet's performance improvements, the willingness of HPC professionals to adopt new technologies, and the strategic decisions made by major vendors and cloud providers.
But the decision may not be up to the professionals; there are numerous reasons why you might consider Ethernet for a new network build. InfiniBand, despite its many advantages for high-performance applications, faces several significant shortcomings that influence its broader adoption:
High Cost: InfiniBand's network components, like cards and switches, are substantially more expensive than Ethernet equivalents, making it less economically viable for many sectors.
Elevated O&M Expenses: Operating and maintaining an InfiniBand network requires specialised skills due to its unique infrastructure, which can lead to higher operational costs and challenges in finding qualified personnel.
Vendor Lock-in: The use of proprietary protocols in InfiniBand equipment restricts interoperability with other technologies and can lead to dependency on specific vendors.
Long Lead Times: Delays in the availability of InfiniBand components can pose risks to project timelines and scalability.
Slow Upgrade Cycle: Dependence on vendor-specific upgrade cycles can slow down network improvements, affecting overall network performance and adaptability.
Despite these advancements in Ethernet technology and the 'shortcomings' of InfiniBand listed above, there is a range of technical and practical reasons why InfiniBand remains the preferred choice for high-performance AI workloads.
In the eyes of HPC technicians, Ethernet has not yet proven its performance credentials.
While Ethernet has made improvements in terms of bandwidth and switch capacity, it still faces challenges when it comes to supporting the massive scale required by AI workloads.
InfiniBand has a well-established ecosystem in the high-performance computing (HPC) and AI domains. Many AI frameworks, libraries, and tools are optimised for InfiniBand.
Transitioning to Ethernet would require significant effort in terms of software adaptation and optimisation. Existing AI pipelines and workflows would need to be modified to work efficiently with Ethernet, which could be a time-consuming and costly process.
As noted earlier, InfiniBand's lossless fabric, with its built-in flow control and congestion management mechanisms, guarantees reliable data delivery and a consistent level of performance even under heavy load. Despite Ethernet's advances in this area, the experts are still not convinced.
While Ethernet is generally considered more cost-effective than InfiniBand, the cost difference becomes less significant when considering the total cost of ownership (TCO) for AI infrastructures. The higher performance and efficiency of InfiniBand can lead to better resource utilisation and reduced overall costs.
In short, while Ethernet has made significant advancements, InfiniBand remains the preferred choice for AI computing due to its superior performance, scalability, ecosystem compatibility, reliability, and QoS guarantees.
In conclusion, InfiniBand and Ethernet are both powerful networking technologies with their own strengths and weaknesses.
While InfiniBand has been the dominant choice for AI and HPC workloads due to its superior performance and reliability, Ethernet is rapidly closing the gap with recent advancements in speed, features, and scalability.
As the demand for high-performance networking in AI and HPC continues to grow, the competition between these two technologies will likely intensify.
The ultimate winner will depend on a complex interplay of factors, including technological advancements, industry adoption, and strategic decisions by key players.
Regardless of the outcome, one thing is certain: the future of high-performance networking will be shaped by the ongoing battle between InfiniBand and Ethernet, with significant implications for the growth and development of AI and HPC applications in the years to come.