Had one of these fine Mellanox SN2700 32-port 100Gb switches appear to go bad. All of the port lights were on solid, and the warning lights were on. I could not ping it or SSH into it, and a direct console cable gave no joy. Even after rebooting it, nothing was displayed during the boot process.
Many people say the SSDs tend to fail. Fair enough: take it apart, put its mSATA SSD into an adapter, then put that into a PC where I can run some diagnostics. Using tools from PartedMagic, I found signs of wear, but no red flags. Then a replacement mSATA SSD was put into the switch to see if I could just load the OS onto it from scratch. No joy: still no POST, nothing on the console cable, and the LEDs showed the same status.
OK, it must be a bad system board or something, so a replacement switch is obtained. Rack it, assign a management IP, erase the old configuration. Might as well update the firmware while we have the chance. Nvidia doesn't just give the firmware away, so OK, create an account, attempt to log on, and get a warning that my account isn't valid... joy! More searching yields a Lenovo site that has firmware for the same switch... OK, great. I did several rounds of updates, doing it in steps so as not to jump too many versions: v3.6 -> 3.9 -> 3.10.
Fortunately, updating firmware with the WebUI is easy!
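Ordering a stepwise path like this has one small trap: 3.10 is newer than 3.9, but a plain string sort puts it first. A minimal Python sketch, assuming simple dotted version strings; the helper names are hypothetical, and the version list is just the path used here, not an Nvidia-mandated upgrade sequence:

```python
# Sketch: order firmware versions numerically and pick the steps between two versions.
# Assumes plain dotted version strings; the list below is only the path used in this
# post (v3.6 -> 3.9 -> 3.10), not a vendor-mandated upgrade sequence.


def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'v3.10' or '3.10' into (3, 10) so versions compare numerically."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


def upgrade_path(current: str, target: str, available: list[str]) -> list[str]:
    """Versions newer than `current`, up to and including `target`, oldest first."""
    lo, hi = parse_version(current), parse_version(target)
    steps = [v for v in available if lo < parse_version(v) <= hi]
    return sorted(steps, key=parse_version)


if __name__ == "__main__":
    print(sorted(["3.9", "3.10"]))                        # string sort: wrong order
    print(upgrade_path("v3.6", "3.10", ["3.10", "3.9"]))  # numeric order
```

Comparing tuples of integers rather than strings is what keeps 3.10 after 3.9.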
Time to reconfigure to match the new environment. There are several 100Gb -> 4x25Gb breakout connections as well as 40Gb -> 4x10Gb connections. Since I am not a Mellanox CLI expert, I get help from ChatGPT. After much trial and error, none of the commands work. ChatGPT tells me that this is because the generic "onie_x86" code for the entire SN2xxx model line is loaded, not the specific one for the SN2700. Well, shoot. I can't find firmware specifically for this model, only for the product line. I switch to Claude, and it gives me different commands; most of them seem to work. I was able to break the 100Gb port into multiple sub-interfaces. Also, one must reboot the switch for the changes to take effect... very annoying! And to make almost any change to a port, that port must be disabled first (just issue a shut command).
switch (config) # interface ethernet 1/1 module-type qsfp-split-4
switch (config) # interface ethernet 1/1 breakout 4x10G   <-- did not work
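The order of operations described above (shut the port, change its module type, reboot) can be written down as a short list of CLI lines. This is a hypothetical Python helper, not a Mellanox tool; only the module-type line is copied from what worked here, and the interface-prefixed shutdown form is an assumption about the Onyx CLI:

```python
# Hypothetical helper: emits config-mode CLI lines for a 4-way split of one port.
# Only the module-type command is taken from this post; the "shutdown" form is an
# assumption about the Onyx CLI, and the reboot step is left as a reminder.


def split_commands(port: int, slot: int = 1) -> list[str]:
    """Return the CLI lines to split ethernet <slot>/<port> into four sub-ports."""
    iface = f"interface ethernet {slot}/{port}"
    return [
        f"{iface} shutdown",                  # the port must be disabled before most changes
        f"{iface} module-type qsfp-split-4",  # the split command that worked on this switch
    ]


if __name__ == "__main__":
    for line in split_commands(3):
        print(f"switch (config) # {line}")
    print("# then reboot the switch so the split takes effect")
```

Calling it with port 3 reproduces the split that is described next.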
HOWEVER, I notice other ports are now offline. In this case, I had split port 3 into four ports, and now port 4 is disabled. Researching this, it is apparently a common thing: the ASIC resources assigned to a pair of ports can't handle both configurations at once.
Here is what Claude had to say, FYI:
This is a fundamental hardware limitation. Here's why:
The Root Cause: Shared SerDes Lanes
The SN2700's ASIC (Spectrum-1) connects to QSFP28 ports using SerDes (Serializer/Deserializer) lanes. Each 100GbE port uses 4 lanes × 25Gbps = 100Gbps.
When you do a 4×10G breakout, those same 4 lanes are redistributed — and the neighboring port shares the same lane group from the ASIC's perspective.
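The lane arithmetic is easy to sanity-check. A small Python sketch, assuming each sub-port of a 4-way breakout gets exactly one lane (run at 10 or 25 Gb/s); the mode names are illustrative labels, not CLI keywords:

```python
# Worked lane arithmetic for one QSFP28 cage, per the model quoted above:
# 4 SerDes lanes at up to 25 Gb/s each. Assumes each breakout sub-port uses
# exactly one lane; the mode names are illustrative labels, not CLI keywords.

LANES_PER_PORT = 4
LANE_GBPS = 25


def port_plan(mode: str) -> list[int]:
    """Speed in Gb/s of each logical port carved from a single cage."""
    plans = {
        "1x100G": [LANES_PER_PORT * LANE_GBPS],  # all four lanes bonded: 4 x 25 = 100
        "4x25G": [LANE_GBPS] * LANES_PER_PORT,   # one lane per sub-port at full rate
        "4x10G": [10] * LANES_PER_PORT,          # one lane per sub-port, run at 10 Gb/s
    }
    return plans[mode]


if __name__ == "__main__":
    for mode in ("1x100G", "4x25G", "4x10G"):
        speeds = port_plan(mode)
        print(f"{mode}: {len(speeds)} port(s), {sum(speeds)} Gb/s in total")
```

Under this model a 4x10G breakout uses only 40 Gb/s of the cage's 100 Gb/s.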
How the ASIC Groups Ports
The Spectrum-1 chip groups ports into pairs that share a single 8-lane SerDes block (4 lanes per port):
| Physical Port Pair | SerDes Block |
|---|---|
| Port 1 + Port 2 | Block A (8 lanes total) |
| Port 3 + Port 4 | Block B |
| Port 5 + Port 6 | Block C |
| ... | ...and so on |

When port 1 is split into 4×10G, it consumes all 4 lanes of its side of the block, but the block's bandwidth/lane allocation is now "locked" in split mode, making port 2 unavailable because:
- The ASIC can no longer independently assign lanes to port 2
- The physical QSFP cage for port 2 has no SerDes lanes left to drive it
- The port 2 cage effectively becomes the physical connector for sub-ports 3 and 4 of the breakout
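The pairing rule in the table and bullets above can be sketched in a few lines of Python. The (odd, even) grouping is inferred from the quoted table and from the port 3 / port 4 behavior described earlier; it is an assumption, not something taken from Nvidia documentation:

```python
# Sketch of the pairing rule above: front-panel ports form (odd, even) pairs, and a
# 4-way split on the odd port takes its even partner offline. The rule is inferred
# from the quoted table and the port 3 / port 4 behavior; it is an assumption.


def pair_of(port: int) -> tuple[int, int]:
    """Return the (odd, even) pair a front-panel port belongs to."""
    odd = port if port % 2 == 1 else port - 1
    return odd, odd + 1


def offline_after_split(split_ports: set[int]) -> set[int]:
    """Ports expected to go dark when each port in `split_ports` is split four ways."""
    offline = set()
    for port in split_ports:
        odd, even = pair_of(port)
        if port != odd:
            raise ValueError(f"port {port}: this sketch only models splits on odd ports")
        offline.add(even)
    return offline


if __name__ == "__main__":
    print(pair_of(3), pair_of(4))       # both ports fall in the same pair
    print(offline_after_split({1, 3}))  # the even partners of ports 1 and 3
```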
Think of It This Way
BEFORE BREAKOUT:
[Port 1 - 100G]         [Port 2 - 100G]
Lanes 0,1,2,3           Lanes 4,5,6,7
AFTER 4x10G BREAKOUT on Port 1:
[1/1][1/2][1/3][1/4]    [Port 2 = DEAD]
Lane: 0    1    2    3  Lanes consumed
The breakout cable's 4 individual fibers/conductors actually map across both physical QSFP cages on the faceplate.
Key Takeaway
| Factor | Detail |
|---|---|
| Lane sharing | Every 2 adjacent ports share one SerDes block |
| Breakout cable | Occupies both QSFP cages physically |
| Affected neighbor | Always the even/odd partner port |
| This is by design | Industry standard on Spectrum-1, Tomahawk, Tofino, etc. |
This is why on a 32-port SN2700, if you split every port you can, you end up with 64 usable 10G ports (16 split ports × 4), not 128: every split sacrifices its neighbor. You trade physical cages for breakout flexibility.
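Under the same pairing assumption (at most one split per odd/even pair, each split disabling its partner), the port count after a given number of 4-way splits is simple arithmetic. A minimal sketch; the function name is hypothetical:

```python
# Port-count arithmetic for a 32-port switch under the pairing rule above:
# each 4-way split turns one cage into four logical ports and disables its partner.
# Assumes at most one split per (odd, even) pair; the rule itself is an assumption.

TOTAL_CAGES = 32


def logical_ports(num_splits: int, total: int = TOTAL_CAGES) -> int:
    """Usable logical ports after `num_splits` four-way splits."""
    if not 0 <= num_splits <= total // 2:
        raise ValueError("at most one split per (odd, even) pair")
    untouched = total - 2 * num_splits  # cages in pairs that were left alone
    return untouched + 4 * num_splits   # each split pair yields four sub-ports


if __name__ == "__main__":
    for n in (0, 1, 16):
        print(f"{n} splits -> {logical_ports(n)} logical ports")
```

Each split gives up two cages and gains four sub-ports, so every split nets two extra logical ports.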
