Mellanox SN2700 100Gb Switch

 Had one of these fine Mellanox SN2700 32-port 100Gb switches appear to go bad.  All of the port lights were lit solid and the warning lights were on.  I could not ping it or SSH into it, and a direct console cable gave no joy.  Even when rebooting it, nothing displayed during the boot process.

Many people say the SSDs tend to fail.  Fair enough; take it apart, put its mSATA SSD into an adapter, then put that into a PC where I could run some diagnostics.  Using tools from PartedMagic I found signs of wear, but no red flags.  Then a replacement mSATA SSD was put into the switch to see if I could just load the OS onto it from scratch.  No joy: still no POST, nothing on the console cable, and the LEDs showed the same status.

OK, it must be a bad system board or something, so a replacement switch was obtained.  Rack it, assign a management IP, erase the old configuration.  Might as well update the firmware while we have the chance.  Nvidia doesn't just give out the firmware, so I created an account, attempted to log on, and got a warning that my account isn't valid....joy!  More searching yielded a Lenovo site that has firmware for the same switch...OK, great.  I did several rounds of updates, doing it in steps so as to not jump too many versions: v3.6 -> v3.9 -> v3.10.

Fortunately, updating the firmware with the WebUI is easy!


I thought I would be smart and see if I could just copy the old configuration file off of the old hard drive and put it onto the new one.  Well, even with ChatGPT I couldn't find it.  The AI told me to look in a few different places; none of them panned out.

Time to reconfigure to match the new environment.  There are several 100Gb->4x25Gb connections as well as 40Gb->4x10Gb connections.  Since I am not a Mellanox CLI expert, I got help from ChatGPT.  After much trial and error, none of the commands worked.  ChatGPT told me it was because the generic "onie_x86" code for the entire SN2xxx model line was loaded, and not the image specific to the SN2700.  Well, shoot.  I can't find firmware specifically for this model, only the product line.  I switched to Claude, and it gave me different commands, most of which worked.  I was able to break the 100Gb port into multiple sub-interfaces.  Note that one must reboot the switch for the changes to take effect....very annoying!  Also, to make most any change to a port, that port must be disabled first (just issue a shut command).

switch (config) # interface ethernet 1/1 module-type qsfp-split-4 

switch (config) # interface ethernet 1/1 breakout 4x10G  <--did not work


HOWEVER, I noticed other ports were now offline.  In this case, I had port #3 split into four ports, and now port 4 is disabled.  Researching this, it is apparently a common thing, as the ASIC block assigned to two adjacent ports can't handle the independent configurations.

Notice how, even in the WebUI, ports four and six are no longer present.

switch-97277a [standalone: master] (config) # interface ethernet 1/11 module-type qsfp-split-4
the following interfaces will be unmapped: 1/11 1/12
Type 'YES' to confirm split: YES
% Interface Eth1/12 should be disabled
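
Putting the pieces together, the sequence that eventually worked for me looked roughly like this (Onyx syntax; the port numbers are examples, and note the neighbor port has to be shut down before the split is accepted):

```
switch (config) # interface ethernet 1/11 shutdown
switch (config) # interface ethernet 1/12 shutdown
switch (config) # interface ethernet 1/11 module-type qsfp-split-4
the following interfaces will be unmapped: 1/11 1/12
Type 'YES' to confirm split: YES
switch (config) # configuration write
switch (config) # reload
```

After the reload, the sub-interfaces should show up under names like ethernet 1/11/1 through 1/11/4.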





Here is what Claude had to say...FYI:

This is a fundamental hardware limitation. Here's why:

The Root Cause: Shared SerDes Lanes

The SN2700's ASIC (Spectrum-1) connects to QSFP28 ports using SerDes (Serializer/Deserializer) lanes. Each 100GbE port uses 4 lanes × 25Gbps = 100Gbps.

When you do a 4×10G breakout, those same 4 lanes are redistributed — and the neighboring port shares the same lane group from the ASIC's perspective.


How the ASIC Groups Ports

The Spectrum-1 chip groups ports into pairs that share a single 4-lane SerDes block:

Physical Port Pair    |  SerDes Block
----------------------|---------------
Port 1  + Port 2      |  Block A  (8 lanes total)
Port 3  + Port 4      |  Block B
Port 5  + Port 6      |  Block C
...and so on

When port 1 is split into 4×10G, it consumes all 4 lanes of its side of the block — but the block's bandwidth/lane allocation is now "locked" in split mode, making port 2 unavailable because:

  • The ASIC can no longer independently assign lanes to port 2
  • The physical QSFP cage for port 2 has no SerDes lanes left to drive it
  • The port 2 cage effectively becomes the physical connector for sub-ports 3 and 4 of the breakout

Think of It This Way

BEFORE BREAKOUT:
[Port 1 - 100G] [Port 2 - 100G]
  Lane 0,1,2,3    Lane 4,5,6,7

AFTER 4x10G BREAKOUT on Port 1:
[1/1][1/2][1/3][1/4]  [Port 2 = DEAD]
Lane: 0   1   2   3    Lanes consumed

The breakout cable's 4 individual fibers/conductors actually map across both physical QSFP cages on the faceplate.


Key Takeaway

Factor             |  Detail
-------------------|----------------------------------------------------------
Lane sharing       |  Every 2 adjacent ports share one SerDes block
Breakout cable     |  Occupies both QSFP cages physically
Affected neighbor  |  Always the even/odd partner port
This is by design  |  Industry standard on Spectrum-1, Tomahawk, Tofino, etc.

This is why on a 32-port SN2700, if you split all ports, you effectively get 32 usable 10G ports (not 64+) — every split sacrifices its neighbor. You trade port count for breakout flexibility.
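
The pairing rule in Claude's table can be sketched in a few lines of Python (my own toy model of the odd/even pairing, not anything derived from the switch itself):

```python
# Toy model of Spectrum-1 port pairing: odd/even front-panel neighbors
# share one SerDes block, so splitting one port unmaps its partner.

def serdes_partner(port: int) -> int:
    """Return the front-panel port sharing a SerDes block with `port`."""
    return port + 1 if port % 2 == 1 else port - 1

def split_4x(port: int) -> list[int]:
    """Ports unmapped when `port` is broken out 4x: itself plus its partner."""
    return sorted([port, serdes_partner(port)])

print(split_4x(11))  # -> [11, 12], matching the switch's own warning above
print(split_4x(3))   # -> [3, 4], matching what I saw with port #3
```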

A Quick Look at the PowerEdge R6715

 A quick look at one of the newer products from Dell: the PowerEdge R6715, powered by a single AMD CPU.

First impressions: I am not impressed.  Of the two we got into the lab, one had a boot failure.  Long story short, one of the cables leading to the backplane was not seated properly.  The other issue is that I am not a fan of having cables connecting the riser card; it just seems like a failure point.  The CPU heatsink is VERY interesting.








VMware Users' Group Connect Minneapolis 2026 and VMware Certification

 A pass is a pass.....even if it is by the skin of your teeth!  The minimum passing score is 300 points!  Passed my VMware by Broadcom VCP-VCF (VMware Cloud Foundation v9 Architect) certification at VMUG Connect Minneapolis!  Minneapolis VMware User Group (VMUG)















NVMe wearing out?!?

 One of the developer machines had an error; further digging revealed hard disk warnings.  The drive functions just fine, but was showing an error.  It turns out this particular drive is well beyond its design specification.  There are no failed cells and no errors reading or writing, but the drive has had 200% of the amount of data written to it that it was designed for.  Kind of like having a car that goes 250,000 miles; the manufacturer didn't expect or design the car to go that far, except in this case one gets a warning.  FWIW, the drive in question is an HP M.2 NVMe drive.  I question HP's fuzzy math in calculating the endurance of the drive.
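
For the curious, the arithmetic behind such a warning can be sketched in Python.  The rated TBW and the counter value below are made-up numbers, not this drive's actual figures; NVMe SMART reports writes in 512,000-byte "data units":

```python
# Back-of-the-envelope NVMe wear calculation (hypothetical numbers).
# NVMe "Data Units Written" is counted in units of 512,000 bytes
# (1000 sectors of 512 bytes each), per the NVMe specification.

def wear_percent(data_units_written: int, rated_tbw: float) -> float:
    """Percent of the drive's rated endurance (TBW) consumed."""
    bytes_written = data_units_written * 512_000
    tb_written = bytes_written / 1e12
    return 100.0 * tb_written / rated_tbw

# Example: 400 million data units written on a drive rated for 100 TBW
# -> 204.8 TB written, i.e. roughly 200% of rated endurance.
print(round(wear_percent(400_000_000, 100), 1))
```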





macOS on VMware ESXi

 Can one run macOS as a VM on a VMware host?  Well, technically yes.  Legally, no.  In my case we just needed to test how Safari behaves with some web pages.

Disclaimer: my understanding is that Apple VMs are only legal on Apple hardware.  The only Apple hardware that supports ESXi is the "TrashCan".  Otherwise, VMware Fusion is a free download.

High Level Steps:

-On a Mac, using the App Store, download the bits for the OS in question

-On that Mac, convert those bits to a DMG file

-On that Mac, convert the DMG to an ISO

-Upload the ISO to the VMware server

-Create a VM, mount the ISO, and install; however, the following changes need to be made to the ".vmx" file:

smc.present = "TRUE"

smc.version = "0"

hw.model = "Macmini8,1"

board-id = "Mac-7BA5B2D9E42DDD94"

serialNumber = "C02ZK0XXXXXX"

efi.nvram.var.ROM = "A1B2C3D4E5F6"

efi.nvram.var.MLB = "C02712300Q6NNNJA8"
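
For the DMG and ISO conversion steps above, the hdiutil incantations look roughly like this (a sketch: the installer app name, paths, and the 16GB size are my own examples; adjust for the macOS version you downloaded):

```shell
# Build an empty DMG, stuff the installer into it, then convert to ISO.
hdiutil create -o /tmp/Installer -size 16g -layout SPUD -fs HFS+J
hdiutil attach /tmp/Installer.dmg -noverify -mountpoint /Volumes/Installer
sudo /Applications/Install\ macOS\ Monterey.app/Contents/Resources/createinstallmedia \
     --volume /Volumes/Installer --nointeraction
hdiutil detach /Volumes/Install\ macOS\ Monterey   # the installer renames the volume
hdiutil convert /tmp/Installer.dmg -format UDTO -o /tmp/Installer.cdr
mv /tmp/Installer.cdr /tmp/Installer.iso           # a CDR image is ISO-compatible
```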


Once installed, install VMware Tools.  I had a really hard time finding them; the file is called "darwin.iso".






New life for really old Macs??


 Apple is really strict as to what OS can be loaded on their hardware.  In this case I have a Mac Mini 2014, which has an i5-4278U.  The newest OFFICIALLY supported OS it will run is Monterey.  However, with the help of "OpenCore Legacy Patcher" I was able to put Sequoia on it.

The machine is slow, but that is to be expected of 11-year-old hardware, especially with a spinning hard disk.  I should put in an SSD, but I didn't want to take an hour to do the swap.  No thanks to Apple engineers for making things overly complicated.



Sequoia also installed on the infamous TrashCan, which isn't so bad considering it has an actual GPU, a Xeon processor, and an NVMe drive.

Somewhat ironically, many Mac Mini 2012s actually outperform a 2014, due to the type of processor used.  Here are a few of the different models I found floating around:




Swapping out the 5400rpm hard drive isn't the most straightforward job, but it is very doable on the 2012, and more of a PITA on the 2014.




And for a bit of blasphemy: Windows 11 installed on a 2012 Mac Mini


Even more blasphemy......running Windows 11 on the infamous Trashcan.





Perle IOLAN Serial-Over-IP switch

 Do you have network devices that you occasionally need to log in to via a console/serial port?  For example, do you want backup access to a switch at a remote site?  Perle has a solution.  In this case, the IOLAN SDS32c has 32 ports for serial access.

Here are some things I discovered.

They have an "Admin" port, which I assumed was a serial port to access the box.  I could never get any connection, despite trying various serial cables and whatnot.

The way to set up the box: power it up and hold the RESET button down for three seconds.  Once it reboots, launch the "EasyConfig" software, which will scan the network looking for Perle devices.  Once the device is discovered, assign it an IP address, then use a web browser to connect to that IP.  The default username is "admin" and the password is "superuser".

The models I have also require a CROSS-OVER cable in the mix.  I purchased little adapter/couplers from both Monoprice and Amazon.





In order to connect to a device, one telnets to the IP/DNS name of the serial switch, using the TCP port that matches the serial port on the switch (10000 plus the port number).  For example, if one wanted to connect to "stella" on serial port 2, then telnet to:  IP-Of-Serial-switch:10002
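
A tiny sketch of the port math (the device names and mapping are my own example; the 10000-plus-port-number convention matches the default shown above):

```python
# Map devices to the Perle TCP ports used for telnet access.
BASE_PORT = 10000
SERIAL_PORTS = {"stella": 2, "core-switch": 1}  # device -> physical serial port

def telnet_target(device: str, host: str = "IP-Of-Serial-switch") -> str:
    """Return the host:port string to telnet to for a given device."""
    return f"{host}:{BASE_PORT + SERIAL_PORTS[device]}"

print(telnet_target("stella"))  # -> IP-Of-Serial-switch:10002
```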

One thing that REALLY annoys me is that ANY change requires a reboot of the device.  Even changing the label of a port requires a reboot!




I also have an IOLAN SCS48, which is different as far as the serial port wiring goes, and I haven't been able to get a terminal session to work.  Supposedly it is prewired with the RS232 Sun/Cisco pinout for direct console access, meaning a straight-through cable should work for most connections.






Mechanical 4GB USB "thumb" drive?

 This came through during some e-waste recycling.  It is a 4GB hard drive (yes, 4GB, not 4TB....circa 2001?), but it has mechanical/spinning platters!  It appears to be in the CompactFlash form factor.  The card it is plugged into is a USB adapter.  Some searching says it is most likely from an early PlayStation memory expansion card.




10Gb network card

 I ran across this and had to research what the heck it was.  I don't think I have ever seen a connector like that before.  Turns out it's a 10Gb network card; this particular one is a Chelsio.


Dell AI GPU Server: Dell PowerEdge C4140

 

Another interesting build.  Dell has a line of GPU servers, and they are aged enough that they are on the second-hand market.  The cost of PCIe Teslas or equivalents is kind of ridiculous.  However, Nvidia has another form factor, called SXM, which uses NVLink.  It has more bandwidth than PCIe 3.0.  Because it is a completely different interface, there are fewer machines that can run these GPUs, thus the prices are much lower.  In this case I was tasked with converting one from PCIe GPUs to SXM GPUs.  There is not a lot of information on these machines, so maybe this post will help someone else.

Pictured with Nvidia SXM GPUs.

PowerEdge C4130 vs C4140


Side-by-side picture of the 2400-watt power supply next to the 1100-watt power supply.  Note the 2400W power supply takes a different power cord: a C19 connection, as the power supply has a 16-amp draw.  The system will not run on two 1100-watt power supplies...well, kind of.  The system powers on, gets through half of the POST process, then powers down.  Looking in iDRAC, no errors are logged!  See this video:
The system powers on, starts the POST process, then shuts down.


Picture showing the SXM2 system board with the GPUs.
To put this board in, the bottom tray had to be swapped out.  To swap the trays, the front frame had to be removed, which unfortunately meant there was no way to mount the power button and LED indicator.  I couldn't find a replacement component listed anywhere; I might be able to cut the old one down.  For now the LCD board is held in with foam and a zip tie.

Official cable routing for the PCIe and power cables; there is no circuit-board connector between the SXM board and the system board.

Picture showing the power and PCIe cables and the GPUs installed.

Notes:
-The system MUST use a 2400-watt power supply; dual 1100-watt supplies are not sufficient
-For the third PCIe slot, the riser card is the same as in the PowerEdge R640
-The 2nd PCIe slot does NOT support PCIe bifurcation
-Does not support booting from NVMe; at least not from a PCIe->M.2 NVMe adapter
-There is a "kit" to mount two 2.5" drives in the spot where the 2nd power supply would normally live; I just could not find the parts to purchase
-Uses the same "modified" ODD SATA connector as other x40 PowerEdge servers
    -tried both the generic and the Dell cable and could not get it to recognize any SATA drives
-There is a fan shroud that goes between the GPUs; couldn't find the part number and/or a place to purchase one
-Doing Automatic Updates on Windows Server 2019 would cause it to bluescreen.  My theory is that the Microsoft video drivers conflicted with the Nvidia GPUs.

pcie cable  0y6tgj
pcie cable 0688n0
pcie cable  02f5p8
pcie cable  086khr
power cable   0ynp05
power cable   0528cn
power cable   0528cn
power cable   0ryj56