LoadBalancer.org

LoadBalancer.org is one of many layer 7 load balancers on the market today.  I got turned onto them because they have an alignment with Cloudian object storage devices.  Think of Cloudian as on-premises S3 storage buckets; they are great for "cheap and deep" storage.  Cloudian nodes are not load balanced natively, so during jobs such as backups where Cloudian is the target, a single node can get overworked while the other nodes are bored.  This is normal behavior when one is relying on round-robin DNS to distribute load.

The LoadBalancer.org product is significantly cheaper than some of its competitors.  Their support is based out of the UK, so it is a bit more difficult to get a support person on the phone from a different time zone.  They offer both virtual and physical appliances.

I did a proof of concept built as a virtual appliance, running on a retired VMware ESXi host that had 10Gb networking.  From beginning to end, I had it functioning in roughly an hour.  Super easy and straightforward; instructions for many use cases are laid out on their support site.  For the POC it did its job by alleviating the "hot node" issue and letting backups finish sooner, since multiple Cloudian nodes could do work at the same time.

The physical appliances are rebranded Dell PowerEdge servers.  I had several problems bonding the 10Gb NICs on our appliance.  Support was not much help, as they know their product really well but not so much the network switch side of things.  Our issue ended up being odd behavior out of the Cisco Nexus 9k.  Word of advice: when re-using ports on a Cisco 9k, run the "default interface" command on each port before configuring it for its new purpose.  Something was sticking in the configuration, and the NIC team would not cooperate until this was done for each port.
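For reference, the NX-OS way to wipe a port's leftover configuration is the "default interface" command, run from config mode.  A sketch of what that looks like; the port number here is just an example:

```
conf t
  default interface Ethernet1/10   ! wipe any leftover config on the port
  interface Ethernet1/10
    ! ...now apply the bond/port-channel config fresh...
  end
```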

When using multiple VLANs, the appliance breaks out each VLAN as a separate interface.  Think of it, and manage it, just as if it were a separate physical NIC.  Also, when looking at performance graphs, keep in mind the difference between MBps (megabytes per second) and Mbps (megabits per second).








Repurposed/Recycled Sophos Firewall

This Sophos XG210 came to me to be recycled.  After pulling the cover off I noticed that it has DDR3 RAM, an SSD, and a VGA port.  I then said to myself: "Hey, wait a minute, this looks like a normal PC".  I hooked up a VGA cable and a USB keyboard and powered it on; I was greeted with a very familiar American Megatrends BIOS, and then it booted into a specialized Linux OS.  Next I used my bootable PartedMagic USB drive to erase the drives, then installed Windows 10 as a proof of concept.  Windows saw all the hardware, including all the NICs!

Intel Celeron G1820 CPU at 2.7GHz

8GB DDR3 RAM

Intel 120GB SSD

Six 1Gbps network ports

USB 3.0 ports

There is an internal PCI-e slot, that could be used but might require some creativity.





Repurposed/Recycled Citrix SD WAN Appliance

I recently got three of these Citrix CB-140 SD-WAN devices to recycle, but I decided to play with them a bit first.  Open the hood and a SuperMicro motherboard is in there!  It has an mSATA slot holding the smallest SATA SSD I have ever seen: literally a thumbnail-sized circuit board that plugs into a SATA port with two small power leads.  There is also a 16GB USB drive plugged into the USB header.  Six total SATA ports.  8GB of unregistered ECC RAM.  Seven 1Gbps RJ45 network ports plus remote management.  A quad-core Intel Atom C2558 CPU @ 2.4GHz.  Room for two 3.5" hard drives.

I upgraded the RAM to 16GB, erased both drives, and verified that the NICs actually work (those white chips are physical switches for detecting LAN cable disconnects; I had issues with those back on some older Riverbed appliances).  I was then able to install both Windows 10 and VMware ESXi v6.7.

I will probably end up selling these.  They are a tad too noisy to sit in the living room as a media player, but they would make a pretty energy-efficient VMware server, Plex server, FreeNAS box, or pfSense firewall.



The thumbnail-sized SSD plugs directly into the SATA port.














Synology

I did some basic setup on a Synology DS920+ for a friend of mine.  The NAS was running the DSM 7.0.1 operating system.  The drives used were a pair of Toshiba 12TB drives, model MG07ACA12TE9 (10.9TB usable), and a pair of Toshiba 14TB MG08ACA14TE drives (12.7TB usable).  Neither of these drives is on the compatibility list, but they work anyway.

During the setup, all four drives were installed in the system and we could not create ANY volumes.  After some troubleshooting we powered down the system, installed a single 12TB drive, and successfully created a storage pool and volume on it.  We then powered down the system, added a second drive, and added it to the existing pool.  This worked; however, it took 48 hours to recalculate the RAID.  Mind you, this was a completely blank set of drives!
I then added the remaining 14TB drives.  I could not add them to the existing pool that contained the matching pair of 12TB drives, but I was able to add the two 14TB drives as a second storage pool and volume.

Then I blew away both sets of pools and created a new volume containing all four drives using the Synology default RAID level, SHR (because it allows for drives of different sizes and the loss of a single drive).  Usable space came to roughly 35TB; unfortunately, it gave an estimate of 24 hours to optimize the pool.
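As a sanity check on that number: with single-drive fault tolerance, SHR's usable capacity works out to roughly the sum of the formatted capacities minus the largest drive.  A quick sketch using the formatted sizes above (this is a simplification of how SHR actually lays out the data):

```shell
# Approximate SHR usable space (single-drive fault tolerance):
# sum of formatted capacities minus the largest drive.
awk 'BEGIN {
  split("10.9 10.9 12.7 12.7", d, " ")   # formatted TB per drive
  max = 0; sum = 0
  for (i in d) { sum += d[i]; if (d[i] > max) max = d[i] }
  printf "usable ~= %.1f TB\n", sum - max
}'
```

That lands at about 34.5TB, in line with what DSM reported.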

Rubrik takeaways.....


A couple of random notes on the product; if I got something out of place or a newer version addresses the issue, let me know.

-Rubrik is a "forever delta" technology, meaning it will do one full backup and from that point forward back up only the changed blocks.  EXCEPT when using another storage device for archiving.  So if one is using something 'cheap and deep' for storage, say a Synology NAS, Cloudian, or any S3 blob storage, a full backup will occur every 60 jobs or so.  The delta chain gets too long and time-consuming to traverse, so restores become messy.  Rubrik's way around this is to collapse all of the deltas and take another full backup.  Not a big deal, except when one has multiple 10TB volumes to back up.  When those jobs take over 24 hours to ingest, the impact on the production infrastructure and other backup windows becomes a concern.  One way to make this less painful is to scale back the number of backups.  I.e., instead of backing up every 3 hours, which means a new full would occur roughly every 8 days, change it to back up every 12 hours so it only does a full every 30 days.
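The arithmetic behind those intervals, as I understand it (the 60-job threshold is approximate):

```shell
# Days between synthetic fulls = jobs_per_full * interval_hours / 24
jobs_per_full=60
for interval in 3 12; do
  awk -v j="$jobs_per_full" -v i="$interval" \
    'BEGIN { printf "every %dh -> full every %.1f days\n", i, j*i/24 }'
done
```

A 3-hour interval works out to a full every 7.5 days (the "roughly 8" above); 12 hours stretches it to 30.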

-"Direct Backup to Archive" means that the backups are stored directly on the secondary storage vs. being stored on the Rubrik and then having old backups age off to secondary storage.  Rubrik has a checkbox for this that can only be chosen when creating an SLA.  I will mix up the terminology here, but "direct archive" isn't necessarily direct-archive.  If one doesn't select this option when setting up the SLAs, the most current data is stored on the Rubrik and all of the older changed data is on the archive location.  This is great if current and prompt restores are needed; however, it does eat into one's storage plans.  If the direct archive option is chosen during SLA creation, the data bypasses the Rubrik, goes to the archive location, and only the metadata lives on the Rubrik.  This option cannot be changed midstream; if one changes their mind, the SLA needs to be deleted, the data deleted or aged off, and the SLA recreated and the target backed up as if it were new.

NetApp APIs: like most storage vendors, NetApp has exposed some of its inner secrets to third parties for a more efficient ecosystem.  Rubrik is one of them; it can utilize NetApp's own snapshots, which greatly cuts down the impact on production and the processing time spent scanning for changes, and thus makes for quicker backups.  Newer versions of the SnapDiff APIs (v3) may or may not be open to third parties, leaving companies like Rubrik out in the cold.  Not a horribly huge deal, as the Rubrik will just talk to the NetApp like any other target, but again, larger backup windows will be needed, with more impact on the infrastructure.

Related to the SnapDiff APIs: I forget where the actual shortcoming is, but Rubrik can only utilize the NetApp APIs on a single NetApp SVM (Storage Virtual Machine).  Some installs have their NetApp set up for multi-tenancy or have multiple SVMs for DR purposes.

Not necessarily agent-free.  Rubrik will talk to VMs using the standard VMware VSS integration and VM-level snapshots, so there isn't a need to touch every VM.  This yields crash-consistent backups.  When it comes to granular restores, and the ability to restore Active Directory components or Exchange and SharePoint items, one needs to have both the Rubrik Agent and the Rubrik Service installed.  The agent adds an additional Volume Shadow Copy Service driver onto the system, so it shouldn't interfere with anything else in production.  Installing the agent can be scripted via PowerShell to make it less painful; the install of the service requires a few clicks from the Rubrik console.

Backing up VMs twice?  When it comes to SQL, what sort of recovery options does one want?  How much space is available?  If table-level restores are needed, one needs the Rubrik Agent and Service installed.  A backup job (aka SLA) will need to be created to point at the SQL server, SQL instance, or individual DB.  That works great for DB restores, but what if the "you know what" hits the fan and one needs to bring back the entire VM?  Well, a second backup/SLA needs to be taken at the vSphere level.  In addition to that, can one afford to back up the complete VM?  Or is space going to be tight, so one backs up just the OS drive/VMDK and maybe the SYS and log drives?

If one wants the backups to be encrypted, and I don't know why one wouldn't, one must make that decision during setup.  It is an all-or-nothing setting, and setup is the ONLY time one can turn it on or off.

Dilbert weighs in on Certifications

 

Sorry, I had to!  :)  I am offering my services as an IT career consultant.  A number of times I have assisted AssholeConsulting.com with clients seeking advice when starting a career in IT.  Let me be your sounding board.

FYI: Don't do VMware Tools & Windows Updates at the same time.

Apparently this has been an issue for a while, as VMware has a KB about it.  If one has their cluster set to auto-update VMware Tools, and a VM is, shall we say, "queued" for an update, and Patch Tuesday comes around and that VM reboots, the machine can come back without a NIC.  Apparently VMware Tools gets uninstalled but not reinstalled, so there are no VMware-specific drivers on the machine.  One must use the VMware remote console and manually install VMware Tools.  Unfortunately, IP information is not retained!  In a few instances the VMware Tools ISO wouldn't mount through the "Install VMware Tools" menu in the vCenter GUI, so one would have to manually mount the ISO in the console.  So: choose to manually upgrade VMware Tools, then do Windows Updates.

https://blogs.vmware.com/vsphere/2020/09/introducing-vmware-skyline-health-diagnostic-tool.html


Disk Replacement on NetApp 8200/Shelf DS460c

https://docs.netapp.com/platstor/topic/com.netapp.doc.hw-ds-sas3-service/GUID-EFFF38EC-C136-44E6-88D5-6539A55C5985.html?resultof=%22%44%53%34%36%30%43%22%20%22%64%73%34%36%30%63%22%20

Locating the correct drive isn't hard, but make sure one is positive.  Calling NetApp for assistance might actually do more harm than good, as that support person might not know either.

The hard drive numbering starts at 0, not 1, both in the software and physically.  Each shelf contains five drawers, numbered 1 through 5; the top drawer is number 1 and the bottom drawer is number 5.  Each drawer contains twelve drives, so drawer #1 contains drives 0 through 11, drawer #2 contains drives 12 through 23, and so on.  The 60th drive is actually #59.

The system and OnCommand will email an alert stating where the drive is, e.g.: "FILESYSTEM DISK FAILED Shelf 0, Drawer 5, Slot 6, Bay 54".  The system is correct, kind of: the failed drive in this case is in the 5th drawer, in the 6th position, and is drive number 54.
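The mapping from bay number to drawer and slot can be worked out directly (bays are 0-based, drawers 1-based); a quick sketch of the arithmetic:

```shell
# DS460c: 12 drives per drawer, bays numbered 0-59 across 5 drawers.
bay=54
drawer=$(( bay / 12 + 1 ))   # 1-based drawer number
slot=$(( bay % 12 ))         # position within that drawer
echo "Bay $bay -> Drawer $drawer, Slot $slot"
```

For bay 54 that gives Drawer 5, Slot 6, matching the alert above.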

Unlike other NetApp shelves, the "blink drive" command is only mildly useful here.  With this shelf it will only blink the drawer indicator, not the drive bay indicator, so the blink command only tells us which drawer the drive is in, even though the clustered Data ONTAP (CDOT) commands need the drive bay parameters.  Likewise, there is no way to identify the failed drive just by looking at it.

storage disk set-led -disk 4.0.54 -action off  <--turns off the LEDs

storage disk set-led -disk 4.0.54 -action blink -duration 5  <--turns the shelf indicator light on for 5 minutes

(note amber light on the bottom shelf)

Log into OnCommand and go to Storage -> Aggregates & Disks -> Disks -> Inventory.  Filter the Container type by "broken" and make note of the serial number of the failed drive.  Now physically go to the drive shelf and open the drawer (email alerts will be sent out) to locate the failed drive.  Make note of the drive-layout picture on the front of the shelf: front left is drive #0, back right is drive #11.  In our example it is the 6th position, so front row, three over.  Confirm the proper drive is chosen by checking the serial number.  If it matches, remove it from the shelf and let the system sit for a minute, long enough for OnCommand to recognize the failed drive is gone.  Once it is gone from the Inventory screen, install the new drive and assign it.
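If one prefers the cluster shell over OnCommand, the same lookup can be done there; a sketch, assuming the usual ONTAP CLI syntax (the disk name is the one from our example):

```
::> storage disk show -container-type broken
::> storage disk show -disk 4.0.54 -fields serial-number
```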



 


Cisco C220 M3 VMware ESXi v6.5 upgrade to v7.0 and notes

Upgraded a Cisco C220 M3 from ESXi v6.5 to v7.0 and had some bumps along the way.

Here are two sites that I got some information from that I wish I had found earlier.

https://jc-lan.org/2021/03/07/upgrade-from-esxi-6-7-to-7-0-via-command-line-ssh/

https://tinkertry.com/easy-update-to-latest-esxi

This particular machine had the Cisco-customized version of v6.5 installed.  During the install of v7.0, several drivers had to be removed.  Something I wish I had known about earlier was to use the "--ok-to-remove" flag to have VMware automatically remove the VIBs it doesn't like; this would have saved a bunch of time!
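Put together, the upgrade line would look something like this (profile name and depot URL taken from the commands later in this post; verify the exact flags against `esxcli software profile update --help` on your build):

```
esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-7.0U1c-17325551-standard \
  -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml \
  --ok-to-remove
```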

Error message: 'Could not find a trusted signer.'

Workaround: esxcli software acceptance set --level=CommunitySupported

and tack on "--no-sig-check" at the tail end of the esxcli software line

Error Message: '[Errno 28] No space left on device'

Workaround: Make sure a datastore has been selected for swapfile space

Typing in: "/sbin/reboot" will reboot a host if no VM's are currently running.

If that doesn't work, try downloading the patch to a datastore and running the patch from there.  E.g.:

esxcli software profile update -p ESXi-6.7.0-20210301001s-standard -d /vmfs/volumes/300gbRAID1/ESXi670-202103001.zip





HP G6 VMware ESXi v6.5 upgrades to v6.7 and v7.0 and notes

I decided to finally upgrade my home lab HP ML350 G6 ESXi servers to something newer than v6.5.  Usually an SD card is chosen as the location for ESXi OS installations.  This allows certain flexibility that would otherwise be harder with hard-drive-based installs, namely backing up, testing, and running multiple versions, to name a few.  The SD card was removed and an image of it was made, then that image was written out to a different SD card using software by the name of OSFClone.  Now a second copy of the ESXi OS exists, so experiments and testing can be done; and if it implodes, oh well!  Just swap back to the original SD card.
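For what it's worth, the same image-and-verify step can be done with plain `dd` from any Linux box.  A sketch using a scratch file in place of the real card (substitute your actual device, e.g. /dev/sdX, for the stand-in path):

```shell
SD=/tmp/fake-sd.img           # stand-in for the real SD card device, e.g. /dev/sdX
IMG=/tmp/esxi-sd-backup.img   # where the backup image lands

# make a dummy "card" so this example is self-contained
dd if=/dev/urandom of="$SD" bs=1M count=4 status=none

# image the card, then verify the copy bit-for-bit
dd if="$SD" of="$IMG" bs=1M status=none
cmp -s "$SD" "$IMG" && echo "image verified"
```

Writing the image back out to a second card is the same `dd` with if/of swapped.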

One method to patch VMware ESXi servers is to do it from the command line and download the source from VMware's software repository.  In some cases it is quicker than using Update Manager.  The steps are: SSH into a server, enable HTTP & HTTPS on the firewall, download and install the update, and reboot (assuming the server has no running VMs).  I use this site for reference:

https://esxi-patches.v-front.de/ESXi-6.5.0.html

esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-6.7.0-20201004001-standard \
-d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Turns out this method also works really well for doing version upgrades!  I ran the above lines to do the v6.5 to v6.7 upgrade.

The first attempt greeted me with dependency warnings.  The generic version of ESXi had been installed, and then the HP Offline Bundle and HP Smart Storage Adapter VIBs were installed on top, so those needed to be removed.

esxcli software vib list (lists the names of all installed VIBs)

esxcli software vib remove --vibname=<name of VIB>

After that the upgrade went smoothly and everything works.  Excellent!  Can we go newer?

I swapped the SD card back to the original, essentially physically reverting to v6.5 U3.  Speaking of which, vCenter v7.0 does not seem to care that one server was running v6.5, then v6.7, then v6.5 again.  There seem to be no issues with random version changes on the ESXi hosts.

I used PuTTY to get back into the host and ran:

esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-7.0U1c-17325551-standard \
-d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
Success!!  ESXi 7.0 running on an HP ProLiant ML350 G6