Repurposed/Recycled Citrix SD WAN Appliance

I recently got three of these Citrix CB-140 SD WAN devices to recycle, but I decided to play with them a bit first.  Open the hood and there is a SuperMicro motherboard in there!  It has an mSATA slot and the smallest SATA SSD drive I have ever seen; it is literally a thumbnail-sized circuit board that plugs into a SATA port with two small power leads.  Then a 16gb USB drive plugged into the USB header.  Six total SATA ports.  8gb of unregistered ECC RAM.  Seven 1gbps RJ45 network ports plus remote management.  A quad core Intel Atom C2558 CPU @ 2.4ghz.  Room for two 3.5" hard drives.

I upgraded the RAM to 16gb, erased both drives, and verified that the NICs actually work (those white chips are physical switches for detecting LAN cable disconnects; I had issues with those back on some older Riverbed appliances).  I was then able to install both Windows 10 and VMware ESXi v6.7.

I will probably end up selling these.  They are a tad too noisy to be in the living room as a media player, but they would make a pretty energy-efficient VMware server, Plex server, FreeNAS box, or pfSense firewall.



The thumbnail-sized SSD drive plugged directly into the SATA port.

Synology

I did some basic setup on a Synology DS920+ for a friend of mine.  The NAS had the DSM 7.0.1 operating system on it.  The drives used were a pair of Toshiba 12tb drives, model MG07ACA12TE9, 10.9tb usable, and a pair of Toshiba 14tb MG08ACA14TE drives, 12.7tb usable.  Neither of these drives is on the compatibility list, but they work anyway.

During the setup, all four drives were installed in the system and we could not create ANY volumes.  After some troubleshooting we powered down the system, installed a single 12tb drive, and successfully created a storage pool and volume on that single drive.  We then powered down the system, added a second drive, and added it to the existing pool.  This worked, however it took Synology 48 hours to recalculate the RAID.  Mind you, this was a completely blank set of drives!
I then added the remaining 14tb drives.  I could not add them to the existing pool that contained the matching pair of 12tb drives, but I was able to add the two 14tb drives as a second storage pool and volume.

Then I blew away both sets of pools and created a new volume containing all four drives using the default Synology Hybrid RAID (SHR) level, because it allows for drives of different sizes and the loss of a single drive.  The usable space came to 35tb; unfortunately it gave an estimate of 24 hours to optimize the pool.

Rubrik takeaways.....


A couple of random notes on the product; if I got something wrong, or a newer version addresses an issue, let me know.

-Rubrik is a "forever delta" technology, meaning it will do one full backup, then from that point forward only back up the changed blocks.  EXCEPT when using another storage device for archiving.  So if one is using something 'cheap and deep' in terms of storage, say a Synology NAS, Cloudian, or any S3 blob storage, a full backup will occur every 60 jobs or so.  The delta chain gets to be too long and time consuming to traverse, so restores become messy.  Rubrik's way around this is to collapse all of the deltas and take another full backup.  Not a big deal except when one has multiple 10TB volumes to back up.  When those jobs take over 24 hours to ingest, the impact to the production infrastructure and to other backup windows can be a concern.  One way to make this less painful is to scale back the backup frequency.  IE backing up every 3 hours means a new full would occur roughly every 7-8 days (60 jobs x 3 hours); change it to every 12 hours and a full only happens about every 30 days (60 jobs x 12 hours).

-"Direct Backup to Archive" means that the backups are stored directly on the secondary storage vs. being stored on the Rubrik and then having old backups age off to secondary storage.  Rubrik has a checkbox for this that can only be chosen when creating an SLA.  I will mix up the terminology here, but "direct archive" isn't necessarily direct-archive.  If one doesn't select this option when setting up the SLA, the most current data is stored on the Rubrik and all of the older changed data is on the archive location.  This is great if current and prompt restores are needed, however it does eat into one's storage plans.  If the direct archive option is chosen during SLA creation, the data bypasses the Rubrik, goes to the archive location, and only the metadata lives on the Rubrik.  This option cannot be changed midstream; if one changes their mind, the SLA needs to be deleted, the data deleted or aged off, the SLA recreated, and the target backed up as if it were new.

-NetApp APIs: like most storage vendors, some of their inner workings have been exposed to third parties for a more efficient ecosystem.  Rubrik is one of those third parties; it can utilize NetApp's own snapshots, which greatly cuts down the impact to production and the amount of time spent scanning for changes, and thus gives quicker backups.  Newer versions of the SnapDiff API (v3) may or may not be open to third parties, leaving companies like Rubrik out in the cold.  Not a horribly huge deal, as the Rubrik will just talk to the NetApp like any other target, but again, larger backup windows will be needed and there will be more impact to the infrastructure.

-Related to the SnapDiff APIs: I forget where the actual shortcoming is, but Rubrik can only utilize the NetApp APIs on a single NetApp SVM (Storage Virtual Machine).  Some installs have their NetApp set up for multi-tenancy or have multiple SVMs for DR purposes.

-Not necessarily agent free.  Rubrik will talk to VMs using the standard VMware VSS integration and VM-level snapshots, so there isn't a need to touch every VM; this yields crash-consistent backups.  When it comes to granular restores, and the ability to restore Active Directory components or Exchange and SharePoint items, one needs to have both the Rubrik agent and the Rubrik service installed.  The agent adds an additional Volume Shadow Copy Service (VSS) component onto the system, so it shouldn't interfere with anything else in production.  Installing the agent can be scripted, for example via PowerShell, to make it less painful (see the sketch below); the install of the service requires a few clicks from the Rubrik console.
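
For reference, a minimal sketch of a silent agent install.  The download location (https://<rubrik_cluster>/connector/RubrikBackupService.zip) and the MSI name are from memory of the Rubrik docs, so treat them as assumptions and verify against your CDM version; once the package is extracted on the target server, run:

msiexec /i RubrikBackupService.msi /qn /norestart  <-- silent install; wrap it in PowerShell/Invoke-Command to push it to many servers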

-Backing up VMs twice?  When it comes to SQL, what sort of recovery options does one want, and how much space is available?  If table-level restores are needed, one needs the Rubrik agent and service installed, and a backup job (aka SLA) will need to be created to point at the SQL server, SQL instance, or individual DB.  That works great for DB restores, but what if the "you know what" hits the fan and one needs to bring back the entire VM?  Well, a second backup/SLA needs to be taken at the vSphere level.  In addition to that, can one afford to back up the complete VM?  Or is space going to be tight, so one backs up just the OS drive/VMDK and maybe the system and log drives?

-If one wants the backups to be encrypted, and I don't know why one wouldn't, one must make that decision during setup.  It is an all-or-nothing setting, and setup is the ONLY time one can turn it on or off.

Dilbert weighs in on Certifications

 

Sorry, I had to!  :)  I am offering my services up as an IT career consultant.  A number of times I have assisted AssholeConsulting.com with clients seeking advice when starting a career in IT.  Let me be your sounding board.

FYI: Don't do VMware Tools & Windows Updates at the same time.

Apparently this has been an issue for a while, as VMware has a KB about it.  If one has their cluster set to auto-update VMware Tools, a VM is, shall we say, "queued" for an update, and Patch Tuesday comes around and that VM reboots, the machine can come back up without a NIC.  Apparently VMware Tools gets uninstalled, but not reinstalled, so there are no VMware-specific drivers on the machine.  One must use the VMware remote console and manually install VMware Tools.  Unfortunately, IP information is not retained!  In a few instances the VMware Tools ISO wouldn't mount through the "Install VMware Tools" menu in the vCenter GUI, so one would have to manually mount the ISO in the console.  So choose to manually upgrade VMware Tools first, then do Windows Updates.

https://blogs.vmware.com/vsphere/2020/09/introducing-vmware-skyline-health-diagnostic-tool.html


Disk Replacement on NetApp 8200/Shelf DS460c

https://docs.netapp.com/platstor/topic/com.netapp.doc.hw-ds-sas3-service/GUID-EFFF38EC-C136-44E6-88D5-6539A55C5985.html?resultof=%22%44%53%34%36%30%43%22%20%22%64%73%34%36%30%63%22%20

Locating the correct drive isn't hard but make sure one is positive.  Calling NetApp for assistance might actually do more harm than good as that support person might not know either.

The hard drive numbering starts at 0, not 1, both in the software and physically.  Each shelf contains five drawers; the drawers are numbered 1 through 5, with the top drawer number 1 and the bottom drawer number 5.  Each drawer contains twelve drives, so drawer #1 contains drives 0 through 11, drawer #2 contains drives 12 through 23, and so on (bay number = (drawer - 1) x 12 + slot).  The 60th drive is actually #59.

The system and OnCommand will email an alert stating where the drive is, IE: "FILESYSTEM DISK FAILED Shelf 0, Drawer 5, Slot 6, Bay 54".  The system is correct, kind of: the failed drive in this case is in the 5th drawer, in slot 6, and is bay number 54.

Unlike other NetApp shelves, the "blink drive" command is only mildly useful.  With this shelf it will only blink the drawer indicator, not the drive bay indicator, so the blink command only tells us which drawer the drive is in, even though the clustered Data ONTAP commands need the drive bay parameters.  Likewise, there is no way to tell which drive has failed just by looking at the system.

storage disk set-led -disk 4.0.54 -action off  <-- turns off the LEDs

storage disk set-led -disk 4.0.54 -action blink -duration 5  <-- turns the shelf light indicator on for 5 minutes

(note amber light on the bottom shelf)

Log into OnCommand, go to Storage -> Aggregates & Disks -> Disks -> Inventory; filter the Container type by "broken" and make note of the serial number of the failed drive.  Now physically go to the drive shelf, open the drawer (email alerts will be sent out), and locate the failed drive.  Make note of the drive layout picture on the front of the shelf: front left is drive #0, back right is drive #11.  In our example it is the 6th position, so front row, three over.  Confirm the proper drive is chosen by checking the serial number.  If it matches, remove it from the shelf and let the system sit for a minute, long enough for OnCommand to recognize the failed drive is gone.  Once it is gone from the Inventory screen, install the new drive and assign it.
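
If one prefers the CLI over OnCommand, something like the following should list the failed disk and, once the replacement is installed and visible, assign ownership.  This is from memory of clustered Data ONTAP syntax, and the disk name is just carried over from the example above, so double check against your ONTAP version (many systems will also auto-assign the new disk on their own):

storage disk show -container-type broken  <-- lists any broken disks with their names and serial numbers

storage disk assign -disk 4.0.54 -owner <node_name>  <-- assigns the replacement disk to a controller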



 


Cisco M3 C220 VMware ESX v6.5 upgrade to 7.0 and notes

 Upgraded a Cisco M3 C220 from ESXi v6.5 to v7.0 and had some bumps along the way.

Here are two sites that I got some information from that I wish I had found earlier.

https://jc-lan.org/2021/03/07/upgrade-from-esxi-6-7-to-7-0-via-command-line-ssh/

https://tinkertry.com/easy-update-to-latest-esxi

This particular machine had the Cisco customized version of v6.5 installed.  During the install of v7.0 several drivers had to be removed.  Something that I wish I had known about earlier was to use the "--ok-to-remove" flag to have VMware automatically remove the VIBs it doesn't like; this would have saved a bunch of time!
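
For reference, a sketch of roughly what that looks like on the full command line; the profile name and depot path here are just placeholders, not the exact build I used:

esxcli software profile update -p <profile_name> -d /vmfs/volumes/<datastore>/<depot>.zip --ok-to-remove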

Error message: 'Could not find a trusted signer.'

Workaround: esxcli software acceptance set --level=CommunitySupported

and tack on "--no-sig-check" at the tail end of the esxcli software line
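
Put together, the workaround looks something like this (the profile and depot are placeholders):

esxcli software acceptance set --level=CommunitySupported
esxcli software profile update -p <profile_name> -d <depot_url_or_zip> --no-sig-check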

Error Message: '[Errno 28] No space left on device'

Workaround: Make sure a datastore has been selected for swapfile space
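
If one would rather set that from the command line than from the host client GUI, something like the lines below should point system swap at a datastore.  "datastore1" is just an example name and the exact option names may vary between ESXi builds, so check the current settings first:

esxcli sched swap system get  <-- shows the current system swap settings

esxcli sched swap system set --datastore-enabled true --datastore-name datastore1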

Typing in: "/sbin/reboot" will reboot a host if no VM's are currently running.

If that doesn't work, try downloading the patch to a datastore and run the update from there.  IE:

esxcli software profile update -p ESXi-6.7.0-20210301001s-standard -d /vmfs/volumes/300gbRAID1/ESXi670-202103001.zip





HP G6 VMware ESX 6.5 upgrades to v6.7 and 7.0 and notes

Decided to finally upgrade my home lab HP G6 ML350 ESXi servers to something newer than v6.5.  Usually an SD card is chosen as the location for ESXi OS installations.  This allows certain flexibility that would otherwise be harder with hard-drive-based installs, namely backing up, testing, and running multiple versions, to name a few.  The SD card was removed, an image of it was made, and then that image was written out to a different SD card using software by the name of OSFClone.  Now a second copy of the ESXi OS exists so experiments and testing can be done; and if it implodes, oh well!  Just swap back to the original SD card.
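
OSFClone worked fine for me, but for what it's worth the same clone can be done from any Linux box with dd.  The device names below are placeholders, so triple check which device is actually the SD card before writing anything:

dd if=/dev/sdX of=esxi-sd-backup.img bs=4M status=progress  <-- read the original card into an image file

dd if=esxi-sd-backup.img of=/dev/sdY bs=4M status=progress  <-- write that image to the second card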

One method to patch VMware ESXi servers is to do it from the command line and download the source from VMware's software repository.  In some cases it is quicker than using Update Manager.  The steps are: SSH into the server, enable HTTP & HTTPS through the firewall, download and install the update, and reboot (assuming the server has no running VMs).  I use this site for reference:

https://esxi-patches.v-front.de/ESXi-6.5.0.html

esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-6.7.0-20201004001-standard \
-d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Turns out this method also works really well for doing version upgrades!  I ran the above lines to do the v6.5 to v6.7 upgrade.

The first attempt greeted me with dependency warnings.  The generic version of ESXi was installed on these servers, then the HP offline bundle and HP Smart Storage Adapter VIBs were installed on top, so those needed to be removed.

esxcli software vib list (this will list the names of all installed VIBs)

esxcli software vib remove --vibname <name of VIB>
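
In my case the offenders were the HP bundles.  Something like the lines below narrows the list down and removes them one at a time; the VIB name shown is only an example, so use whatever names actually show up on your host:

esxcli software vib list | grep -i hp  <-- narrow the list to HP-related VIBs

esxcli software vib remove --vibname=scsi-hpvsa  <-- repeat for each offending VIB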

After that the upgrade went smoothly and everything works.  Excellent!  Can we go newer?

I swapped the SD card back to the original, essentially physically reverting to v6.5u3.  Speaking of which, vCenter v7.0 does not seem to care that one server was running v6.5, then v6.7, then back to v6.5.  There seem to be no issues with random version changes on the ESXi hosts.

I used PuTTY to get back into the host and ran:

esxcli network firewall ruleset set -e true -r httpClient
esxcli software profile update -p ESXi-7.0U1c-17325551-standard \
-d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

Success!!  ESXi 7.0 running on an HP ProLiant ML350 G6


HP G6 and upgrading RAID cards

My home lab has a pair of HP G6 ML350 servers.  Each has a Smart Array P410 RAID card w/ 1gb cache.  The card is not on the Hardware Compatibility List (HCL) for ESXi v6.7 and newer; it might not even be approved for v6.5 considering the fiasco I had with them previously.  These cards will not see 4k drives, though some people claim they will see drives larger than 4tb if the drive is 512e.  That being said, I thought I'd try upgrading the RAID card, as the prices are very reasonable on the second-hand market.  I purchased a used Smart Array P430 with 2gb cache from eBay.  These cards will do 4k drives, have a faster processor, support 12gb SAS, have more and faster cache, and are supported by newer VMware versions.  Here is some of what I discovered.

-The Smart Array P440 has a wire that connects the cache module to the motherboard; the battery for that cache module also has a cable connecting to the motherboard.  So unlike nearly all RAID cards, that card can't easily be used in non-supported systems.

-The Smart Array P430 & P440 are almost identical.  In fact, when physically looking at the cards, the only way to tell the difference is to look for stickers or silk-screening.  The cache module physically swaps between the two; I haven't tried the older cache module on the new card to see whether it will function or not.

-Putting the Smart Array P430 into an HP ML350 G6, all drive LED indicators go away.  Thus I have no way to tell physical drive activity or whether a drive has failed.

-Newer HP RAID cards give the option to boot into the card's ROM soon after POST, whereas older cards boot each item separately and then give the option to go into that ROM.  IE on my G6, when booting I have to wait for it to finish POSTing, then iLO, then the RAID card stuff comes up, at which point I press F8 to configure the RAID card.  On a newer machine, say an HP G8, once one sees the "Sea of Sensors" page one can press the function key to get into the RAID card's ROM.  Here is the issue: putting the P430 into my HP G6, I never see the message to get into the RAID card's ROM; at no point during the POST process am I prompted to press a function key to configure it.  When the RAID card "boots" I do see a "press Escape to continue", but that is it.  This makes it incredibly hard to look at the virtual, logical, and physical drive(s).

The workaround, albeit a crummy one:

-Offline HP Smart Storage Administrator (HPSSA).  It is a bootable ISO; the newest version is v1.50(4b), circa 2013.  It does allow one to manipulate the RAID card as one would expect.

-HP Service Pack for ProLiant (SPP).  This is the bootable ISO meant to upgrade firmware on a ProLiant server.  I tried the one for a G7; it doesn't find any firmware to update, not even the RAID card.  However, that same SPP includes Smart Storage Administrator, which is a few revisions newer than the offline version.

Swapping the RAID card was easy.  It should also be known that these RAID cards use a special-ish cable; the card has one internal x8-wide mini-SAS port.  The P430 automatically imported the logical drive.  In my case I have a pair of 4tb SATA drives in a mirror, and the new card just accepted it without any input from me.  I did run a quick benchmark on a VM that lives on that volume.  It was a tiny bit faster; not much more can be expected as the two 5900rpm SATA drives are the bottleneck.

Using Rufus to make bootable USB thumb drives

Rufus is one of my favorite utilities for taking an ISO and making it into a bootable thumb drive.  However, once in a while, usually with HP SPPs (Service Pack for ProLiant), the machine will not boot and gives an error message saying it cannot find vesamenu.c32.

Here's what you need to do to get the USB drive created by Rufus to work:

Edit \syslinux.cfg on the root of the USB

Replace its content with:

DEFAULT loadconfig

LABEL loadconfig

  CONFIG /system/isolinux.cfg

  APPEND /system/

The problem is that there are multiple isolinux/syslinux configs on the ISO (one in usb/, the other in system/) and of course Rufus has to try to guess which one is the right one.  Unfortunately, the correct one is in system/, whereas Rufus picks the one in usb/ by default.



Rubrik Node Swap

 Had the "fun" of swapping out a failed Rubrik R6410 node, here are a few notes:

https://rubrik.docebosaas.com/customers/learn/course/405/play/1612/physical-replacement-r6xxx-live-demo?hash=3637a3a6fceb9dc03d7076cdf1c771ab2b26d60b&generated_by=58301

R6410 Rubrik nodes: 120tb raw storage, twelve 10tb drives, approximately 75tb usable

-If the node is still "functioning" remove the node from the cluster (logically), else it will have to be forcefully removed from the cluster

-Turn on UID

-Power off the node

-Pull out the node

-To unlock the node from the chassis, the right pull-tab ear needs a slight downward movement to unlock it; then, gripping both pull tabs, pull.

-The encryption module (TPM) must be swapped from the existing (defective) node to the replacement node.

-The SSD drive may also need to be swapped; verify the amount of RAM and the NVMe SSD drive.

-The TPM module is circled in yellow.  To get to it, the SSD drive tray must be unscrewed from the node and set off to the side.  The screws (which are different lengths) are pointed out in red.



-The TPM module pulls straight up.  Notice that on the TPM module itself one of the pin holes is blocked out; this is so it only goes in one way and cannot be plugged in backwards.

-Once things have been cabled up, go to the front of the rack and pull off the vanity cover.  There are four sets of LEDs and buttons, one set for each node.  One of those sets will be off; that will be the node you just swapped.  Press the power button.

-Put the defective node back in the box, seal it up, apply the shipping label that came with the package, and get it to the respective delivery service.  Please hold on to the defective node for at least one day before sending it back, so I can be sure all is well with the swap.


Random notes on doing a restore to a Physical computer from Rubrik v5.x

High-level instructions for restoring a physical machine from Rubrik CDM v5.x.  I had some difficulties, so here are my notes:

I took much of my instructions from here:

Rubrik instructions on how to make their boot ISO

Rubrik instructions to do the restore


Create a Windows PE boot ISO using the info from the above links.  When creating the boot ISO, one will need both the ADK for Windows AND the Windows PE add-on, as well as the Rubrik Recovery Tool install kit.  When installing the ADK, check all of the features.  The generic WinPE commands are sketched below.
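
For reference, the generic part of building the WinPE image boils down to two commands run from the ADK's "Deployment and Imaging Tools Environment" prompt.  The paths are just examples, and the Rubrik Recovery Tool files still need to be copied into the image per Rubrik's instructions before the ISO is built:

copype amd64 C:\WinPE_amd64  <-- creates a working copy of the WinPE files

MakeWinPEMedia /ISO C:\WinPE_amd64 C:\RubrikWinPE.iso  <-- builds the bootable ISO from that working copy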

Boot the machine from that ISO.  Once there, make note of the IP address; if there is no IP address, assign one (see below).
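
From the WinPE command prompt, something like the following checks the address and, if needed, sets one.  The interface name and IP details are examples only; substitute whatever your environment uses:

ipconfig  <-- shows the current address and the interface name

netsh interface ip set address "Ethernet" static 192.168.1.50 255.255.255.0 192.168.1.1  <-- sets a static IP, mask, and gateway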

In Rubrik, find the server to be restored, choose a snapshot to "Mount", choose "No Host", and leave domain, usernames, and AD groups blank.  I did put in the IPs of a management station and the IP of the WinPE machine.

In Rubrik, go to the "Live Mounts" section, then "Windows Volumes"; one should see the server we are working with mounted.  Make note of the "Restore Script Path".  What Rubrik has done is present a CIFS share that contains a PowerShell script and a VHDX (Hyper-V virtual hard drive).  Test this by browsing to the share (keep in mind the security limitations that were set while presenting the snapshot; IE if the snapshot was limited to certain IPs, test the share from a machine having a matching IP).

On the machine to be restored, map the Z: drive to the share off of the Rubrik; as far as I can tell this is for authentication purposes.  Use "administrator" and no password.
net use z: \\<IP  of Rubrik>\<sharename> /user:<winpe_client_ip>\administrator *

On the machine to be restored, the next step is to launch the PowerShell script that will take the contents of the VHDX and dump it to the hard drive.
Type:
"powershell" then "Set-ExecutionPolicy Unrestricted" then "\\<IP of Rubrik>\<sharename>\with_layout\RubrikBMR.ps1"

Random notes on VMware vCenter v6.5 & 7.0

  • Clearing out the VMware Update Manager before an upgrade will cut down on the time required to update, as by default the vCenter upgrade will drag all that data along.  Also, during the migration one might see an error message saying it needs an export location because the root partition is only 4gb.

https://www.stephenwagner.com/2020/07/22/vcsa-vcenter-upgrade-7-enter-new-export-directory-source-machine/

  1. Log in to your vCSA source appliance via SSH or console
  2. Run the applicable steps as defined in the VMware KB 2147284 to reset VUM (WARNING: commands are version specific). In my case on vCSA 6.5 I ran the following commands:
    1. shell
    2. service-control --stop vmware-updatemgr
    3. /usr/lib/vmware-updatemgr/bin/updatemgr-util reset-db
    4. rm -rf /storage/updatemgr/patch-store/*
    5. service-control --start vmware-updatemgr
  3. Open your web browser, navigate to https://new-vcsa-IP:5480, and resume the migration. You will now notice a significant space reduction and won't need to specify a new mount point
  •  Forgot the administrator@vsphere.local password?  SSH into the VCenter, start the shell, and type: /usr/lib/vmware-vmdir/bin/vdcadmintool
  • Need to adjust DNS settings, like the FQDN?  
  1. Access the VCSA from console or from Putty session.
  2. Login with root permission
  3. Run the following command at the VCSA command prompt: /opt/vmware/share/vami/vami_config_net
  4. Opt for option 3 (Hostname)
  5. Change the hostname to new name
  6. Reboot the VCSA appliance.

  • If there is any change to the naming of the vCenter, logon issues with Active Directory may occur.  In my case one could log on using @vsphere.local, and clicking the check box to use local credentials worked.  However, one could not type their domain credentials in, regardless of UPN format.  The fix for us was to find the AD record of the vCenter server, go to the Attribute Editor, and change the "dNSHostName" entry.  FWIW, in one instance the vCenter was using the internet name, which is different than the Active Directory name.
  • When using the built in backup feature to backup the VCSA, when using FTP the destination folder must be empty.
  • https://www.altaro.com/vmware/backing-up-vcsa-6-5-natively-using-ftps/
  • https://sfitpro.blogspot.com/2016/11/configuring-vcsa-65-backup-lessons.html
  • When using the built in backup feature to backup the VCSA, all of the services need to be running, including ones for unused features.  Again from the shell "service-control --start --all"
  • When doing the in-place upgrade, use the hostname of an ESXi server for the source of the existing vCenter server and for the destination.  Things get wonky if one tries to deploy a vCenter Server on top of the old one.
  • DNS is super important (duh), my home lab kept having weird DNS lookup failures, one can use the IP addresses for the upgrade process.
  • If you have plug-ins that don't work after the upgrade, or that you cannot seem to uninstall, look into using JXplorer; think of it as a regedit for vCenter. The plug-ins will be in the "ServiceRegistrations" section.