vCenter Backup/Restore and "Upstream Server not found"

I learned a few things about Virtual Center 7.x this weekend.  Here are a few key takeaways that will hopefully help someone else out there:


Restoring Virtual Center using Veeam

If you need to restore a vCenter using Veeam, and Veeam was set up with the vCenter added to it (like most environments) rather than with each ESXi host added individually, Veeam cannot restore vCenter the same way it would normally restore a VM.  During the restore process Veeam NEEDS to communicate with the now non-existent vCenter, so the restore cannot happen and the attempt will waste a great deal of time.

The workaround is to first add a single ESXi host to Veeam.  Add that host using its IP address (assuming Veeam already knows it by its friendly name); the point is that we are tricking Veeam into thinking this is a new ESXi server.  Then restore the vCenter backup to an alternate location, aka this "new" server.  You aren't really restoring the VM to an alternate location, and you can restore right over the top of the existing VM; this is just a necessary step to get around the requirement of communicating with a vCenter that is currently not functioning.


Restoring Virtual Center using built-in restore options

For those of you who are using vCenter's built-in backup feature, the restore process works as follows.  Start the vCenter installer, choose "Restore", give it the backup repository details, and follow the rest of the prompts.  A new vCenter VM is deployed and the backup data is replayed into it.  HOWEVER, the backup files and the install media need to be the very same version.  I attempted to use vCenter 7.0.3-209 installation media, but my broken vCenter started out life as a v7.0.1x, was upgraded several times, and was currently on 7.0.3.01000.  It would not restore and complained about a version mismatch.  I did not bother trying to acquire previous versions of vCenter to attempt the recovery.  The lesson: keep the install media around, and when patching vCenter, also acquire the matching full ISO to install from.


vCenter error when logging in: "No Healthy Upstream Server"

For some unknown reason my vCenter quit working, and I was presented with the "no healthy upstream server" message when attempting to log in.  The error message is very vague and somewhat of a catch-all.  Hence, there is no one golden fix out there, but a dozen different fixes.  Many people give up and just redeploy from scratch.

I could log into the appliance administration page, where I attempted to start the non-running services with no luck.  Restarting the appliance didn't work either.

My "VMware vCenter Server" service (vmware-vpxd) would not start; it would process for a good two minutes before failing.

A few useful commands to type in at the shell:

service-control --list <--list all of the services

service-control --start --all <--start all services

service-control --start {service-name} <--start a specific service
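When a dozen services are listed it is easy to miss the one that is down, so here is a rough sketch for pulling out just the stopped ones.  It filters the text that service-control --status --all prints.  The function name stopped_services is my own, and the output layout it expects (a "Running:" or "Stopped:" header line followed by space-separated service names) is an assumption from what I saw, so check it against your own appliance first.

```shell
#!/bin/bash
# Print the names of stopped services, one per line, from the text that
# `service-control --status --all` prints.
# ASSUMPTION: the output is a "Running:" / "Stopped:" header line followed
# by lines of space-separated service names. Verify on your appliance.
stopped_services() {
  awk '
    /^[A-Za-z]+:[[:space:]]*$/ { section = $1; next }   # header such as "Stopped:"
    section == "Stopped:" { for (i = 1; i <= NF; i++) print $i }
  '
}

# Example, run on the appliance:
#   service-control --status --all | stopped_services
```

Each name it prints can then be handed to service-control --start one at a time, which makes it easier to see which service is the one that refuses to come up.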

DNS worked in both directions (forward and reverse), and resolution with both the FQDN and the short name worked.  Disk space, which dozens of others mention as the cause, was not an issue.  I did attempt to change my IP settings as recommended, but some sort of GUI bug prevented me from doing that.  Changing the IP via the console did not help.  NTP did not seem to be an issue.
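Since full disks come up so often as the cause, here is a small sketch for ruling that out quickly.  It reads df -P output and prints any mount point at or above a usage threshold.  The function name full_filesystems and the 90% default are my own choices, not anything from VMware.

```shell
#!/bin/bash
# Print mount points whose usage is at or above a threshold (default 90%),
# reading POSIX-format `df -P` output on stdin.
# The function name and the default threshold are illustrative choices.
full_filesystems() {
  local limit=${1:-90}
  awk -v limit="$limit" '
    NR > 1 {                       # skip the header line
      use = $5
      sub(/%$/, "", use)           # "97%" -> "97"
      if (use + 0 >= limit + 0) print $6, $5
    }
  '
}

# Example, run on the appliance:
#   df -P | full_filesystems 90
```

No output means nothing is over the threshold, which is what I saw on mine.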

I also attempted to upgrade vCenter, thinking the upgrade would replace/repair any broken bits.  This made things way worse: the upgrade failed, and I had to roll back.

I downloaded and ran a Python script from a VMware KB to check my SSL certificates, and they passed.

The logs for vmware-vpxd are found here: /storage/log/vmware/vpxd-svcs/vpxd-svcs.log

After several hours of beating on it, I found several other pages talking about certs and a different way to check certificate validity.  This check showed mine as failed.  It turns out the .py script I ran previously had led me astray; I did indeed have expired certs.

for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; sudo /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done
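The loop above prints every alias and its "Not After" date, but you still have to compare each date to today by eye.  Here is a rough sketch of a filter that does the comparison for you.  The names trim and flag_expired are my own, and it assumes GNU date plus dates in the usual OpenSSL text form (for example "Oct 21 12:00:00 2020 GMT").

```shell
#!/bin/bash
# Read "Alias" and "Not After" lines on stdin and print every alias whose
# certificate has already expired.
# ASSUMPTIONS: GNU date is available, and dates look like
# "Oct 21 12:00:00 2020 GMT" (the usual OpenSSL text format).
trim() {
  local s=$1
  s=${s#"${s%%[![:space:]]*}"}    # strip leading whitespace
  s=${s%"${s##*[![:space:]]}"}    # strip trailing whitespace
  printf '%s' "$s"
}

flag_expired() {
  local now=${1:-$(date +%s)}     # optional fixed "now" in epoch seconds
  local line alias="unknown" when ts
  while IFS= read -r line; do
    case $line in
      *Alias*)
        alias=$(trim "${line#*:}")
        ;;
      *"Not After"*)
        when=$(trim "${line#*:}")
        ts=$(date -d "$when" +%s 2>/dev/null) || continue   # skip unparsable dates
        if [ "$ts" -lt "$now" ]; then
          echo "EXPIRED $alias ($when)"
        fi
        ;;
    esac
  done
}

# Example: append "| flag_expired" after the "done" of the vecs-cli loop.
```

If it prints nothing, none of the dates it could parse are in the past.  Anything it does print is a candidate for the certificate reset.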

Running the certificate tool contained within vCenter to reset all the certs fixed my issue.  FYI, it does take quite a while to run.

/usr/lib/vmware-vmca/bin/certificate-manager <--launch the cert utility; choose option 8 to reset all the certs




Post Recovery
After all is said and done, Veeam will be broken.  When a backup job is run, it will fail.  Since the SSL certs have changed, as far as Veeam is concerned vCenter is not the same machine.  To fix it, go into Veeam:
-Backup Infrastructure -> click the vCenter and choose Properties -> Next -> accept the dialog box to update the certificate -> Finish

