VCF 3.9 and stretching a vSAN cluster: behind the scenes

In this blog post I will describe what goes on in the background when the ‘./sos --stretch-vsan’ command is run.

It’s not rocket science, because all of these steps are also performed when you stretch a cluster manually, but I thought it could be helpful for someone who has never done that. Some tasks are marked ‘all clear’: their names are self-explanatory, so I assumed they don’t need a comment.

All prerequisites and the procedure for stretching a cluster across two availability zones are described in the VMware documentation.

OK, so when you SSH to SDDC Manager and enter the command:

./sos --stretch-vsan --sc-domain <DOMAIN NAME> --sc-cluster <CLUSTER NAME> --sc-hosts <HOSTFQDN,HOSTFQDN2,…> --witness-host-fqdn <WITNESS HOST FQDN> --witness-vsan-ip <WITNESS VSAN IP> --witness-vsan-cidr <WITNESS VSAN CIDR> --esxi-license-key <LICENSE KEY>

just start monitoring the task in the SDDC Manager UI. You will see a long list of tasks, executed in the order shown in the table below, to stretch the cluster:

TASK NAME DETAILS
1 Validate Stretch Spec
– validating hosts for duplicate entries;
– getting ESXi info for hosts;
– checking if hosts exist in inventory;
– checking unassigned ESXi IDs;
– checking ESXi compliant versions;
– checking that hosts are not dirty and that partitions are erased;
– checking if hosts have ACTIVE status;
– checking if all hosts are from the same network pool
Validate Host and vSAN Licenses
– all clear
2 Acquire Lock for ESXi Host Addition
– all clear
3 Validate Storage Compatibility for Hosts
– checking if the hosts have homogeneous storage types
4 Generate Internal Model for ESXi Host Addition
– checking VSAN network (MTU 9000);
– checking VMOTION network (MTU 9000);
– saving vmknics configuration
5 Assemble Fault Domain Spec
– checking if ESXi hosts in both fault domains are visible in inventory
6 Validate Internal Model for ESXi Host Addition
– validating management VC SSO credentials;
– validating AZ2 hosts’ management IPs, credentials, networking state and VMs;
– validating that a sufficient number of vmnics is available in hosts;
– listing and checking all vmnics;
– validating if there are any VMs on hosts
7 Allocate ESXi Host IP Addresses
– building ESXi networkPool and network types: VMOTION and VSAN;
– fetching the networks associated with the network pool;
– fetching the IPs from the network pool for the network types VMOTION and VSAN for all ESXi hosts;
– allocating IPs for all hosts
8 Validate Vmotion network connectivity
– setting VDS with PortGroup.TransportType VMOTION;
– creating vmknic on vSwitch vSwitch0 and vMotion portgroup with IP address
9 Validate vSAN network connectivity
– creating vmknic on vSwitch vSwitch0 and VSAN portgroup with IP address
10 Stretch Cluster HA Validation
– looking for gateways for all hosts (AZ1 and AZ2);
– updating isolationAddress0 in advanced HA settings;
– checking if primary zone exists
11 Validate VRLI entity in inventory
– validating if vRLI exists in inventory;
– validating API credentials for vRLI
12 Validate primary zone host connectivity
– validating if the primary zone hosts of the existing cluster are reachable from SDDC Manager
13 Validate secondary zone host connectivity
– validating if the secondary zone hosts of the existing cluster are reachable from SDDC Manager
14 Validate interzone host connectivity
– checking if all ESXi hosts (AZ1 and AZ2) can ping each other
15 Validate primary and secondary zone host conflicts
– validating if secondary zone hosts conflict with primary zone hosts (primary and secondary zone hosts must be unique)
16 Check if vSAN Cluster is Network Partitioned
– checking how many ESXi hosts are already added to the cluster;
– getting the vSAN system details for the hosts
17 Check if secondary zone has minimum required hosts
– if the primary zone has 4 ESXi hosts, the secondary zone has to have the same number of hosts
18 Validate witness registration with vCenter
– validating if witness host is registered in vCenter
19 Validate Witness Host not in Cluster to be Stretched
– validating if witness host is already present in cluster;
– validating if witness host is not part of cluster
20 Validate witness host storage options
– validating if witness host has the required storage options (if SSD and non-SSD disks exist);
– claiming vSAN storage SSD disk Local VMware Disk (mpx.vmhba0:C0:T2:L0) and NonSSD disk Local VMware Disk (mpx.vmhba0:C0:T1:L0) for witness
21 Update ESXi Host Data in Inventory
– updating ESXi host details: domainId, clusterId, vcenterId, networkPoolId, privateIpAddress, vsanIpAddress, vmotionIpAddress, status ("ACTIVATING"), hostAttributes (e.g. {"vendor":"Dell Inc.","model":"VxRail E460F"}) …
22 Register Current Task
– registering taskId
23 Prepare ESXi Host
– establishing SSH session to hosts;
– deleting vSAN UUID marker file on the host;
– preparing hosts as non-primary hosts with vsanUuid;
– executing command [ cd /tmp/; python /tmp/removevd.py 2>&1 ];
– executing command [ cd /tmp/; python /tmp/createvd.py 2>&1 ] (output: vmware-esx-storcli VIB not installed);
– executing command [ cd /tmp/; python /tmp/capacityflash.py 2>&1 ] (output: All flash disks:);
– executing command [ esxcfg-advcfg -s 100000 /LSOM/diskIoTimeout && esxcfg-advcfg -s 4 /LSOM/diskIoRetryFactor ] (output: Value of diskIoTimeout is 100000)
24 Get vSphere Cluster MOID
– retrieving and updating the vSphere Cluster MOID for Add ESXi Host
(to find the Managed Object ID of a cluster, host, VM etc., go to https://VCSAFQDN/mob)
25 Create vCenter and Platform Services Controller Host Group
– creating default host group (if you want to create it manually, please follow the VMware documentation)
26 Create Primary Availability Zone Host Group
– creating host group ‘clustername_primary-az-hostgroup’ with cluster in vCenter
27 Create Primary Availability Zone VM Group
– creating VM group ‘clustername-primary-az-vmgroup’ with VM list with cluster in vCenter
28 Create Primary Availability Zone VM Host Rule
– creating VM/Host rule ‘clustername_VMs’ with VMs that should be on the primary site
29 Add ESXi Hosts to Data Center
– retrieving ESXi license;
– adding host to DC
30 Apply License(s) to ESXi Host in vCenter Server
– checking if the currently applied license matches the required license; if not, a new license is applied to the ESXi hosts (log example: licenseKey for entityId host-84 (name: mec10mgt003.nx5dpc.next) is found, currently applied license does not match the required license, applying new license to hosts)
31 Enter Maintenance Mode on ESXi Hosts
– all clear
32 Add ESXi Hosts to vSphere Distributed Switch
– adding host with vmnic vmnic1 to DV switch;
– connecting vmnic1 to UplinkPortgroup
33 Create vMotion vmknic(s) on ESXi Host
– creating vmknic VMOTION, connecting to DVPortgroupkey on DVS;
– assigning netstack instance key as vmotion
34 Create vSAN vmknic(s) on ESXi Host
– creating vmknic VSAN;
– assigning default gateway for vmknic VSAN;
– creating vSAN vmknic on host and attaching it to DvSwitch
35 Migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch
– migrating vmknic vmk0 to DvSwitch
36 Detach vmknic(s) from vSphere Standard Switch
– detaching vmnics [vmnic0] from vSwitch0
37 Attach vmknic(s) to vSphere Distributed Switch
– listing all vmnics;
– checking physical NIC vmnic0 (whether it is available, i.e. connected and not in use; speed: 10000MB);
– checking the rest of the physical NICs (vmnics);
– attaching vmnic1 to DvSwitch;
– attaching vmnic vmnic0 to DVS
38 Remove vSphere Standard Switches from ESXi Hosts
– removing standard switch vSwitch0 on hosts
39 Add ESXi Hosts to vSphere Cluster
– getting thumbprint for ESXi hosts;
– adding hosts to cluster
40 Configure Static Routes on ESXi Hosts
– all clear
41 Configure Power Management Policy on ESXi Host
– checking current power management policy on host: if it is DYNAMIC, changing it to static
42 Create vSAN Disk Groups
– all clear
43 Clear Alarms on ESXi Hosts
– clearing alarms on ESXi hosts from gray/red to green
44 Enable Log Collection for vSphere
– configuring hosts in vCenter to send logs to Log Insight
45 Resolve ESXi Host Preparation Issues for NSX
– validating that NSX Manager is up and running
46 Enable Reconfigure VM task for NSX Controllers
– executing Enable ReconfigureVM Method for NSX Controllers.
This step has to be done before the task ‘Update and Re-apply vSAN Storage Policy’.
A few years ago William Lam wrote a post on how to use the vCenter MOB to enable vim.VirtualMachine.reconfigure
47 Enable Reconfigure VM task for NSX Edges
– as above: executing Enable ReconfigureVM Method for NSX Edges.
This step has to be done before the task ‘Update and Re-apply vSAN Storage Policy’.
48 Exit Maintenance Mode on ESXi Hosts
– all clear
49 Configure HA Admission Control
– configuring: CPU 50% and Memory 50%
50 Configure HA Isolation Address
– checking and updating isolationAddress0
51 Configure HA Isolation Response
– initiating ConfigureHaIsolationResponseAction;
– configuring ConfigspecEx {"dasConfig":{"defaultVmSettings":{"isolationResponse":"powerOff"}}}
52 Re-Acquire the Lock and Prepare the ESXi Host(s) for stretch in Inventory
– updating ESXi hosts status to ACTIVATING in Logical Inventory
53 Configure Witness Static Routes on ESXi Hosts
– all clear
54 Configure vSAN Fault Domains and Stretch Cluster
– configuring fault domains in cluster;
– claiming vSAN storage SSD disk Local VMware Disk (mpx.vmhba0:C0:T2:L0) and NonSSD disk Local VMware Disk (mpx.vmhba0:C0:T1:L0) for witness
55 Update and Re-apply vSAN Storage Policy
– checking if a policy with the name vSAN Default Storage Policy exists;
– adding property replicaPreference with value RAID-1 (Mirroring) – Performance;
– adding new property ‘checksumDisabled’ with value ‘false’ in storage policy vSAN Default Storage Policy;
– updating property hostFailuresToTolerate with value 1;
– adding property subFailuresToTolerate with value 1;
– adding property locality with value None;
– adding property iopsLimit with value 0;
– reapplying the storage policy vSAN Default Storage Policy
56 Configure and Enable vSAN Health and Performance Service
– enabling vSAN performance service for the cluster
57 Disable Reconfigure VM task for NSX Controllers
– executing Disable ReconfigureVM Method for NSX Controllers
58 Disable Reconfigure VM task for NSX Edges
– executing Disable ReconfigureVM Method for NSX Edges
(to do that manually just go to https://VCSAFQDN/api/4.0/edges)
59 Update ESXi & Cluster status in Inventory
– updating cluster status and updating status in logical inventory for ESXi hosts
60 Release Lock for ESXi Host Addition
– releasing deployment lock
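
A few of the validation rows in the table (1, 15 and 17) boil down to simple checks on the two host lists, so you can run something similar yourself before you start the stretch. Below is a minimal Python sketch of that idea. It is only an illustration and not SDDC Manager's actual code: the function name validate_stretch_hosts and the example host names are made up, and I assume hosts are compared by case-insensitive FQDN.

```python
from typing import List


def validate_stretch_hosts(primary: List[str], secondary: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed.

    primary   -- FQDNs of the hosts already in the cluster (AZ1)
    secondary -- FQDNs of the hosts you plan to pass to --sc-hosts (AZ2)
    """
    problems = []

    # Assumption: FQDNs are compared case-insensitively, ignoring stray spaces.
    az1 = [h.strip().lower() for h in primary]
    az2 = [h.strip().lower() for h in secondary]

    # Row 1: the host list must not contain duplicate entries.
    duplicates = sorted({h for h in az2 if az2.count(h) > 1})
    if duplicates:
        problems.append("duplicate hosts: " + ", ".join(duplicates))

    # Row 15: primary and secondary zone hosts must be unique.
    conflicts = sorted(set(az2) & set(az1))
    if conflicts:
        problems.append("hosts already in primary zone: " + ", ".join(conflicts))

    # Row 17: the secondary zone needs the same number of hosts as the primary zone.
    if len(set(az2)) != len(az1):
        problems.append(
            f"secondary zone has {len(set(az2))} hosts, primary zone has {len(az1)}"
        )

    return problems


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    az1_hosts = ["esx01.example.com", "esx02.example.com"]
    az2_hosts = ["esx11.example.com", "esx11.example.com"]
    for problem in validate_stretch_hosts(az1_hosts, az2_hosts):
        print(problem)
```

The function returns all problems instead of stopping at the first one, so you can fix the whole host list in one go before running ./sos.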