High Availability Support for ExtremeCloud IQ Site Engine with vSphere Version 6.5

Introduction

While ExtremeCloud IQ Site Engine provides you with visibility into your entire network, there are circumstances that may result in ExtremeCloud IQ Site Engine becoming impaired or unresponsive. These include:

The ExtremeCloud IQ Site Engine server becomes unresponsive or a virtual machine becomes isolated from the management network.
Applications become unresponsive.

ExtremeCloud IQ Site Engine's failover solution, the VMware vSphere High Availability (HA) feature, provides a recovery plan in the event of a host power or hardware failure (such as a hard disk, memory, or CPU failure), an application failure, or a virtual machine losing connectivity with the management network. With vSphere HA, you can monitor your ExtremeCloud IQ Site Engine servers, services, and applications. In addition, VMs can be restarted after server and service failures without manual intervention.

vSphere HA does not require user intervention or reliance on database backups, which can be hours or days old. It does not require a secondary ExtremeCloud IQ Site Engine server to be separately maintained and kept up-to-date, and it does not rely on administrators or other network monitors to detect or recover from host failures.

vSphere HA falls under the Active/Cold-Standby category. vSphere HA enables you to pool physical servers into a logical group called an HA cluster. In vSphere HA, one instance of ExtremeCloud IQ Site Engine is deployed on a host in the cluster. If that host fails, the instance is migrated to and restarted on one of the available active ESXi hosts, depending on the type of failure. The ExtremeCloud IQ Site Engine user interface typically takes about 10 minutes to restart.

	NOTES:	vSphere HA is supported by VMware versions 6.0 and 6.5 only. vSphere HA is not extended to the Hyper-V platform.

Protecting Against Server Failure and Virtual Machine Isolation

ExtremeCloud IQ Site Engine's servers are organized into clusters within the vCenter Server. The cluster servers managed by the vCenter Server are called hosts. The vCenter Server monitors the health of hosts in a cluster by sending heartbeat requests to each host. If a host does not respond within a configured interval, the vCenter Server identifies that the host failed and restarts the VM on a new host within the cluster. After the VM starts on another host, normal ExtremeCloud IQ Site Engine server operation resumes.

vSphere HA also protects virtual machines against network isolation by restarting the VMs if the host becomes isolated on the management network, or loses connectivity on any of the management or data storage interfaces. Additionally, with the ExtremeCloud IQ Site Engine vSphere HA solution, the virtual machine resets on the same host if certain mission-critical services (such as database and server services) have stopped and have not restarted under normal automated operating and recovery procedures.

Protecting Against Application Failure

ExtremeCloud IQ Site Engine uses a Watchdog Service (a Java process) to start database and/or server processes and services in the engine. The Watchdog Service monitors the status of these processes and restarts any that fail, without the need for user intervention. Application monitoring resets the ExtremeCloud IQ Site Engine virtual machines based on criteria defined in the vSphere HA > VM Monitoring options for the cluster. Modify the options for resetting during an application monitoring failure event to meet the needs of your network.

If the services fail to recover after multiple restarts, a VM restart might be the recommended solution. This is not a function of the ExtremeCloud IQ Site Engine Watchdog Service. vSphere HA includes an Application Monitoring SDK that performs VM resets if one or more heartbeats are missed.

The Watchdog Service sends an enable request to vSphere HA to start the application monitoring, followed by a heartbeat signal. After the application monitoring is enabled and the Watchdog Service is initialized, the application monitoring status changes from gray to green. The application monitoring program continues to send the heartbeat, and the status of the application monitoring remains green.

The vSphere infrastructure sends the signal from the HA application monitoring program to the VM, and then to the ESXi host. If a monitored service or process fails to recover even after three service restarts by the Watchdog Service, the application monitoring program stops sending the heartbeat. The status of application monitoring turns red if the heartbeat is not received in the first 30 seconds. If vSphere HA does not receive a heartbeat in the following 30 seconds, it triggers a VM reset.
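The timing described above can be sketched as a small state function. This is only an illustrative model, not the Watchdog Service or the vSphere Application Monitoring SDK: the function names are hypothetical, and the handling of the exact 30-second and 60-second boundaries is an assumption.

```python
MAX_SERVICE_RESTARTS = 3        # restarts attempted by the Watchdog Service (from the text)
HEARTBEAT_WINDOW_SECONDS = 30   # length of each missed-heartbeat window (from the text)


def heartbeat_should_continue(failed_restarts: int) -> bool:
    """Heartbeats keep flowing until a service has failed three restarts."""
    return failed_restarts < MAX_SERVICE_RESTARTS


def monitoring_status(seconds_since_last_heartbeat: float) -> str:
    """Map the time since the last heartbeat to the status described above."""
    if seconds_since_last_heartbeat < HEARTBEAT_WINDOW_SECONDS:
        return "green"   # heartbeat is current
    if seconds_since_last_heartbeat < 2 * HEARTBEAT_WINDOW_SECONDS:
        return "red"     # first 30-second window missed
    return "reset"       # second window missed: vSphere HA resets the VM
```

For example, `monitoring_status(45)` falls in the second window and reports red, while a gap of 60 seconds or more maps to a reset.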

This solution is a fully automated recovery process that provides protection against application failure by continuously monitoring VMs and resetting if failure is detected.

	NOTES:	vSphere HA application monitoring is limited to the Linux platform. vSphere HA application monitoring currently monitors the WildFly service and the MySQL DB service.

You can configure ExtremeCloud IQ Site Engine so the Watchdog Service starts monitoring applications as part of the ExtremeCloud IQ Site Engine application initialization. When started, it continues to monitor and remedy the application service health.

	NOTE:	If the Watchdog Service stops unexpectedly, application monitoring stops.

Logs for the Watchdog Service and application monitoring functionality can be found in the following locations:

<installation directory>/appdata/logs/watchdog.out - Contains logs of starting/stopping watchdog process.
<installation directory>/appdata/logs/watchdog.log - Contains runtime watchdog logs.
<installation directory>/appdata/logs/appmonitor.out - Contains logs of starting/stopping app monitor.
<installation directory>/appdata/logs/appmonitor.log - Contains runtime app monitor logs.

By default, appmonitor.log and watchdog.log contain only informational logs. To get debug-level logs, change the log4j configuration. To set the log level from info to debug for these logs:

Stop the Watchdog Service by entering service nswatchdog stop in the ExtremeCloud IQ Site Engine engine command line.
Navigate to the <installation directory>/services/ directory. The properties files watchdog.log4j.properties and appmonitor.log4j.properties contain the log configuration for the Watchdog Service and the application monitoring program, respectively.
Open the watchdog.log4j.properties file and change info to debug in the following lines:
- log4j.category.com.enterasys.netsight.watchdog=info
- log4j.category.com.enterasys.netsight.watchdog.NetSightProcessController=info
- log4j.category.com.enterasys.netsight.watchdog.NetSightDbProcessController=info
Open the appmonitor.log4j.properties file and change info to debug in the following lines:
- log4j.category.com.enterasys.netsight.watchdog=info
- log4j.category.com.enterasys.netsight.watchdog.VMGuestAppMonitor=info
Start the Watchdog Service by entering service nswatchdog start in the ExtremeCloud IQ Site Engine engine command line.
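The edit in the steps above can also be scripted. The following is a minimal sketch, assuming the category lines appear in the properties files exactly as listed above; the function name set_log_level is hypothetical and is not part of ExtremeCloud IQ Site Engine.

```python
import re

# Category keys listed in the steps above.
WATCHDOG_CATEGORIES = [
    "log4j.category.com.enterasys.netsight.watchdog",
    "log4j.category.com.enterasys.netsight.watchdog.NetSightProcessController",
    "log4j.category.com.enterasys.netsight.watchdog.NetSightDbProcessController",
]
APPMONITOR_CATEGORIES = [
    "log4j.category.com.enterasys.netsight.watchdog",
    "log4j.category.com.enterasys.netsight.watchdog.VMGuestAppMonitor",
]


def set_log_level(properties_text, categories, level="debug"):
    """Return properties_text with the listed categories switched from info to level."""
    updated = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in categories:
            # Replace only a leading "info" token; anything after it is preserved.
            value = re.sub(r"^(\s*)info\b", lambda m: m.group(1) + level, value, count=1)
            line = f"{key}{sep}{value}"
        updated.append(line)
    result = "\n".join(updated)
    # splitlines() drops the final newline, so restore it if the input had one.
    if properties_text.endswith("\n"):
        result += "\n"
    return result
```

To use it, read the contents of watchdog.log4j.properties or appmonitor.log4j.properties while the Watchdog Service is stopped, pass the text and the matching category list to set_log_level, and write the result back before starting the service again. Lines for other categories are left unchanged.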

The vSphere HA feature also monitors the health of guest operating systems and their applications, and restarts the VM if a guest OS or application failure occurs. The vSphere HA feature monitors guest OS and application heartbeats via the VMware Tools process.

Hardware Configuration

The following diagram shows a typical hardware configuration for the ExtremeCloud IQ Site Engine Failover solution using vSphere's High Availability (HA) feature.

Hardware Configuration

The vCenter Server is configured to manage two or more ESXi hosts in a cluster. In this configuration, there are two hosts with VMware's ESXi servers installed on them: Host A and Host B. Host A and Host B provide redundancy by backing each other up in case of an outage or failure. Both hosts utilize the same data storage. VMDK (virtual machine disk) files must be stored on shared storage that is accessible to all servers. The VMDK files should also be housed on a separate network so that they are not affected by a network outage. vMotion networks are required within the vCenter Server clusters, and should use the same private storage network for quicker and seamless VM migrations.

	NOTE:	The ExtremeCloud IQ Site Engine Failover solution requires that vCenter Server is used to manage both hosts, and that the hosts are licensed for vSphere HA. vCenter, the ESXi hosts, and the ExtremeCloud IQ Site Engine VM must be in sync with the enterprise NTP server.

vSphere HA monitors ESXi host availability and restarts failed VMs. Each host in the HA cluster communicates via a heartbeat, which indicates whether it is running as expected. If a heartbeat is not detected from any host within the cluster, vSphere HA is instructed to take corrective action. If the VM fails, vSphere HA attempts to restart the VM on the same host. If the host fails, the VMs from that host are migrated to other hosts in the cluster.
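The two responses described above can be summarized in a short decision sketch. This is a simplified illustration only: the names are hypothetical, and choosing the first healthy host is an assumption, since actual vSphere HA placement also takes available resources into account.

```python
from enum import Enum
from typing import List, Optional


class Failure(Enum):
    VM = "vm"       # the virtual machine failed, its host is still healthy
    HOST = "host"   # the ESXi host itself failed


def choose_restart_host(failure: Failure, current_host: str,
                        healthy_hosts: List[str]) -> Optional[str]:
    """Return the host on which the VM is restarted, or None if none is available."""
    if failure is Failure.VM:
        # A VM failure is handled by restarting the VM on the same host.
        return current_host
    # A host failure moves the VM to another host in the cluster.
    candidates = [host for host in healthy_hosts if host != current_host]
    return candidates[0] if candidates else None
```

With the two-host configuration described in this section, a failure of Host A leaves Host B as the only candidate, which is why both hosts must be able to reach the same datastore.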

The datastore is the persistent storage for the virtual machine. It is where the ExtremeCloud IQ Site Engine server software is installed, the configuration information is kept, and where the database tables are stored. The virtual engine code is executed on the host. When a host fails, the virtual engine execution moves to the other host and all of the data is maintained in the datastore. This allows the ExtremeCloud IQ Site Engine server to continue functioning without having to restore a database backup.

	NOTES:	A Distributed Resource Scheduler (DRS) is often used with vSphere HA to redistribute workloads on alternate physical servers. A DRS is not required for vSphere HA.

Enabling vSphere HA

When the licensed hosts are included in a cluster, you can enable vSphere HA using the cluster's settings in the New Cluster window. Refer to your VMware documentation for more information on configuring clusters.

Log in to your vSphere Client.
Select Hosts and Clusters in the Inventories section.
Select the datacenter in the left-panel on which you are adding the cluster.
Right-click the datacenter and select New Cluster.

The New Cluster window displays.
Configure the fields in the New Cluster window.
Select vSphere HA in the left-panel and select the box to enable.
Select OK to create an empty cluster.
Expand the datacenter in the left-panel so the cluster displays.
Right-click the cluster and select Move Hosts into Cluster.

The Move Hosts into Cluster window displays.
Select the appropriate hosts to add to the cluster and select OK.
Expand the cluster in the left-panel.
Right-click a host from the left-panel and select Add Networking.

The Add Networking wizard displays.
Select VMkernel Network Adapter and select Next.
Select the Select an existing standard switch radio button and select Browse.

The Select Switch window displays.
Select an unused existing vmnic from the list, select OK, and select Next.
Enter Storage in the Network label field and select the vMotion traffic checkbox, then select Next.
Select Use static IPv4 settings, enter the IP address for the storage server for the host being configured in the IPv4 address field, enter 255.255.255.0 in the Subnet mask field, and select Finish.
Repeat for all hosts in the cluster.
Select a host in the cluster in the left-panel and select the Manage tab.
Select Storage, Storage Adapters, and select the Add icon.

The adapter appears in the iSCSI Software Adapter section of the table.
Select the new adapter.
Select the Targets tab.
Select Dynamic Discovery and select Add.

The Add Send Target Server window displays.
Enter the IP address of the storage server or full domain name in the iSCSI Server field and the port in the Port field.
Select OK.
Select the Network Port Binding tab and select the Add icon.

The Bind with VMkernel Adapter window displays.
Select the Storage created in Step 16.
Select the cluster you created in Step 6 in the left-panel.
Select the Manage tab in the right-panel and select Settings.
Select vSphere HA in the Services menu and select Edit.

The Edit Cluster Settings window displays.
Select Failures and Responses in the left-panel.
Select the Enable Host Monitoring checkbox.
In the Host Failure Response drop-down list, select the action vSphere HA initiates if a host fails.
In the Response for Host Isolation drop-down list, select the action vSphere HA initiates if a host becomes isolated from the management network.

	NOTE:	Selecting Leave Powered on may cause impaired performance in ExtremeCloud IQ Site Engine or cause ExtremeCloud IQ Site Engine to become unresponsive.

In the VM Monitoring drop-down list, select VM Monitoring Only or VM and Application Monitoring:
- VM Monitoring Only — Virtual machines restart if vSphere does not receive the heartbeat within a certain amount of time.
- VM and Application Monitoring — Enables ExtremeCloud IQ Site Engine monitoring via the Watchdog Service.
(Optional) In the Datastore with PDL and Datastore with APD drop-down lists, select a datastore to monitor hosts and virtual machines when the management network fails.
Select vSphere DRS in the left-panel of the window.
Select the Turn ON vSphere DRS checkbox.
Use the default values in the fields on this screen and select OK.
Additionally, you can create network redundancy on the host management network.
1. Select the host, open the Configure tab, and select Virtual switches in the left-panel.
2. In the Switch table in the right-panel, select the vSwitch on which you are configuring teaming and select the Edit icon.
  
  The Edit Settings window displays.
3. Select the appropriate teaming settings for the vSwitch.
4. Select OK to save.
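Because the storage VMkernel adapter on each host is configured with a 255.255.255.0 subnet mask in the steps above, a quick sanity check before binding the iSCSI adapter is to confirm that every host's storage address is in the same subnet as the storage server. The sketch below uses Python's standard ipaddress module. The function name and all addresses are hypothetical placeholders, and it assumes the storage network is not routed.

```python
import ipaddress

SUBNET_MASK = "255.255.255.0"  # mask entered in the Add Networking wizard


def same_storage_subnet(storage_server_ip, host_storage_ips, mask=SUBNET_MASK):
    """Report, per host, whether its storage address shares the storage server's subnet."""
    # strict=False lets a host address (not a network address) define the network.
    network = ipaddress.ip_network(f"{storage_server_ip}/{mask}", strict=False)
    return {
        host: ipaddress.ip_address(ip) in network
        for host, ip in host_storage_ips.items()
    }


# Placeholder addresses from the documentation ranges, not real values.
result = same_storage_subnet(
    "192.0.2.10",
    {"Host A": "192.0.2.21", "Host B": "198.51.100.22"},
)
print(result)
```

A host that reports False would need its storage address corrected before Dynamic Discovery can reach the iSCSI target over that adapter.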

ExtremeCloud IQ Site Engine Upgrade Process

To upgrade an ExtremeCloud IQ Site Engine server on which VMware HA is currently configured, use the following instructions:

Log in to your vSphere Client.
Select Hosts and Clusters.
Select the cluster on which HA is enabled.
Select the Configure tab in the right-panel and select VM Overrides.

Select the virtual machine for ExtremeCloud IQ Site Engine in the table and select Edit.

The Edit VM Overrides window displays.
Select Disabled in the VM Monitoring drop-down list.
Select OK.
Upgrade ExtremeCloud IQ Site Engine.
Log in to your vSphere Client.
Select Hosts and Clusters.
Select the cluster on which HA is enabled.
Select the Configure tab in the right-panel and select VM Overrides.
Select the virtual machine for ExtremeCloud IQ Site Engine in the table and select Edit.

The Edit VM Overrides window displays.
Select Enabled in the VM Monitoring drop-down list.
Select OK.

Related Information

For information on related help topics:

High Availability Support for ExtremeCloud IQ Site Engine with vSphere Version 6.0