Tuesday, 29 December 2015

Exchange 2016 Database Availability Group Maintenance

Introduction


In this post, I’ll demonstrate how to do maintenance on a two-node, single-site Exchange 2016 Database Availability Group (DAG).


For more information on Exchange 2016 Database Availability Groups, see here.




Lab Setup


In this lab, I have two Exchange 2016 servers in a DAG with mailbox databases replicated between them for high availability. The Exchange servers are:

  • LITEX01
  • LITEX02


We have four mailbox databases:
  • MDB01
  • MDB02
  • MDB03
  • MDB04

There’s a copy of each of these mailbox databases on LITEX01 and LITEX02.




Put a mailbox server into maintenance mode



We’ll start by putting LITEX01 into maintenance mode so we can install Exchange updates and Windows updates, carry out hardware maintenance, and so on.


In Exchange 2016, the Mailbox server role includes the client access (CAS) services as well as the mailbox (MBX) services. If your Exchange 2016 server is providing CAS services for your clients, you should remove it from the load-balanced array. How you do this will depend on how you have configured load balancing.


Also note that your incoming and outgoing external messages need to be routed through both servers, so that putting one of them into maintenance mode won’t stop external message delivery. How you achieve this will depend on how you have your message routing configured.

Apart from the CAS services, the server performs the following functions:


  • Message delivery
  • Unified Messaging (Call routing)
  • Cluster services (Primary Active Manager)
  • Mailbox service (either active or passive mailbox databases)

Message Delivery


The HubTransport component on LITEX01 needs to be drained. To do this, we put the HubTransport component into a draining state, restart the Transport service, and then redirect messages that are pending delivery to LITEX02. Log into LITEX01 and run these commands from the Exchange Management Shell, running as administrator:


Set-ServerComponentState LITEX01 -Component HubTransport -State Draining -Requester Maintenance

Restart-Service MSExchangeTransport
Redirect-Message -Server LITEX01 -Target LITEX02.litwareinc.com


Press y when prompted.


The server should no longer be involved in message transport. We can confirm this by checking that the HubTransport component on LITEX01 is draining:

Get-ServerComponentState LITEX01 -Component HubTransport
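It can also be worth checking that the queues on LITEX01 have actually emptied before moving on. Here is a minimal sketch of a filter for that. The function name Select-PendingQueue is my own (it is not an Exchange cmdlet), and it assumes the objects piped in have the Identity, DeliveryType and MessageCount properties that Get-Queue returns. It skips the poison queue and shadow redundancy queues on the assumption that Redirect-Message leaves those alone.

```powershell
# Hypothetical helper (not part of Exchange): passes through only the queues
# that still hold messages we would expect the drain to have moved.
function Select-PendingQueue {
    param(
        [Parameter(ValueFromPipeline)] $Queue
    )
    process {
        # Assumption: poison and shadow redundancy queues are not redirected,
        # so messages left in them do not block maintenance.
        $isPoison = "$($Queue.Identity)" -like '*\Poison'
        $isShadow = "$($Queue.DeliveryType)" -eq 'ShadowRedundancy'
        if ($Queue.MessageCount -gt 0 -and -not $isPoison -and -not $isShadow) {
            $Queue
        }
    }
}
```

With the function loaded, something like Get-Queue -Server LITEX01 | Select-PendingQueue | ft Identity,MessageCount should return nothing once the drain has finished.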




Unified Messaging


You may or may not be using the server for Unified Messaging, but if you are, run this command to stop the server handling new calls. Calls are drained, which means that calls already in progress are allowed to complete:


Set-ServerComponentState LITEX01 -Component UMCallRouter -State Draining -Requester Maintenance


Confirm that the UMCallRouter component is draining (maintenance mode):

Get-ServerComponentState LITEX01 -Component UMCallRouter


Cluster Services


If you’re wondering what the Primary Active Manager (PAM) is, it’s the Active Manager role held by the DAG member that owns the cluster quorum resource; it reacts to server failures and decides which database copies become active. Although a failure of the server that holds the PAM causes the role to fail over to a server running as a Standby Active Manager (SAM), it’s best to move it gracefully. To do this, we move the PAM to LITEX02 and then pause the cluster node LITEX01, which prevents LITEX01 from owning the role until the node is resumed.


First, let’s confirm where our PAM is located:


Get-DatabaseAvailabilityGroup -Status | fl Name,PrimaryActiveManager


Here we see that it’s currently on LITEX01 which means we need to move it (yes, more work, excellent!).

Right, let’s move it to LITEX02 by running this command:


Move-ClusterGroup "Cluster Group" -Node LITEX02



We also need to prevent LITEX01 becoming the PAM by pausing the cluster node. You need to run this command from an elevated PowerShell window:

Suspend-ClusterNode LITEX01


We’ll just confirm this has in fact worked:

Get-ClusterNode

Get-DatabaseAvailabilityGroup -Status | fl Name,PrimaryActiveManager

Ok, the PAM has been moved just fine and the cluster node LITEX01 is paused. We can move on to the next step.



Mailbox service


We need to move any active mailbox databases off LITEX01. They should fail over when we shut down the server or when the services stop, but we’ll move them off manually, which is the recommended approach.

Let’s just see what databases are mounted on LITEX01 before we start this step:


Get-MailboxDatabaseCopyStatus -Server LITEX01


Ok, we can see mailbox databases MDB01 and MDB02 are mounted on LITEX01. To move these to LITEX02, we use this command:

Get-MailboxDatabaseCopyStatus -Server LITEX01 | ? {$_.Status -eq "Mounted"} | % {Move-ActiveMailboxDatabase $_.DatabaseName -ActivateOnServer LITEX02 -Confirm:$false}

We can now confirm our databases have been moved to LITEX02:

Get-MailboxDatabaseCopyStatus -Server LITEX02



All our mailbox databases are mounted on LITEX02.

The next step is to prevent LITEX01 from automatically mounting the databases if there is a problem with LITEX02. To do this, we set the DatabaseCopyAutoActivationPolicy property to Blocked on LITEX01:


Set-MailboxServer LITEX01 -DatabaseCopyAutoActivationPolicy Blocked



We can confirm that this was done by running this command:

Get-MailboxServer LITEX01 | ft Name,DatabaseCopyAutoActivationPolicy



Our mailbox service on LITEX01 is now in maintenance mode.

We then put the server itself into maintenance mode:


Set-ServerComponentState LITEX01 -Component ServerWideOffline -State Inactive -Requester Maintenance




We can confirm that LITEX01 is now inactive by running the command below:

Get-ServerComponentState LITEX01 -Component ServerWideOffline



Congratulations! Your server is now in maintenance mode and we can now do the required work on it.
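If you do this regularly, the steps above can be collected into a single function. This is only a sketch: the name Start-DagNodeMaintenance and its parameters are my own invention, not Exchange cmdlets. It assumes you run it on the server going into maintenance, from an elevated Exchange Management Shell that also has the FailoverClusters cmdlets available, and that you are happy to suppress the Redirect-Message prompt with -Confirm:$false.

```powershell
# Hypothetical wrapper around the commands used in this post.
# Run it ON the server being put into maintenance, because
# Restart-Service acts on the local machine.
function Start-DagNodeMaintenance {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Server,        # node going into maintenance, e.g. LITEX01
        [Parameter(Mandatory)] [string] $TargetServer,  # node taking over, e.g. LITEX02
        [Parameter(Mandatory)] [string] $TargetFqdn     # FQDN of the target for Redirect-Message
    )

    # Stop at the first failure rather than carrying on half-drained.
    $ErrorActionPreference = 'Stop'

    # Message delivery: drain transport and hand queued mail to the other node.
    Set-ServerComponentState $Server -Component HubTransport -State Draining -Requester Maintenance
    Restart-Service MSExchangeTransport
    Redirect-Message -Server $Server -Target $TargetFqdn -Confirm:$false

    # Unified Messaging: stop taking new calls.
    Set-ServerComponentState $Server -Component UMCallRouter -State Draining -Requester Maintenance

    # Cluster services: move the PAM, then pause the node.
    Move-ClusterGroup "Cluster Group" -Node $TargetServer
    Suspend-ClusterNode $Server

    # Mailbox service: move active copies away and block automatic activation.
    Get-MailboxDatabaseCopyStatus -Server $Server |
        Where-Object { $_.Status -eq 'Mounted' } |
        ForEach-Object { Move-ActiveMailboxDatabase $_.DatabaseName -ActivateOnServer $TargetServer -Confirm:$false }
    Set-MailboxServer $Server -DatabaseCopyAutoActivationPolicy Blocked

    # Finally, take the whole server offline.
    Set-ServerComponentState $Server -Component ServerWideOffline -State Inactive -Requester Maintenance
}
```

For this lab, the call would be Start-DagNodeMaintenance -Server LITEX01 -TargetServer LITEX02 -TargetFqdn LITEX02.litwareinc.com. I would still run the Get- commands from the sections above afterwards to confirm each step took effect.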



Take a mailbox server out of maintenance mode



When we’re done with our maintenance, we can take LITEX01 out of maintenance mode. We’ll reverse the changes we’ve made to put the server into maintenance mode.



Set the mailbox server as active


Set-ServerComponentState LITEX01 -Component ServerWideOffline -State Active -Requester Maintenance


Confirm this has worked:

Get-ServerComponentState LITEX01 -Component ServerWideOffline





Set the Unified Messaging component to active


Set-ServerComponentState LITEX01 -Component UMCallRouter -State Active -Requester Maintenance

Confirm this has worked:

Get-ServerComponentState LITEX01 -Component UMCallRouter


Resume the cluster node


Run this command from a PowerShell window with elevated permissions:

Resume-ClusterNode LITEX01

Confirm the node is now up in the cluster:

Get-ClusterNode


Set the mailbox server DatabaseCopyAutoActivationPolicy


Here we set the DatabaseCopyAutoActivationPolicy property to Unrestricted to allow LITEX01 to mount databases automatically if needed:

Set-MailboxServer LITEX01 -DatabaseCopyAutoActivationPolicy Unrestricted



We can confirm this has worked by running this command:

Get-MailboxServer LITEX01 | ft Name,DatabaseCopyAutoActivationPolicy


Set the HubTransport component to active


Set-ServerComponentState LITEX01 -Component HubTransport -State Active -Requester Maintenance
Restart-Service MSExchangeTransport



Confirm that the HubTransport component is active:

Get-ServerComponentState LITEX01 -Component HubTransport
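As with going into maintenance mode, the steps for coming out of it can be wrapped up in one function. Again this is a sketch under assumptions: Stop-DagNodeMaintenance is a name I made up, and it expects to be run on the server itself from an elevated Exchange Management Shell with the FailoverClusters cmdlets available. It runs the commands in the same order as the sections above.

```powershell
# Hypothetical wrapper that reverses the maintenance steps in this post.
# Run it ON the server coming out of maintenance, because
# Restart-Service acts on the local machine.
function Stop-DagNodeMaintenance {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Server   # node coming out of maintenance, e.g. LITEX01
    )

    # Stop at the first failure so a half-restored server is noticed.
    $ErrorActionPreference = 'Stop'

    # Bring the server and Unified Messaging back.
    Set-ServerComponentState $Server -Component ServerWideOffline -State Active -Requester Maintenance
    Set-ServerComponentState $Server -Component UMCallRouter -State Active -Requester Maintenance

    # Let the node take part in the cluster again.
    Resume-ClusterNode $Server

    # Allow databases to be activated on this server automatically.
    Set-MailboxServer $Server -DatabaseCopyAutoActivationPolicy Unrestricted

    # Resume message delivery.
    Set-ServerComponentState $Server -Component HubTransport -State Active -Requester Maintenance
    Restart-Service MSExchangeTransport
}
```

In this lab that would be Stop-DagNodeMaintenance -Server LITEX01.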



Confirm that our server is not in maintenance mode


To confirm that our server is no longer in maintenance mode, we can run the command below to check that all required components are active:

Get-ServerComponentState LITEX01 | ft Component,State -AutoSize
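The full component list is fairly long, so it can be easier to show only the components that are not active. Below is a small sketch of a filter for that. Select-InactiveComponent is a hypothetical helper of mine, not an Exchange cmdlet, and the default ignore list is an assumption: on the on-premises servers I have seen, ForwardSyncDaemon and ProvisioningRps are normally inactive, but check what is normal in your environment.

```powershell
# Hypothetical helper (not part of Exchange): passes through only the
# components whose State is something other than Active.
function Select-InactiveComponent {
    param(
        [Parameter(ValueFromPipeline)] $ComponentState,
        # Assumption: these components are expected to be inactive on-premises.
        [string[]] $Ignore = @('ForwardSyncDaemon', 'ProvisioningRps')
    )
    process {
        $name  = "$($ComponentState.Component)"
        $state = "$($ComponentState.State)"
        if ($state -ne 'Active' -and $Ignore -notcontains $name) {
            $ComponentState
        }
    }
}
```

With the function loaded, Get-ServerComponentState LITEX01 | Select-InactiveComponent | ft Component,State -AutoSize should return nothing when the server is fully out of maintenance mode.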


Optional tasks


Optionally, you can re-balance your mailbox databases, because after these steps all of them are mounted on LITEX02. Instructions on how to do this are here.

You can now repeat the above tasks to do maintenance on LITEX02.


Conclusion


In this post, I’ve done a run-through of how you can perform maintenance on your DAG members without downtime. 
