In this post, I’ll provide some background on using DNS round robin to load balance Exchange 2013 or Exchange 2016 CAS services.
Once that’s done, I’ll demonstrate how to set it up and then we’ll do a quick test.
DNS Round Robin is where you have multiple A records for the same host name but different IPs for each record. For example:
mail.litwareinc.com IN A 10.2.0.21
mail.litwareinc.com IN A 10.2.0.22
When a client attempts to resolve the name mail.litwareinc.com, it will receive a response from the DNS server that includes two A records. Some clients select one IP to connect to and, if that IP is not providing the expected service, move on to the next IP in the list. Outlook can do this, as can Internet Explorer and some other web browsers.
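The client-side failover described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Outlook or any browser is actually implemented; the function name `connect_first_available` and the injectable `connector` parameter are my own assumptions, added so the logic can be exercised without a live server.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical type for the function that opens a connection to (ip, port).
Connector = Callable[[Tuple[str, int], float], object]


def connect_first_available(
    addresses: Iterable[str],
    port: int = 443,
    timeout: float = 3.0,
    connector: Optional[Connector] = None,
) -> Tuple[str, object]:
    """Try each IP in the order given; return (ip, connection) for the first success."""
    # Default to a real TCP connection; a fake connector can be passed in for testing.
    connect = connector or (lambda addr, t: socket.create_connection(addr, timeout=t))
    errors = []
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # This IP is not answering, so remember why and move on to the next one.
            errors.append(f"{ip}: {exc}")
    raise ConnectionError("no address accepted the connection: " + "; ".join(errors))
```

If the server at 10.2.0.22 were down, calling `connect_first_available(["10.2.0.22", "10.2.0.21"])` would be expected to fall through to 10.2.0.21 once the first attempt fails or times out.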
DNS Round Robin is fully supported as a CAS service load balancing method for Exchange 2013 and Exchange 2016. For Exchange 2010 it is supported for some CAS protocols.
Session affinity in Exchange 2013 and 2016
Round Robin DNS is quite basic. It does not maintain session affinity, which means that the client may not continue to connect to the same CAS server for the duration of the session. For Exchange 2010, this was a problem because of the processing that was done on the CAS server. In Exchange 2013 and 2016, the CAS services no longer require session affinity; they simply proxy the traffic through to the mailbox server where the mailbox database for that mailbox is active.
Advantages of DNS Round Robin for CAS load balancing
The advantages of using DNS round robin over a hardware load balancer are below:
- Cost - it’s free.
- Quick and easy to set up, as we’ll see later.
- Simple – you no longer need to know anything about load balancing solutions such as Kemp, Loadbalancer.org or F5. In fact, this is how Microsoft load balance connections to their Office 365 datacenter pairs.
Disadvantages of DNS Round Robin for CAS load balancing
Right, this is where we should pay attention as there are significant disadvantages:
- No health monitoring - DNS Round Robin is done entirely at the DNS level, which is separate from Exchange. For this reason, an Exchange server failure will not stop that server’s IP being handed out to clients until an administrator removes the A record from DNS. Failover happens at the client level: when a client fails to connect to one IP, it connects to the next IP.
- No load monitoring - for the same reason as above, DNS is unaware if one of your Exchange servers has an extremely high load or other issue causing a performance impact on the server.
- No ‘weighting’ - with DNS round robin, you cannot specify that 70% of connections are handled by one server with more compute resources whereas the other server handles only the remaining 30%. DNS round robin gives equal weight to each server. For example, if you have two servers, they will be load balanced 50:50 and this cannot be changed.
- No active/passive load balancing - for the same reason as above, you cannot have an active/passive setup. Each server handles the same load.
- No reporting or logging - some load balancers provide failover reporting and almost all provide logging. This can be helpful if you repeatedly have failovers and you’d like to troubleshoot in more detail.
- Stopping a server is not instant - if you find that an Exchange server is still accepting client connections but has a problem and you need to take it out of the load balancing, you need to remove the A record associated with the server. The time this takes to have an effect depends on the Time To Live for the A record, and it certainly won’t be instant, unlike force stopping a server in a hardware load balancer.
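To make the ‘no weighting’ point concrete, here is a small simulation of round robin rotation. It rests on two simplifying assumptions that are mine, not a description of any particular DNS server: the record list is rotated by exactly one position per query, and every client connects to the first IP in the answer. The function name `simulate_round_robin` is hypothetical.

```python
from collections import Counter, deque


def simulate_round_robin(records, queries):
    """Rotate the record list once per query and count which IP is listed first."""
    rotation = deque(records)
    first_choice = Counter()
    for _ in range(queries):
        first_choice[rotation[0]] += 1  # assume the client picks the first record
        rotation.rotate(-1)  # move the head of the list to the tail
    return first_choice


counts = simulate_round_robin(["10.2.0.21", "10.2.0.22"], 1000)
print(counts["10.2.0.21"], counts["10.2.0.22"])  # prints "500 500"
```

Under these assumptions there is nothing to tune: the only way to shift the split is to add or remove A records.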
How to set up CAS DNS Round Robin Load Balancing
In this demo, I have two Exchange 2016 servers which are configured in a DAG. This provides high availability for the mailbox role. Instructions on how to set up an Exchange 2016 DAG can be found here. The servers and their IPs are below:
- LITEX01: 10.2.0.21
- LITEX02: 10.2.0.22
Users will connect to OWA, ActiveSync, Outlook Anywhere and all other services using the name mail.litwareinc.com. For autodiscover they will connect to autodiscover.litwareinc.com.
First we need to configure the virtual directories on both our Exchange 2016 servers to use these names. For instructions on how to do this, see here. This ensures that any autodiscover response directs clients to one of the hostnames that we’ll configure DNS round robin for, and that no client connects to a server’s own hostname and so bypasses load balancing.
Our next step is to create the required A records:
mail.litwareinc.com IN A 10.2.0.21
mail.litwareinc.com IN A 10.2.0.22
autodiscover.litwareinc.com IN A 10.2.0.21
autodiscover.litwareinc.com IN A 10.2.0.22
Below you can see a screenshot of how these records look on one of my domain controllers:
We can confirm that DNS round robin is in fact working by doing an nslookup for these names from one of our client machines. We’ll run the commands below:
Above we can see that each DNS answer includes both IPs. If you look carefully, you’ll see that the IPs are not returned in the same order in every response.
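If you’d rather script this check than run nslookup by hand, something like the following Python sketch could be used. The function name `resolve_ipv4` and the injectable `resolver` parameter are assumptions of mine for illustration. Bear in mind that the operating system’s resolver may cache or re-sort the answers, so the order seen here will not necessarily match what the DNS server sent.

```python
import socket


def resolve_ipv4(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the IPv4 addresses for hostname in the order the resolver gave them."""
    results = resolver(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:  # drop duplicates while preserving order
            addresses.append(ip)
    return addresses
```

Calling `resolve_ipv4("mail.litwareinc.com")` from a client in this lab should return both 10.2.0.21 and 10.2.0.22; seeing only one address would suggest that one of the A records is missing.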
That’s it for internal DNS round robin load balancing setup.
For the external connections, create two NAT rules for your Exchange 2016 servers and open TCP port 443 from the internet to each server. Each server needs its own public IP.
Next, create two A records in your public DNS zone which resolve to your public IPs. Below are the DNS records that are created on my isolated ‘virtual’ internet:
mail.litwareinc.com IN A 18.104.22.168
mail.litwareinc.com IN A 22.214.171.124
autodiscover.litwareinc.com IN A 126.96.36.199
autodiscover.litwareinc.com IN A 188.8.131.52
I’ve put a computer on the virtual network and repeated the nslookup commands:
Testing CAS load balancing using DNS round robin
To do a quick test, I’m using Outlook 2016 in cached mode and have used autodiscover to create the Outlook profile.
If we display the Connection Status window, we can confirm that we are indeed connecting to the correct hostname - mail.litwareinc.com:
This however does not tell us which CAS server we are connected to. To do this, I’m using Resource Monitor to show the TCP connections for the Outlook.exe process.
Above we can see that Outlook is connected to 10.2.0.22, which is LITEX02. To simulate a failure of LITEX02, I’ve force powered off the virtual machine, which is equivalent to a sudden total failure of the server.
Above we can see that Outlook connections to LITEX02 (10.2.0.22) are failing and showing up as greyed out.
Above we can see more connections failing, but now we’re starting to see new connections to 10.2.0.21, which is LITEX01.
Now we can see that all connections to LITEX02 have ended and been replaced with connections to LITEX01 and our failover is complete. From the point of view of Outlook, nothing was noticed. Outlook was responsive throughout the failover and in fact did not disconnect.
In this post we’ve gone through a bit of the background on DNS load balancing in Exchange 2016 and I’ve demonstrated how to set up DNS round robin to load balance your CAS services.
In the next post I’ll do further demonstrations with Outlook in online mode and we’ll also test out OWA and simulated partial CAS failures (e.g. single virtual directory failures).