Although there are many articles on the internet about datacenter failover and
site resiliency, I thought I would summarize them in a short note covering what
is needed during a failover, instead of spending 2 to 3 hours reading to get
the concepts we need.
Exchange 2013 Terminology
Primary Active Manager (PAM) runs inside the Microsoft Exchange Replication
service and is used to notify and react in case of a server failure. The PAM owns the cluster quorum resource and holds
the information about active, passive and mounted databases.
Standby Active Manager (SAM) provides information about which server hosts the active copy of a mailbox
database to the Client Access or Transport services.
Datacenter Activation Coordination (DAC) mode uses a protocol called Datacenter
Activation Coordination Protocol (DACP) to avoid split brain. When a DAG is
running in DAC mode and a server reboots, Active Manager starts with the DACP
bit set to 0 (databases stay dismounted). It then communicates with the other
members of the DAG; when another member responds with its bit set to 1, the
server sets its own bit to 1 and is allowed to mount databases.
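As a small illustration of the DAC mode described above, the following sketch checks and enables DAC mode from the Exchange Management Shell. The DAG name DAG01 is an assumption used only for the example; replace it with your own DAG name.

```powershell
# Assumption: the DAG is named DAG01 (hypothetical name for this example).
# Show whether DAC mode is currently Off or DagOnly.
Get-DatabaseAvailabilityGroup -Identity DAG01 | Format-List Name, DatacenterActivationMode

# Enable DAC mode so that DACP controls whether databases may mount after a restart.
Set-DatabaseAvailabilityGroup -Identity DAG01 -DatacenterActivationMode DagOnly
```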
Quorum Details
Odd number of nodes ---> Node Majority
Even number of nodes (but not a multi-site cluster) ---> Node and Disk Majority
Even number of nodes, multi-site cluster ---> Node and File Share Majority
Even number of nodes, no shared storage ---> Node and File Share Majority
Continuous replication starts in file mode, copying each closed 1 MB transaction
log file to the passive database copy. When the passive copy has caught up in
file mode, replication switches to block mode for immediate updates.
Port 3343 is used by the cluster nodes to listen for incoming
connections from the other members of the DAG.
I believe that is more than enough on the definitions, so let us move on to what
we do in practice in our Exchange infrastructure. It is always good to have
documentation of the component information below, which will help if our
servers are hit by a disaster.
Verification of Exchange 2013 DAG Components:
- To move the PAM to a different DAG member
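A minimal sketch of checking which server currently holds the PAM role and moving it, assuming a DAG named DAG01 and a target member named EX02 (both names are hypothetical). The PAM follows the cluster core resource group, so moving "Cluster Group" with the FailoverClusters module moves the PAM with it.

```powershell
# Assumptions: DAG01 and EX02 are placeholder names for this example.
# Show the DAG member that currently holds the PAM role.
Get-DatabaseAvailabilityGroup -Identity DAG01 -Status | Format-List Name, PrimaryActiveManager

# Run on a DAG member: move the core cluster group, and the PAM with it, to EX02.
Import-Module FailoverClusters
Move-ClusterGroup -Name "Cluster Group" -Node EX02
```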
Get-MailboxServer | FL Name, AutoDatabaseMountDial
BestAvailability (default) - Copy queue length of ≤12 logs
GoodAvailability - Copy queue length of ≤6 logs
Lossless - Copy queue length of zero logs
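If the reported value is not the one you want, it can be changed per Mailbox server. The sketch below assumes a server named EX01 (a hypothetical name) and sets the dial to GoodAvailability; any of the three values listed above can be used instead.

```powershell
# Assumption: EX01 is a placeholder Mailbox server name.
# Allow automatic mount only when the copy queue length is 6 logs or fewer.
Set-MailboxServer -Identity EX01 -AutoDatabaseMountDial GoodAvailability

# Confirm the new setting.
Get-MailboxServer -Identity EX01 | Format-List Name, AutoDatabaseMountDial
```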
Get-DatabaseAvailabilityGroup -Identity <DAGName> | FL Name, DatacenterActivationMode
Get-Counter -ComputerName <ServerName> -Counter "\MSExchange Replication(*)\Continuous replication - block mode Active"
Get-MailboxDatabaseCopyStatus -Server <ServerName> -ConnectionStatus | FL Name, IncomingLogCopyingNetwork, SeedingNetwork
Get-DatabaseAvailabilityGroup | FL Name, ManualDagNetworkConfiguration
Get-ExchangeServer -Identity <ServerName> -Status | FL
Exchange 2013 Datacenter Switchover
When the primary site fails because of a disaster such as a power outage or
server failure, and the majority of the DAG nodes are lost with it, follow
the steps below.
1. Verify the started and stopped servers in the DAG
Get-DatabaseAvailabilityGroup -Status | FL Name, *Servers
2. Use Stop-DatabaseAvailabilityGroup to mark the primary site DAG members as failed (stopped).
Stop-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite PrimarySite
3. Verify the started and stopped servers in the DAG again
Get-DatabaseAvailabilityGroup -Status | FL Name, *Servers
4. Stop the Cluster service on all the passive nodes in the secondary site
Stop-Service ClusSvc
5. Use Restore-DatabaseAvailabilityGroup to remove the stopped Mailbox servers from
the DAG and re-establish quorum using the alternate witness server
Restore-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite DR
6. When service or power is restored in the primary site, run
Start-DatabaseAvailabilityGroup to bring the primary site's servers back into the DAG
Start-DatabaseAvailabilityGroup -Identity <DAGName> -ActiveDirectorySite ProductionSite
7. Check the quorum model
Get-ClusterQuorum | FL
8. If it still shows the older quorum model, run the PowerShell cmdlet below
Set-DatabaseAvailabilityGroup -Identity DAG01
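The switchover steps above can be sketched as one sequence. This is only an outline under stated assumptions, not a tested runbook: the DAG name DAG01, the site names PrimarySite and DR, and the server names DR-EX01 and DR-EX02 are all hypothetical, and PowerShell remoting is assumed to be available to the DR servers. The -ConfigurationOnly switch is used here on the assumption that the primary site servers are unreachable, so only Active Directory is updated.

```powershell
# Assumptions: every name below is a placeholder for this example.
$dag       = "DAG01"
$drServers = "DR-EX01", "DR-EX02"

# Steps 1 to 3: mark the primary site members as stopped, then verify.
Stop-DatabaseAvailabilityGroup -Identity $dag -ActiveDirectorySite PrimarySite -ConfigurationOnly
Get-DatabaseAvailabilityGroup -Identity $dag -Status | Format-List Name, *Servers

# Step 4: stop the Cluster service on each DAG member in the secondary site.
Invoke-Command -ComputerName $drServers -ScriptBlock { Stop-Service -Name ClusSvc }

# Step 5: evict the stopped servers and re-establish quorum in the DR site.
Restore-DatabaseAvailabilityGroup -Identity $dag -ActiveDirectorySite DR

# Steps 6 to 8 (later, once the primary site is healthy again):
# Start-DatabaseAvailabilityGroup -Identity $dag -ActiveDirectorySite PrimarySite
# Set-DatabaseAvailabilityGroup -Identity $dag
```

The failback commands are left commented out because they should only be run after the primary site is confirmed to be back online.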