Saturday, April 7, 2012

High Availability MBX in Exchange 2007

High Availability
Applies to: Exchange Server 2007 SP3, Exchange Server 2007 SP2, Exchange Server 2007 SP1, Exchange Server 2007
This High Availability content area includes topics that you can use to design, build, and operate a highly available messaging system based on the release to manufacturing (RTM) version of Microsoft Exchange Server 2007 and Exchange 2007 Service Pack 1 (SP1).
We recommend that you review the applicable documentation prior to designing or deploying a highly available messaging solution based on Exchange 2007 SP1.
The documentation in this area has been updated to include the latest recommendations and best practices for deploying Exchange 2007 SP1 on Windows Server 2008 and Windows Server 2003 Service Pack 2 (SP2).

While minimum uptime requirements vary among organizations, every organization would like to achieve a high level of uptime. Organizations for which messaging is business-critical often choose to design a highly available messaging system to provide this uptime.
Exchange 2007 RTM and Exchange 2007 SP1 include the following built-in features that can provide quick recovery, high availability, and site resilience for Exchange 2007 Mailbox servers:
  • Local continuous replication (LCR)   LCR is a single-server solution that uses built-in asynchronous log shipping technology to create and maintain a copy of a storage group on a second set of disks that are connected to the same server as the production storage group. LCR provides log shipping, log replay, and a quick manual switch to a secondary copy of the data.
  • Cluster continuous replication (CCR)   CCR, which is a non-shared storage failover cluster solution, is one of two types of clustered mailbox server (CMS) deployments available in Exchange 2007. CCR is a clustered solution (referred to as a CCR environment) that uses built-in asynchronous log shipping technology to create and maintain a copy of each storage group on a second server in a failover cluster. CCR is designed to be either a one-datacenter or a two-datacenter solution, providing both high availability and site resilience. CCR is very different from clustering in previous versions of Exchange Server. For details about some of the differences, see Cluster Continuous Replication Resource Model and Cluster Continuous Replication Recovery Behavior.
  • Standby continuous replication (SCR)   SCR is a new feature introduced in Exchange 2007 SP1. As its name implies, SCR is designed for scenarios that use or enable the use of standby recovery servers. SCR extends the existing continuous replication features and enables new data availability scenarios for Exchange 2007 Mailbox servers. SCR uses the same log shipping and replay technology used by LCR and CCR to provide added deployment options and configurations by providing the administrator with the ability to create additional storage group copies. SCR can be used to replicate data from stand-alone Mailbox servers and from clustered mailbox servers.
  • Single copy clusters (SCC)   SCC, which is a shared storage failover cluster solution, is the other of two types of clustered mailbox server deployments available in Exchange 2007. SCC is a clustered solution that uses a single copy of a storage group on storage that is shared between the nodes in the cluster. SCC is somewhat similar to clustering in previous versions of Exchange Server; however, along with numerous improvements, there are also some significant changes. For details about some of those changes, see Single Copy Cluster Resource Model and Single Copy Cluster Recovery Behavior.
For details about other high availability features and functionality introduced in SP1, see New High Availability Features in Exchange 2007 SP1.

High availability for Mailbox servers comes in two forms: service availability and data availability. Service availability is provided through the use of a Windows Server failover cluster. Data availability is provided through a built-in feature called continuous replication.

Both CCR and SCC are solutions that are deployed in a Windows Server failover cluster. Only the Mailbox server role can be installed in a failover cluster. No other roles can be installed in a failover cluster. A Mailbox server that is deployed in a failover cluster is referred to as a clustered mailbox server. Clustered mailbox servers running in a CCR environment are very different from clustered mailbox servers running in an SCC environment. Furthermore, clustered mailbox servers in Exchange 2007 RTM and Exchange 2007 SP1 are very different from clustered mailbox servers in previous versions of Microsoft Exchange.
You can use Get-MailboxServer <CMSName> | fl Name, ClusteredStorageType in the Exchange Management Shell to determine if a clustered mailbox server is hosted in a CCR environment or in an SCC. A value of NonShared indicates that the clustered mailbox server is in a CCR environment, and a value of Shared indicates that the clustered mailbox server is in an SCC. A value of Disabled indicates that the Mailbox server is a stand-alone server.
You can also check Active Directory to determine if a clustered mailbox server is hosted in a CCR environment or in an SCC by examining the value for the msExchClusterStorageType attribute of the Mailbox server object. A value of 1 for the msExchClusterStorageType attribute indicates that the clustered mailbox server is hosted in a CCR environment, and a value of 2 indicates that the clustered mailbox server is in an SCC. A value of <Not Set> indicates that the Mailbox server is a stand-alone server.
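
The two lookups above can be summarized as a small mapping. The following Python sketch is only an illustration: the values come from the text above, while the function and dictionary names are hypothetical and are not part of Exchange or Active Directory.

```python
# Values reported by Get-MailboxServer (ClusteredStorageType), as described above.
SHELL_VALUES = {
    "NonShared": "CCR",
    "Shared": "SCC",
    "Disabled": "stand-alone",
}

# Values of the msExchClusterStorageType attribute, as described above.
# None stands in for <Not Set>; that representation is an assumption of this sketch.
AD_VALUES = {
    1: "CCR",
    2: "SCC",
    None: "stand-alone",
}


def deployment_from_shell(clustered_storage_type):
    """Translate a ClusteredStorageType value into a deployment type."""
    if clustered_storage_type not in SHELL_VALUES:
        raise ValueError(f"unexpected ClusteredStorageType: {clustered_storage_type!r}")
    return SHELL_VALUES[clustered_storage_type]


def deployment_from_ad(ms_exch_cluster_storage_type):
    """Translate an msExchClusterStorageType value into a deployment type."""
    if ms_exch_cluster_storage_type not in AD_VALUES:
        raise ValueError(
            f"unexpected msExchClusterStorageType: {ms_exch_cluster_storage_type!r}"
        )
    return AD_VALUES[ms_exch_cluster_storage_type]
```

Both functions reject unknown values instead of guessing, because the text above documents only three values for each lookup.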

Exchange 2007 RTM and Exchange 2007 SP1 support a maximum of two nodes that have the Mailbox server role installed (one active and one passive) in a CCR environment. A three-node failover cluster that uses a voter node and a traditional Majority Node Set quorum is also supported, but it is not the preferred cluster model. We recommend that most customers deploy CCR environments that use only two nodes, and either a Node and File Share Majority quorum (Windows Server 2008) or a Majority Node Set with File Share Witness quorum (Windows Server 2003). Thus, the documentation about CCR is oriented toward two-node failover clusters that use one of these quorum models.

Note:
A single node failover cluster deployed in a CCR environment is also supported, but it is not considered to be a high availability solution because no redundancy exists in the cluster. When using a single node failover cluster deployed in a CCR environment, you should use a Majority Node Set quorum (traditional, without a file share witness).
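
The quorum models mentioned above all rest on majority voting: the cluster keeps running only while more than half of the configured votes are reachable. The following Python sketch illustrates that arithmetic. It is a simplified model, not a description of how Windows failover clustering is implemented, and the function names are hypothetical.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes, votes_online):
    """True if the votes still online form a majority."""
    return votes_online >= votes_needed(total_votes)


# Two nodes plus a file share witness: 3 votes, 2 needed.
# Losing one node leaves 2 votes (other node + witness), so quorum is kept.
two_node_with_witness = has_quorum(total_votes=3, votes_online=2)

# Two nodes and no witness: 2 votes, 2 needed.
# Losing one node leaves 1 vote, so quorum is lost.
two_node_without_witness = has_quorum(total_votes=2, votes_online=1)

# Single-node cluster with a traditional Majority Node Set quorum: 1 vote, 1 needed.
single_node = has_quorum(total_votes=1, votes_online=1)
```

Under this model, a two-node cluster needs a third vote (a file share witness or a voter node) to survive the loss of one node.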


Exchange 2007 RTM and Exchange 2007 SP1 support a maximum of eight nodes in an SCC. Valid combinations of Exchange 2007 SP1 SCCs on Windows Server failover clusters include:
  • 7 Active / 1 Passive
  • 6 Active / 1 or 2 Passive
  • 5 Active / 1, 2, or 3 Passive
  • 4 Active / 1, 2, 3, or 4 Passive
  • 3 Active / 1, 2, 3, 4, or 5 Passive
  • 2 Active / 1, 2, 3, 4, 5, or 6 Passive
  • 1 Active / 0, 1, 2, 3, 4, 5, 6, or 7 Passive
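
The list above follows a simple rule: at least one active node, no more than eight nodes in total, and at least one passive node unless the cluster has a single active node. The following Python sketch encodes that rule as inferred from the list. The function name is hypothetical, and the sketch is not an official validation tool.

```python
MAX_EXCHANGE_NODES = 8  # maximum number of SCC nodes stated above


def is_valid_scc_layout(active, passive):
    """Check an active/passive node count against the combinations listed above."""
    if active < 1 or passive < 0:
        return False
    if active + passive > MAX_EXCHANGE_NODES:
        return False
    # Only a single-node SCC (one active node) appears in the list without a passive node.
    if passive == 0:
        return active == 1
    return True


# Enumerate every layout the rule accepts, for comparison with the list above.
valid_layouts = [
    (active, passive)
    for active in range(1, MAX_EXCHANGE_NODES + 1)
    for passive in range(0, MAX_EXCHANGE_NODES + 1)
    if is_valid_scc_layout(active, passive)
]
```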

Note:
The 64-bit version of Windows Server 2008 supports up to 16 nodes in a single failover cluster; however, Exchange 2007 supports a maximum of 8 nodes in the cluster. The failover cluster can still contain up to 16 nodes, but Exchange 2007 should be installed on no more than 8 nodes in the failover cluster.

Typically, there is no need for more than one passive node in the cluster for each active node in the cluster. As a result, a configuration of one active node and one passive node is preferred over configurations with one active node and multiple passive nodes. When using a single node SCC, you can use either a shared storage quorum, or a Majority Node Set quorum (traditional, without a file share witness). Although single-node SCCs are supported, they are not considered to be a high availability solution because no redundancy exists within the cluster.

A stretch cluster, also known as a geographically dispersed cluster, is a failover cluster that is stretched across (that is, it spans) more than one physical datacenter. Stretch clusters can be used as part of a site resilience design for your Exchange organization. Because CCR does not use shared storage, it can be easily deployed in a geographically dispersed failover cluster, including a multi-subnet stretch cluster on Windows Server 2008. SCC is also supported in a stretch cluster; however, stretching SCC requires third-party synchronous replication technology. For more information about stretch clusters, see Site Resilience Configurations.

Another type of cluster that is supported by Exchange 2007 RTM and Exchange 2007 SP1 is called a standby cluster. A standby cluster is a Windows Server failover cluster that does not contain a clustered mailbox server, but can be quickly provisioned with a replacement clustered mailbox server in the event of a disaster, another failure of the production failover cluster, or some other recovery scenario.

Continuous replication, also known as log shipping, is the process of automating the replication of closed transaction log files from a production storage group to a copy of that storage group that is located on a second set of disks on the local computer or on another server altogether. After being copied to the second location, the log files are then replayed into the copy of the database, thereby keeping the storage groups synchronized with a slight time lag.
Continuous replication is available in two forms in Exchange 2007 RTM (LCR and CCR) and three forms in Exchange 2007 SP1 (LCR, CCR, and SCR).
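
The copy-then-replay cycle described above can be pictured with a toy model. The following Python sketch is purely conceptual: the class names, function names, and log names are hypothetical, and it does not reflect how the Exchange replication service is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ProductionStorageGroup:
    closed_logs: list = field(default_factory=list)  # closed transaction logs, in order


@dataclass
class StorageGroupCopy:
    copied: list = field(default_factory=list)    # shipped but not yet replayed
    replayed: list = field(default_factory=list)  # already replayed into the copy


def ship_logs(source, copy):
    """Copy closed logs that the copy has not received yet; return how many were shipped."""
    already_received = len(copy.copied) + len(copy.replayed)
    new_logs = source.closed_logs[already_received:]
    copy.copied.extend(new_logs)
    return len(new_logs)


def replay_logs(copy):
    """Replay all shipped logs into the copy; return how many were replayed."""
    count = len(copy.copied)
    copy.replayed.extend(copy.copied)
    copy.copied.clear()
    return count


def replay_lag(source, copy):
    """Number of closed logs that the copy has not replayed yet."""
    return len(source.closed_logs) - len(copy.replayed)
```

In this model the copy is always at least one step behind the production storage group, which corresponds to the slight time lag mentioned above.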

High availability for the Hub Transport, Edge Transport, Client Access, and Unified Messaging server roles is achieved through a combination of server redundancy, Network Load Balancing (NLB), hardware load balancing, Domain Name System (DNS) round robin, as well as proactive server, service, and infrastructure management. In general, you can achieve high availability for the Client Access, Hub Transport, Edge Transport, and Unified Messaging server roles by using the following strategies and technologies:
  • Edge Transport   You can deploy multiple Edge Transport servers and use multiple DNS Mail Exchanger (MX) records to load balance activity across those servers. You can also use NLB to provide load balancing and high availability for Edge Transport servers.
  • Client Access   You can use NLB or a third-party hardware-based network load-balancing device for Client Access server high availability. For more information about NLB, see Windows Server TechCenter.
  • Hub Transport   You can deploy multiple Hub Transport servers for internal transport high availability. Resiliency has been designed into the Hub Transport server role in the following ways:
    • Hub Transport server to Hub Transport server (intra-org)   Hub Transport server to Hub Transport server communication inside an organization automatically load balances between available Hub Transport servers in the target Active Directory directory service site.
    • Mailbox server to Hub Transport server (intra-Active Directory site)   The Microsoft Exchange Mail Submission service on Mailbox servers automatically load balances between all available Hub Transport servers in the same Active Directory site.
    • Unified Messaging server to Hub Transport server   The Unified Messaging server automatically load balances connections between all available Hub Transport servers in the same Active Directory site.
    • Edge Transport server to Hub Transport server   The Edge Transport server automatically load balances inbound Simple Mail Transfer Protocol (SMTP) traffic to all Hub Transport servers in the Active Directory site to which the Edge Transport server is subscribed.
For additional redundancy (for example, applications that require an SMTP relay), you can create a new DNS record (for example, relay.company.com), assign an IP address, and use a hardware load balancer to redirect that IP address to multiple Hub Transport servers. In Exchange 2007 SP1, you can also use NLB for the client connectors on Hub Transport servers. When using a hardware load balancer, you need to confirm that no intra-org Exchange 2007 traffic will be crossing the hardware load balancer because intra-org traffic uses built-in load balancing algorithms (as previously described). For more information about load balancing and transport servers, see Deployment Options for Hub Transport Servers and Load Balancing and Fault Tolerance for Transport Servers.
  • Unified Messaging   Unified Messaging deployments can be made more resilient by deploying multiple Unified Messaging servers where two or more are in a single dial plan. The Voice over IP (VoIP) gateways supported by Unified Messaging can be configured to route calls to Unified Messaging servers in a round-robin fashion. In addition, these gateways can retrieve the list of servers for a dial plan from DNS. In either case, the VoIP gateways will present a call to a Unified Messaging server and if the call is not accepted, the call will be presented to another server, providing redundancy at the time the call is established.
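
The round-robin call presentation with fallback described for Unified Messaging can be sketched as follows. This Python example is an illustration only: the class, its method, and the server names are hypothetical, and real VoIP gateways implement this behavior in their own configuration.

```python
class VoipGateway:
    """Toy model of a gateway that presents calls to servers in round-robin order."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the server that gets the next call first

    def route_call(self, accepts):
        """Present a call to each server in turn until one accepts it.

        accepts is a callable that takes a server name and returns True
        if that server accepts the call. Returns the accepting server,
        or None if no server accepts.
        """
        count = len(self.servers)
        for offset in range(count):
            index = (self._next + offset) % count
            server = self.servers[index]
            if accepts(server):
                # Start the next call at the server after the one that accepted.
                self._next = (index + 1) % count
                return server
        return None
```

For example, with servers um1, um2, and um3 in one dial plan and um2 unavailable, successive calls go to um1, um3, and um1 again, so the failed server is skipped at the time each call is established.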

The basic premise of the Exchange 2007 high availability architecture is to introduce redundancy into the deployment. A failure is recovered using the remaining computing resources to support the Exchange services. As the failures are repaired, computing resources are again available to Exchange and its clients. In this context, the computing resources may be computers or storage for mailbox or other Exchange data.
Redundancy can be introduced within a single datacenter. This approach is typically done to protect against individual server failure. For example, introducing a second Hub Transport server into your organization's primary datacenter enables mail flow to continue if one of the two servers fails.
Alternatively, or in addition, redundancy could be introduced into a secondary datacenter. Two-datacenter configurations enable service continuity after a datacenter failure. If an additional Hub Transport server is introduced into a secondary datacenter, there is the opportunity to have the second Hub Transport server handle mail flow when the primary Hub Transport server experiences a failure, or when the production datacenter is unavailable. If three Hub Transport servers are deployed, two of them can be in the production datacenter and the third can be in the secondary datacenter.
The key deployment point is that redundancy can prevent outages that, without redundancy, would result from a variety of failures. How the redundant computers and services are deployed determines the failures that can occur without affecting data or service availability. Organizations must understand their requirements and then look at the operational issues to understand what solution is best for them. For example, one organization may want to activate a backup datacenter only after a 20-minute failure of the production datacenter. In this case, the organization must have the necessary processes in place to regularly validate backup datacenter activation and operation. A different organization may decide that ongoing validation of the backup datacenter is critical for their success, thereby leading to a different deployment configuration for that organization.
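
The effect of server placement on which failures can be tolerated can be checked with a small model. The following Python sketch assumes that a service stays available as long as at least one of its servers is still running. The function, server, and datacenter names are hypothetical.

```python
def service_survives(placement, failed_servers=(), failed_datacenters=()):
    """Return True if at least one server is still running.

    placement maps each server name to the datacenter that hosts it.
    """
    return any(
        server not in failed_servers and datacenter not in failed_datacenters
        for server, datacenter in placement.items()
    )


# Two Hub Transport servers in one datacenter: protects against a server failure only.
single_site = {"hub1": "primary", "hub2": "primary"}

# Three Hub Transport servers, two in production and one in the secondary datacenter.
two_sites = {"hub1": "primary", "hub2": "primary", "hub3": "secondary"}
```

With these placements, single_site survives the loss of hub1 but not the loss of the primary datacenter, while two_sites survives both.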
