Wednesday, July 8, 2009

SAP NetWeaver Application Server High Availability

SAP High Availability and the UNIX Environment
How does SAP support High Availability solutions on Unix/Linux?
The High Availability setup on UNIX/Linux is supported by our partners and not by SAP directly.

Where can I find documentation and resources for SAP HA setups on Unix/Linux?
SAP does not provide specific cluster guides for implementing SAP clusters in Unix HA environments, because this task is handled by our hardware partners.

I want to set up my Dialog Instance on Windows and the SCS on Unix. Is this type of setup supported by SAP?
Heterogeneous system landscapes are not officially supported by SAP at the moment. Support for this kind of setup is currently being evaluated.

SAP High Availability and the Windows Environment

How does SAP support High Availability in Windows environments?
SAP provides an out-of-the-box setup for Windows cluster environments using the Microsoft Cluster Service (MSCS). For all other Windows cluster solutions, please get in touch with the respective cluster software vendor.

What is a cluster resource group?
Cluster resources are software and hardware components that are managed by the cluster software. Several resources can be grouped into resource groups. These groups are collections of resources which can be managed by the cluster service as a single unit.

Why can't I access the shared disks when they are owned by another cluster node?
A shared disk always belongs to the cluster node that currently owns the cluster group. However, you should never access the SAP installation through the disk drive but through the share name, that is, “/sapmnt”. This prevents losing access to the disk during sensitive operations such as installations.

How many failover nodes are currently supported by MSCS/SAP?
The number of SAP cluster nodes is limited only by the maximum number of cluster nodes supported by MSCS. For Windows Server 2003 the maximum is 8 cluster nodes, which means 7 failover nodes. When the SAP Replication Server is used with a cluster configuration, the number of nodes is currently limited to two.

How can I initiate a failover on MSCS manually?
On MSCS, failovers can be initiated manually for testing purposes. You can open a command prompt and type in “cluster res /FAIL”, or open the Cluster Administrator, right-click a resource and choose “Initiate Failover”.


What is the "Threshold"?
Within MSCS, the term “Threshold” refers to the number of failures before a resource fails over to the second cluster node. Please note that if you are using the Replication Server, it is very important to set the “Threshold” to “0”. Otherwise the replication will not work.
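
The effect of the Threshold setting can be illustrated with a small simulation. This is only a sketch and not MSCS code: the function name `handle_failures` and the restart-then-failover policy are assumptions made for illustration, derived from the description above.

```python
def handle_failures(failure_count: int, threshold: int) -> list[str]:
    """Return the action taken for each consecutive failure of one resource.

    Assumed policy (illustrative, not the MSCS implementation): the resource
    is restarted on the current node until the number of local restarts
    reaches `threshold`; the next failure moves it to the other node.
    """
    actions = []
    restarts = 0
    for _ in range(failure_count):
        if restarts < threshold:
            restarts += 1
            actions.append("restart on same node")
        else:
            actions.append("fail over to other node")
            restarts = 0  # the counter starts again on the new node
    return actions


# Threshold 0: the very first failure triggers a failover.
print(handle_failures(1, threshold=0))
# Threshold 3: three local restarts, then a failover on the fourth failure.
print(handle_failures(4, threshold=3))
```

With a Threshold of 0 the resource is never restarted locally in this model, which matches the requirement stated for the Replication Server.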

Are there other cluster environments available for the Microsoft Windows platform?
SAP supports an out-of-the-box installation for the Microsoft Cluster Service. However, there are also other cluster software vendors for Windows. As in UNIX environments, these setups are handled by the partner.

Where can I find additional information regarding the Microsoft Cluster Service?
Internet
“Guide to Creating and Configuring a Server Cluster under Windows Server 2003”
“Clustering Services” - general introduction to MSCS
“Step-by-Step Guide to Installing Cluster Service”
“Technical Overview of Windows Server 2003 Clustering Services”
“Windows 2000 Clustering Technologies” - information for W2K clustering services
Clustering Technology Community

Books
“WINDOWS NT Microsoft Cluster Service,” by Richard R. Lee ISBN: 0-07-882500-8
“Implementing SAP R/3 using Microsoft Cluster Server,” by David V. Watts, Mauro Gatti, Ralf Schmidt-Dannert ISBN: 0-13-019847-1
“Tuning Microsoft Server Clusters: Guaranteeing High Availability for Business Networks,” by Robert W. Buchanan, Robert Buchanan ISBN: 0071417397
“Windows Server 2003 Clustering & Load Balancing,” by Robert Shimonski ISBN: 0072226226

SAP High Availability Setup

What is ASCS/SCS?
With SAP NetWeaver 04 Java, the Message Server and the Enqueue Server are separated from the Central Instance. These two services are grouped within the SAP Central Services Instance (SCS). From NW04s onward, the ABAP Central Services can also be separated from the Central Instance. Each stack, ABAP and Java, has its own Message Service and Enqueue Service. For ABAP systems the Central Services are referred to as ASCS; for Java systems they are referred to as SCS. The ASCS and the SCS are classified as SPOFs and therefore require a High Availability setup. If the ASCS is integrated within the ABAP Central Instance (the standard in NetWeaver 04), the Central Instance of the ABAP system also needs an HA setup.

What is the difference between HA for ABAP and HA for Java?
Within SAP NetWeaver 6.40 ABAP, the Message Server and the Enqueue Server are integrated within the ABAP Central Instance (CI).
In SAP NetWeaver 6.40 Java, the Message Server and the Enqueue Server are implemented as services within the SAP System Central Services Instance (SCS) and are thereby separated from the Central Instance (CI).
With SAP NetWeaver 04s ABAP, the Message Server and the Enqueue Server can also be separated from the Central Instance (CI) into the ABAP SAP Central Services Instance (ASCS). This is recommended for HA setups because the ASCS is a lightweight component that can be switched over easily.

Is SAP HA also supported for heterogeneous system landscapes?
SAP does not officially support heterogeneous HA cluster environments at the moment. However, official support for this type of setup is currently being evaluated.

Does SAP support Microsoft Geo Clusters?
SAP does not support Microsoft Geo Clusters at this time.


Is there a session failover mechanism for the J2EE?
The J2EE Engine supports a session failover mechanism using database or local persistence, which can be implemented in applications. Please refer to the documentation for further information on how to do that.

What is an "Enqueue Replication Server"?
The Enqueue Server contains the central locking table for the SAP cluster. Besides database locks, the table also contains infrastructure locks on system-wide objects. It is therefore necessary to protect the locking table in case the Standalone Enqueue Server fails. The SAP Enqueue Replication Server provides a replication mechanism for the Enqueue Server by holding a copy of the locking table in its shared memory segment. After a failure of the Enqueue Server, the locking table can be restored from this copy. Since SAP NW04 SP15 / NW04s SR1, an automated installation of the Enqueue Replication Server is available for Windows environments. UNIX/Linux installations are handled by SAP hardware partners.
Note: You can only protect Standalone Enqueue Servers with an Enqueue Replication Server. The standard Enqueue Server in an ABAP CI (Enqueue work process) cannot be protected by an Enqueue Replication Server.
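
The replication idea described above can be sketched in a few lines. This is a toy model under stated assumptions, not SAP's implementation: the class names `EnqueueServer` and `ReplicationServer`, the method names, and the plain dictionary used as the lock table are all hypothetical.

```python
class ReplicationServer:
    """Holds a copy of the lock table (stands in for the shared memory segment)."""

    def __init__(self):
        self.copy = {}

    def apply(self, lock_table):
        self.copy = dict(lock_table)


class EnqueueServer:
    """Toy lock table that maps a locked object to its owner."""

    def __init__(self, replica, restore=False):
        self.replica = replica
        # After a failover, the new server rebuilds its table from the replica.
        self.locks = dict(replica.copy) if restore else {}

    def lock(self, obj, owner):
        if obj in self.locks:
            return False  # the object is already locked
        self.locks[obj] = owner
        self.replica.apply(self.locks)  # replicate every change
        return True

    def unlock(self, obj, owner):
        if self.locks.get(obj) == owner:
            del self.locks[obj]
            self.replica.apply(self.locks)


replica = ReplicationServer()
primary = EnqueueServer(replica)
primary.lock("ORDER 4711", "user_a")

# Simulated crash: the primary's in-memory lock table is gone.
del primary

# A new Enqueue Server starts and restores the table from the replica.
restarted = EnqueueServer(replica, restore=True)
print(restarted.lock("ORDER 4711", "user_b"))  # False: user_a still holds the lock
```

Without the replica, the restarted server would begin with an empty table and grant the lock to the second user while the first transaction is still open.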

Is an "Enqueue Replication Server" necessary for setting up an SAP HA system landscape?
With NW04 SP15, the Replication Server is required for all SAP Java HA scenarios.

Is the J2EE Engine available also if a central service or the database fails?
The SAP Web Application Server Java uses the database extensively. A loss of the database is therefore very critical for the functionality of the J2EE Engine. However, some rudimentary functions and cached information remain available. Once the database is available again, the Engine returns to a fully operational state. The picture below shows the approximate impact of a loss of the database and the central services. Please note that the diagram is only a rule of thumb and not an empirical study.

What is the difference between Enqueue Server and Standalone Enqueue Server?
Since NW04 Java, the Enqueue Server and the Message Server for a J2EE Engine are standalone services hosted by the SAP Central Services Instance (SCS). The difference between a Standalone Enqueue Server and an Enqueue Service is therefore only formal: the term “service” refers to the Enqueue as part of the SCS; the term “server” refers to the Enqueue as a process, either enserver (.exe) or the enqueue work process within the ABAP stack. From NW04s onward, the Enqueue Server and the Message Server are also implemented as services within the ABAP SAP Central Services Instance (ASCS).

Why does SAP use an active-passive cluster configuration?
Because the SAP System Central Services Instance is a lightweight component with a low workload, an active-active cluster configuration does not make much sense.

Are my transactions lost after a database failover?
The impact of a database loss depends on the implementation of the database. Some databases support a session failover mechanism, others do not. Please consult the database-specific documentation for further information.

Is there a certification for SAP HA solutions?
There is no official certification program for SAP High Availability partners. SAP only provides information about the services that need to be highly available. Any specific information on how to achieve this has to be provided by the partner. You can find a list of current HA partners at service.sap.com/ha by clicking “HA Customers and Partners”. However, this list does not exclude solutions from other vendors that are also able to cluster SAP applications.

Who performs the High Availability setups for SAP solutions?
The setup can be performed by SAP-certified consultants. In non-MSCS environments on Windows and in UNIX environments, information about High Availability has to be supplied by the HA vendor, or the setup has to be performed in collaboration with the partner.

General High Availability


What is a "High Availability Cluster"?
The aim of a High Availability Cluster (HAC) is to ensure the availability of specific services within a system landscape. This is achieved by reducing downtime through redundant cluster nodes. A common HAC setup consists of two cluster nodes, which is the minimum for redundancy. If one server node crashes, the second node takes over the clustered services and thereby maintains availability. This is also called a "two-node cluster" or a "failover cluster." Depending on the availability requirements, the number of server nodes can be increased to reduce the risk posed by the failure of a node.
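
The benefit of additional nodes can be estimated with a simple calculation. The sketch below assumes that every node has the same availability and that nodes fail independently of each other, which is a simplification: shared storage, network, or power can fail for all nodes at once, and failover time is ignored. Under these assumptions, the service is down only when all nodes are down at the same time. The function name and the 99% figure are illustrative.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` independent nodes is up."""
    return 1.0 - (1.0 - node_availability) ** nodes


# Illustrative figure: each node is assumed to be available 99% of the time.
for n in (1, 2, 3):
    print(n, f"{cluster_availability(0.99, n):.6f}")
```

In this model, going from one node to two brings the larger absolute improvement, and every further node adds less.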

What is "switchover"?
“Switchover” refers to a planned switch from a primary server to a standby server, that is, without a failure of the primary server. A switchover is always initiated by the system administrator.

What is "failover"?
“Failover” refers to an unplanned switch from a primary server to a standby server in case of a system failure of the primary server node. Unlike a switchover, a failover is performed automatically by the cluster software. Some cluster software, such as Microsoft Cluster Service, also provides an option for a manual failover for testing purposes.

What is "fallback"?
“Fallback” refers to the process of switching back from a secondary server node to the primary server node after a failover has occurred and the primary server node is available again. A fallback can be performed automatically by the cluster software or manually by the system administrator.

What is an "active-passive cluster"?
An active-passive cluster consists of at least two independent server nodes. The primary server node performs all operations. A secondary node acts as a so-called “standby system.”
In case of a system failure of the primary node, the cluster software automatically fails over to the standby server node, which starts the processes and resumes the work of the primary server node. A cluster group is active on only one server node at a time.
Please note that an active-passive cluster configuration does not imply that the standby server node carries no workload. The active-passive configuration refers only to the cluster group, which means that the resource can be active on only one server node at a time. However, if a cluster environment contains more than one cluster group, these groups can be distributed across the cluster nodes.
The SAP Central Services Instance is implemented as an active-passive cluster.
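
The ownership rules described above can be made concrete with a small model. This is an illustrative sketch, not the behavior of any particular cluster product: the `Cluster` class, its method names, and the node and group names are hypothetical, and the takeover policy (first surviving node) is an assumption.

```python
class Cluster:
    """Each cluster group is owned by exactly one node at a time."""

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}
        self.preferred = {}  # group -> node it normally runs on
        self.owner = {}      # group -> node it currently runs on

    def add_group(self, group, preferred_node):
        self.preferred[group] = preferred_node
        self.owner[group] = preferred_node

    def fail_node(self, node):
        """Failover: move every group owned by the failed node to a survivor."""
        self.up[node] = False
        survivors = [n for n, is_up in self.up.items() if is_up]
        if not survivors:
            raise RuntimeError("no node left to take over")
        for group, owner in list(self.owner.items()):
            if owner == node:
                self.owner[group] = survivors[0]

    def recover_node(self, node, fallback=True):
        """Bring a node back; optionally move its groups back (fallback)."""
        self.up[node] = True
        if fallback:
            for group, preferred in self.preferred.items():
                if preferred == node:
                    self.owner[group] = node


cluster = Cluster(["node_a", "node_b"])
cluster.add_group("SCS", "node_a")  # node_b is the standby for this group
cluster.add_group("DB", "node_b")   # ...but carries its own workload

cluster.fail_node("node_a")
print(cluster.owner)  # {'SCS': 'node_b', 'DB': 'node_b'}

cluster.recover_node("node_a")
print(cluster.owner)  # {'SCS': 'node_a', 'DB': 'node_b'}
```

Both nodes carry workload in normal operation, yet each group is active on only one node, which is what makes the configuration active-passive from the point of view of the group.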

What is a "standby system"?
A standby system is a redundant cluster node that takes over the processes if the primary cluster server fails. This is referred to as a “failover” and is performed automatically by the cluster software. There can be several standby cluster nodes. The number is limited only by the capabilities of the cluster software. For example, the Microsoft Cluster Service supports up to 8 server nodes, which means a maximum of 7 standby nodes.
“Standby systems” can be in a “hot” state or a “cold” state. A “hot standby” means that the processes also run on the standby node, so in case of a failure the cluster resource is already running and does not need to be started on the standby system. A “cold standby” means that in case of a failover the clustered resource needs to be started on the standby system, so a (short) downtime occurs during the failover. The SAP Central Services Instance is usually implemented as a “cold standby” system because the SCS is a lightweight component that does not need a long time to start up.
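
The difference between the two standby states comes down to whether the startup time of the resource counts towards the downtime. The following sketch makes that explicit. The function name and the timing values are hypothetical, and the model assumes that failure detection takes the same time in both cases.

```python
def failover_downtime(detection_s: float, startup_s: float, hot_standby: bool) -> float:
    """Seconds of downtime: failure detection plus, for a cold standby, startup."""
    return detection_s + (0.0 if hot_standby else startup_s)


# Hypothetical numbers: 10 s to detect the failure, 20 s to start the resource.
print(failover_downtime(10, 20, hot_standby=True))
print(failover_downtime(10, 20, hot_standby=False))
```

The shorter the startup time of a resource, the smaller the advantage of a hot standby, which is the reasoning given above for running the SCS as a cold standby.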

What is an "active-active cluster"?
An active-active cluster consists of at least two independent server nodes. The workload of a cluster resource is shared between the server nodes. If a cluster node crashes, its processes are resumed by the remaining cluster nodes. An active-active cluster configuration means that a cluster resource is active on all cluster nodes. The aim of an active-active cluster is not only to provide a highly available system but also to distribute the workload between the cluster nodes. Applications with a very high workload, such as databases, benefit from an active-active setup. Because the SAP Central Services Instance is a lightweight component, an active-active setup does not make sense for it. Therefore the SCS is implemented as an active-passive cluster resource.

What does "virtualization" mean?
The term “virtualization” in the context of HA refers to an abstraction performed by the cluster software. The software creates a virtual host that owns a virtual hostname, a virtual disk, and so on. “Virtual” in this sense means that such resources can be owned not just by one physical machine but by any of them. Which node currently owns or runs a resource is managed by the cluster software. Related resources are usually grouped into logical containers (for example, groups on MSCS or packages on HPSG) that can fail over independently.


What do "shared nothing" and "shared all" mean?
The terms “shared nothing” and “shared all” specify a type of architecture within an active-active cluster. “Shared nothing” means that every cluster node has its own data partition. This implies that such setups are not highly available, because in case of a failure the data of the failed node is no longer available. In a “shared all” environment, the cluster nodes that run the same service share a data partition and access the data concurrently.
These options have to be supported by the cluster software. For example, MSCS does not support the “shared all” option.
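
The availability consequence of the two architectures can be shown with a small model. This is a sketch with hypothetical names: `available_data`, the partition names, and the node names are made up for illustration, and concurrency control on the shared partition is left out.

```python
def available_data(storage, access, failed_nodes):
    """Return the data items still reachable through at least one running node.

    storage: partition name -> set of data items
    access:  node name -> list of partitions that node can read
    """
    reachable = set()
    for node, partitions in access.items():
        if node in failed_nodes:
            continue
        for partition in partitions:
            reachable |= storage[partition]
    return reachable


# Shared nothing: every node has its own partition.
nothing_storage = {"p1": {"a", "b"}, "p2": {"c", "d"}}
nothing_access = {"node1": ["p1"], "node2": ["p2"]}

# Shared all: both nodes access the same partition.
all_storage = {"shared": {"a", "b", "c", "d"}}
all_access = {"node1": ["shared"], "node2": ["shared"]}

print(sorted(available_data(nothing_storage, nothing_access, {"node1"})))  # ['c', 'd']
print(sorted(available_data(all_storage, all_access, {"node1"})))  # ['a', 'b', 'c', 'd']
```

After the failure of node1, the shared-nothing setup loses access to that node's partition, while the shared-all setup still serves all data through node2.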

What is a "SPOF"?
A Single Point of Failure (SPOF) is any component within a system that, if it fails, causes the loss of a runtime-critical service. A SPOF can be a hardware or a software component. However, this FAQ only covers the software components that SAP has identified as SPOFs: the Message Server, the Enqueue Server, the central file system, and the database. To achieve High Availability, it is necessary to eliminate these SPOFs.
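
A simple way to check a landscape for SPOFs is to list, for each critical component, the nodes that are able to run it. The sketch below flags every component that only one node can run. The function name `find_spofs` and the host names are hypothetical, and the model ignores hardware SPOFs such as shared storage or network.

```python
def find_spofs(landscape):
    """Return the components that only one node can run (no redundancy)."""
    return sorted(name for name, nodes in landscape.items() if len(nodes) < 2)


# Hypothetical landscape without a cluster.
landscape = {
    "Message Server": ["host1"],
    "Enqueue Server": ["host1"],
    "Central file system": ["host1"],
    "Database": ["host2"],
    "Dialog Instance": ["host3", "host4"],
}
print(find_spofs(landscape))

# The same landscape with the four SPOF components placed in two-node clusters.
clustered = {name: nodes if len(nodes) > 1 else nodes + ["standby"]
             for name, nodes in landscape.items()}
print(find_spofs(clustered))  # []
```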