High Availability Clustering and Linux
The Internet helps businesses grow by cutting costs and increasing efficiency. The “always on” nature of the Internet is pushing organizations, such as banks and online stores, to deliver high-availability services to an ever-increasing number of users. In such a context, downtime means an immediate loss of revenue.
The adoption of Linux as the platform of choice for high-availability clusters is driven by a number of factors. First, many organizations find that Linux clusters match or exceed the performance of proprietary solutions while dramatically reducing the total cost of ownership (TCO). The lower TCO comes from combining inexpensive commodity hardware with open source software. Second, the open source character of Linux gives organizations the flexibility to tailor a solution to fit their specific needs. Finally, by using open source software, organizations avoid licensing fees that can otherwise become prohibitive as the number of nodes in a cluster increases, and they avoid vendor lock-in.
What is High Availability?
Mission critical services and systems must provide high levels of availability and reliability. High availability (HA) makes critical services and systems as resilient as possible against unplanned outages due to hardware, software, network or power failures.
Reliability is usually defined in terms of “mean time between failures” (MTBF). This is a measure, in hours, of how reliable a hardware component is. For example, a typical hard disk drive may have a mean time between failures of 300,000 hours. Hardware manufacturers arrive at an MTBF figure through intensive testing and analysis of known failure factors. Given the MTBF, customers can get an idea of how much servicing to plan for. Actual downtime is captured by “mean time to repair” (MTTR), a measurement of the time required to bring a failed system or service back online.
Availability is defined as the probability that a system is available at any given instant, and can be expressed in terms of MTBF and MTTR with the formula:
Availability = MTBF/(MTTR + MTBF)
So, availability can be maximized by increasing the time between failures and decreasing the time required to restore a failed system.
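To put numbers to this, take the 300,000-hour disk MTBF quoted above and assume, purely for illustration, a three-hour MTTR. A few lines of Python evaluate the formula:

    # Availability = MTBF / (MTTR + MTBF)
    mtbf = 300000.0  # mean time between failures, in hours (disk example above)
    mttr = 3.0       # assumed mean time to repair, in hours
    availability = mtbf / (mttr + mtbf)
    print(round(availability, 6))  # 0.99999, roughly "five nines" of availability

As the formula suggests, shaving hours off the repair time improves availability just as effectively as adding many thousands of hours to the MTBF.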
What causes service disruptions?
Service disruptions can be either planned or unplanned. Planned events are those that have been determined ahead of time, usually as part of a maintenance schedule. Upgrades to hardware or software, or system configuration changes, may require the system to be rebooted or powered down, resulting in downtime.
Most service disruptions are caused by unplanned events. Software bugs in applications, operating systems or device drivers, coupled with system design problems, account for almost 40 per cent of the downtime caused by unplanned events. Another 30 per cent is due to hardware problems, such as failure of moving parts like CPU fans, power supply fans and hard disk drives. The remaining 30 per cent is due to operator error, usually unplugging the wrong power or network cable. Breaches of physical system security and natural disasters (such as floods, fires and earthquakes) are beyond the scope of HA system design.
Designing an HA cluster
Performance, redundancy and data sharing are key factors to consider when designing a high-availability system. It is important to look for single points of failure (SPOFs) in your design. These are components, hardware or software, whose failure can bring down the whole HA system. Analyzing these factors will help you shape an HA system configuration tailored to meet your requirements.
One of the components of HA performance is the amount of time an HA configuration takes to recover from a failure. In other words, how long will a critical service be unavailable before users notice? Much of the design process will be structured around this time constraint. To determine the overall time constraint, we have to consider fault detection, notification and recovery. Fault detection is typically implemented by nodes exchanging “heartbeat” messages via Ethernet or serial links. To ensure reliable interconnection for heartbeat messaging, dedicated Ethernet or serial ports on each node are connected via crossover or null modem cables. The actual heartbeat message is usually a simple string like “node is up”. A node that fails to provide heartbeats within the preset time interval is considered failed by its peers. This interval, along with the heartbeat polling interval, should be tuned to meet the time requirements of the specific applications running on the cluster.
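As a minimal sketch of this detection logic (the addresses, port number and intervals below are invented for illustration, and real cluster software such as the Linux HA project's Heartbeat does considerably more), a peer monitor only needs to record when the last heartbeat arrived and compare the silence against a preset deadtime:

    import socket
    import time

    KEEPALIVE = 2    # seconds between the heartbeats we send (assumed tuning value)
    DEADTIME = 10    # seconds of silence before the peer is declared failed (assumed)
    PEER = ("10.0.0.2", 6944)   # hypothetical dedicated heartbeat link and port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", 6944))
    sock.settimeout(KEEPALIVE)

    last_seen = time.time()
    while True:
        sock.sendto(b"node is up", PEER)     # our own heartbeat to the peer
        try:
            sock.recvfrom(64)                # wait for the peer's heartbeat
            last_seen = time.time()
        except socket.timeout:
            pass                             # no heartbeat this interval
        if time.time() - last_seen > DEADTIME:
            print("peer silent for %d seconds - initiating failover" % DEADTIME)
            break

Tuning the deadtime is exactly the trade-off mentioned above: set it too low and a briefly overloaded node may be declared dead prematurely; set it too high and users wait longer before recovery begins.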
When a fault is detected, the cluster management application performs a recovery procedure known as a “failover”. The failed node is deactivated and a substitute node takes over its workload. The amount of time necessary for the failover depends on the number and complexity of the steps required for initializing specific applications that were running on the failed node. If the cluster does not use shared external storage, then data synchronization activities, like replaying large database query logs or performing file system integrity checks on large disk partitions, can take some time. With the emergence of journaling file systems such as ext3 and ReiserFS, lengthy integrity checks are becoming a thing of the past.
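What a failover actually does is cluster-specific, but it boils down to an ordered sequence of recovery steps. The sketch below is illustrative only: the address, device, mount point and init script are hypothetical, and production cluster managers drive equivalent resource scripts automatically rather than relying on hand-written Python.

    import subprocess

    SERVICE_IP = "192.168.1.100/24"     # floating address that clients connect to
    NIC = "eth0"
    SHARED_DEV, MOUNT_POINT = "/dev/sdb1", "/data"

    def run(cmd):
        print("failover step:", " ".join(cmd))
        subprocess.check_call(cmd)

    def take_over():
        # 1. Claim the service IP address of the failed node.
        run(["ip", "addr", "add", SERVICE_IP, "dev", NIC])
        # 2. Attach the shared (or replicated) storage; with a journaling file
        #    system such as ext3, replaying the journal replaces a lengthy fsck.
        run(["mount", SHARED_DEV, MOUNT_POINT])
        # 3. Restart the applications that were running on the failed node.
        run(["/etc/init.d/httpd", "start"])

    if __name__ == "__main__":
        take_over()

The time these steps take is the recovery component of the time budget discussed above, which is why fast journal replay and quick application start-up matter so much.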
Redundancy provides the basis for recovery after fault detection. The idea is simple: if one node fails, another should take over its function, with minimal impact on users. Redundancy in an HA system is configured according to how applications will be distributed across individual nodes in the cluster. The two most commonly used configurations in dual-node systems are active-active and active-passive.
In an active-active configuration, each node in the cluster runs one or more unrelated applications, sharing the total cluster workload. If a node fails, the other picks up the load of running all of its services. Although active-active configurations are more difficult to implement, the financial investment in replicated hardware may be worth the dramatic improvements in load sharing that are possible. The total capacity of the HA cluster is maximized because all nodes are active and hardware is not just sitting idle waiting for a failure. In a dual-node system, cluster performance is maximized when both nodes are active, but it halves when one of them fails. This should be kept in mind when deciding which configuration to choose for your HA cluster.
An active-passive configuration is desirable when a single large application needs to be run on the cluster. Few applications support concurrent data access, so most are restricted to running on a single node. In a dual-node system, a primary active node handles all requests while a secondary passive node sits idle. The passive node's unused computing power can be put to work running additional applications that replicate and/or synchronize data between the two nodes, thereby reducing downtime in the event of a failover. One of the benefits of active-passive clusters is that overall performance remains constant, since only one node is active at any given time.
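To make the distinction concrete, the following sketch (node and service names are invented; real cluster suites express the same policy in their configuration files) shows how services map to nodes in each configuration and where they land after a failure:

    def place(services, alive):
        """Assign each service to its preferred node, or to a survivor after a failure."""
        placement = {}
        for service, preferred in services.items():
            if preferred in alive:
                placement[service] = preferred
            elif alive:
                placement[service] = sorted(alive)[0]   # crude failover choice
        return placement

    # Active-active: both nodes carry part of the workload.
    active_active = {"webserver": "node1", "database": "node2"}
    # Active-passive: everything prefers node1; node2 idles until a failover.
    active_passive = {"webserver": "node1", "database": "node1"}

    print(place(active_active, alive={"node1", "node2"}))
    # {'webserver': 'node1', 'database': 'node2'} - workload is shared
    print(place(active_active, alive={"node2"}))
    # {'webserver': 'node2', 'database': 'node2'} - the survivor runs everything
    print(place(active_passive, alive={"node1", "node2"}))
    # {'webserver': 'node1', 'database': 'node1'} - node2 sits idle as the passive peer

After a node1 failure, the active-passive placement is identical to the all-on-one-node result, which is why its performance stays constant: the passive node was never contributing capacity in the first place.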
Online services, such as banking and travel reservations, demand high-performance shared disk storage that is always available. In addition to high performance and availability, a shared storage solution should provide data sharing across all nodes and disks. A cluster file system, such as Sistina’s Global File System (GFS), allows multiple nodes to share a single, common view of storage data, mitigating data synchronization issues. GFS provides locking mechanisms to allow multiple simultaneous writes to the same file, storage allocation control through per-application quotas, data journaling to improve small-file performance, and out-of-the-box integration with Red Hat and SuSE. Red Hat has promised to make GFS open source, following its acquisition of Sistina late last year.
Linux HA solutions
Whether you want to build your own HA cluster or buy an off-the-shelf solution, here is a brief look at some of the well-known high availability solutions that are being used by many large organizations building and deploying Linux clusters.
Linux HA project: Widely used in many real-world high availability solutions, Linux HA implements network and serial heartbeating for node monitoring and supports takeover of multiple IP addresses, making it well suited to simple dual-node active-passive clusters. Find out more at linux-ha.org.
Linux Virtual Server (LVS): LVS achieves scalability and high availability by virtualizing a cluster of real servers, which are front-ended with load balancers. Users interact with the cluster as if it were a single virtual server, and nodes can be added or removed transparently to improve scalability. Similarly, node failure detection reconfigures the cluster as needed to deliver high availability. Find out more at www.linuxvirtualserver.org.
Red Hat Cluster Suite: Originally based on Kimberlite from MissionCriticalLinux, this solution is well suited for database, file server and Web server applications. It offers both active-active and active-passive configurations along with GUI management software. Additionally, you can combine Red Hat Enterprise Linux and Piranha load balancing software to deliver a variety of high availability services. Find out more at www.redhat.com/software/rha/cluster.
EmicNetworks: The Emic Application Cluster (EAC) software targets high-performance, highly available database applications and Web servers, offering dynamic load balancing as well as tools for failover and recovery. Find out more at www.emicnetworks.com.
By using a combination of open source software components, successful online services, like Amazon and Google, have redefined the standard for high-availability applications. With the emergence of frameworks like High Availability Open Source Cluster Application Resources (HA-OSCAR) (www.cenit.latech.edu/oscar), we will see ideas from the worlds of high performance computing (HPC) and high availability converge to deliver even more innovative and powerful solutions in the near future.