Linux Administration: Building a Load Balancer with LVS

In previous posts I set up load balancers using HAProxy, Pound and Nginx. All three act as Layer 7 reverse proxies, and they share a common disadvantage: they are not very efficient at distributing Layer 4 traffic. They suffer from a lot of context switching between user space and kernel space, which introduces delays, especially under heavy traffic with many short-lived connections.
A better solution, which runs entirely in kernel space, is LVS [1]. Linux Virtual Server has been around since 1998; it is mature, stable code that has been part of the mainline kernel since 2.4.23.
Layer 4 switching determines the path of packets based on information available at layer 4 of the seven-layer OSI model. This means the balancing decision can use the IP addresses and ports, as well as the transport protocol (TCP or UDP).
There are five forwarding types in LVS: LVS-NAT, LVS-DR, LVS-Tun, LVS-FullNAT and LVS-SYNPROXY:

LVS-NAT, as the name implies, uses NAT between the load balancer (the Director, in LVS terminology) and the back-end servers (the Real Servers). The Director relies on the Linux kernel's ability to rewrite IP addresses and ports as packets pass through it. This method used to carry significant overhead, but that is no longer the case. I'll demonstrate how to set it up later in this article.
LVS-DR stands for direct routing. The Director forwards incoming requests to the real servers, which send their replies directly back to the clients. Forwarding works by rewriting the destination MAC address, so the Director and the real servers must share the same network segment.
LVS-Tun uses IPIP tunneling. IP tunneling can carry packets from one subnet or virtual LAN to another, even when they must cross another network or the Internet. Building on the kernel's IP tunneling support, LVS-Tun lets you place the real servers on a network segment different from the Director's.
LVS-FullNAT is a relatively new module that introduces a local IP address (an IDC internal address, "lip"). IPVS translates cip-vip (client IP to virtual IP) to and from lip-rip (local IP to real server IP), where both lip and rip are IDC internal addresses. As a result, the load balancer and the real servers can sit in different VLANs, and the real servers only need access to the internal network.
LVS-SYNPROXY is based on TCP SYN cookies and is meant to protect the real servers against SYN floods.

Please note that FullNAT and SYNPROXY ship as a separate patch set rather than as part of the mainline kernel, and they had seen limited testing at the time of writing this article.
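
With ipvsadm, you choose the forwarding method per real server with a flag on the -a command. The NAT setup below uses -m. The helper here is only an illustration: the name fwd_flag is hypothetical and not part of ipvsadm. It maps the three mainline methods to their flags. As far as I know, the stock ipvsadm has no flags for FullNAT or SYNPROXY, so the helper rejects them.

```shell
#!/bin/sh
# Hypothetical helper: print the ipvsadm real-server flag for a forwarding method.
fwd_flag() {
    case $1 in
        nat) printf '%s\n' "-m" ;;  # --masquerading: LVS-NAT
        dr)  printf '%s\n' "-g" ;;  # --gatewaying: LVS-DR (ipvsadm's default)
        tun) printf '%s\n' "-i" ;;  # --ipip: LVS-Tun
        *)   printf 'unsupported forwarding method: %s\n' "$1" >&2
             return 1 ;;
    esac
}
```

For example, `ipvsadm -a -t 192.168.122.53:80 -r 192.168.122.50 "$(fwd_flag nat)"`. Keep in mind that DR and Tun also need extra configuration on the real servers, because those servers must accept traffic addressed to the virtual IP. That setup is not covered here.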

Now that we have the basics covered, let's create a load balancer that listens on port 80 and distributes TCP connections in round-robin fashion to two back-end nodes using NAT.

First let's install the user-space tools used to manage LVS:

[root@host1 ~]# yum install ipvsadm
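
There is one prerequisite LVS-NAT depends on before we describe the topology. Replies from the real servers must travel back through the Director, so that it can rewrite their source address to the virtual IP. In practice that means two things: enable IP forwarding on the Director, and make the Director the default gateway of each real server. Below is a minimal sketch; the helper names are my own (hypothetical, not part of ipvsadm).

```shell
#!/bin/sh
# Hypothetical helpers for the LVS-NAT prerequisites (names are illustrative).

# On the Director: let the kernel forward packets between interfaces.
# This takes effect immediately but does not survive a reboot.
enable_ip_forwarding() {
    sysctl -w net.ipv4.ip_forward=1
}

# On the Director: persist the setting in a sysctl config file, replacing an
# existing ip_forward line (e.g. "= 0") instead of adding a duplicate.
persist_ip_forwarding() {
    conf=${1:-/etc/sysctl.conf}
    if grep -q '^net\.ipv4\.ip_forward' "$conf" 2>/dev/null; then
        sed -i 's/^net\.ipv4\.ip_forward.*/net.ipv4.ip_forward = 1/' "$conf"
    else
        echo 'net.ipv4.ip_forward = 1' >> "$conf"
    fi
}

# On each real server: send replies back through the Director.
# $1 is the Director's address on the real servers' network.
use_director_as_gateway() {
    ip route replace default via "$1"
}
```

On the Director, run enable_ip_forwarding and persist_ip_forwarding as root. On each real server, run `use_director_as_gateway DIRECTOR_IP`. DIRECTOR_IP is a placeholder, not a value from this setup.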

Then let's describe the topology:

	[root@host1 ~]# ipvsadm -A -t 192.168.122.53:80 -s rr
	[root@host1 ~]# ipvsadm -a -t 192.168.122.53:80 -r 192.168.122.50 -m
	[root@host1 ~]# ipvsadm -a -t 192.168.122.53:80 -r 192.168.122.51 -m

The first command adds a TCP virtual service on 192.168.122.53, port 80, using the round-robin scheduler. This is the virtual IP on your Director (load balancer) that clients connect to.
The second and third commands add two real servers (back-end nodes running Apache) to that virtual service; -m selects NAT (masquerading) forwarding.
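
Once there are more than a couple of real servers, it helps to drive the same commands from variables. The sketch below is one way to do that. The function name setup_lvs_nat and the variables are my own; they simply mirror the addresses above.

```shell
#!/bin/sh
# Sketch: the same round-robin LVS-NAT service, built from variables.
# VIP, PORT and REAL_SERVERS mirror the example above; adjust to your network.
VIP=192.168.122.53
PORT=80
REAL_SERVERS="192.168.122.50 192.168.122.51"

setup_lvs_nat() {
    # -D: delete any existing definition so the function can be re-run
    # (it fails harmlessly when the service does not exist yet).
    ipvsadm -D -t "$VIP:$PORT" 2>/dev/null
    # -A: add a TCP (-t) virtual service using the round-robin scheduler (-s rr).
    ipvsadm -A -t "$VIP:$PORT" -s rr || return 1
    for rs in $REAL_SERVERS; do
        # -a: add a real server; -m selects NAT (masquerading) forwarding.
        ipvsadm -a -t "$VIP:$PORT" -r "$rs:$PORT" -m || return 1
    done
}
```

Deleting and re-adding the service makes the function safe to re-run. The trade-off is that the virtual service briefly disappears, so this suits initial setup better than live changes.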

To list the current configuration and the various stats, run:

	[root@host1 ~]# ipvsadm -L -n --stats
	IP Virtual Server version 1.2.1 (size=4096)
	Prot LocalAddress:Port            Conns   InPkts  OutPkts  InBytes OutBytes
	  -> RemoteAddress:Port
	TCP  192.168.122.53:80                5       25       25     2155     2740
	  -> 192.168.122.50:80                2       10       10      862     1096
	  -> 192.168.122.51:80                3       15       15     1293     1644
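
The Conns column shows how round-robin spread the load across the real servers. The small awk sketch below turns that output into a per-real-server connection count and percentage (the function name rs_share is hypothetical). It assumes two things: real-server rows start with "->", and Conns is the first number after the address.

```shell
#!/bin/sh
# Sketch: per-real-server share of connections from `ipvsadm -L -n --stats`.
# Reads the stats table on stdin. Real-server rows start with "->" and carry
# the Conns count in their third field; counts are summed per address:port.
rs_share() {
    awk '$1 == "->" && $3 ~ /^[0-9]+$/ { conns[$2] += $3; total += $3 }
         END {
             for (rs in conns) {
                 pct = (total > 0) ? 100 * conns[rs] / total : 0
                 printf "%s %d %.0f%%\n", rs, conns[rs], pct
             }
         }' | sort
}
```

Run it as `ipvsadm -L -n --stats | rs_share`. For the output above, it should report 2 connections (40%) for 192.168.122.50:80 and 3 (60%) for 192.168.122.51:80.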

To save the current configuration, use:

[root@host1 ~]# ipvsadm -S -n > ipvsadm.conf

To restore a previously saved configuration, run:

[root@host1 ~]# ipvsadm -R < ipvsadm.conf

To clear the current setup, run:

[root@host1 ~]# ipvsadm --clear
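
The three commands above combine naturally into a save/reload pair. The sketch below uses the same ipvsadm options; the function names are hypothetical. The save goes through a temporary file, so a failed save does not truncate the previous copy. The restore clears the table first, so rules already loaded do not clash with the restored ones.

```shell
#!/bin/sh
# Sketch: save and reload the LVS table with the ipvsadm options shown above.

save_lvs_rules() {
    # $1: target file. Write to a temporary file first so a failed save
    # does not truncate the previous copy.
    tmp="$1.tmp.$$"
    if ipvsadm -S -n > "$tmp"; then
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

restore_lvs_rules() {
    # $1: file produced by save_lvs_rules. Start from an empty table,
    # then load the saved rules from stdin.
    ipvsadm --clear && ipvsadm -R < "$1"
}
```

As far as I know, on Red Hat-style systems the ipvsadm init script loads /etc/sysconfig/ipvsadm at boot. So `save_lvs_rules /etc/sysconfig/ipvsadm` is one way to make the table persistent. Check your distribution's service script before relying on that path.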

To test the configuration just connect to the load balancer using curl or nc:

	[root@host1 ~]# curl 192.168.122.53
	Web Server 1
	[root@host1 ~]# curl 192.168.122.53
	Web Server 2
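
Two requests only show that both back ends answer. Sending a batch and counting the replies makes the round-robin distribution visible. The sketch below assumes each test page is a single line ending in a newline, like the "Web Server N" pages above; tally_backends is a hypothetical name.

```shell
#!/bin/sh
# Sketch: send several requests to the virtual service and count which
# back end answered each one (assumes one-line test pages ending in "\n").
tally_backends() {
    # $1: virtual IP, $2: number of requests
    i=0
    while [ "$i" -lt "$2" ]; do
        curl -s "http://$1/"
        i=$((i + 1))
    done | sort | uniq -c
}
```

For example, `tally_backends 192.168.122.53 10`. With two healthy real servers and the rr scheduler, the counts should come out roughly even.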

And that's all it takes to configure a TCP load balancer that distributes connections to two real servers listening on port 80.

One thing to keep in mind is that LVS does not know when a real server (back-end node) is down, and it will keep sending traffic to it. LVS blindly forwards packets based on the configured rules, and that is all it does. This, of course, is not good enough for production environments.

To solve this problem we need some monitoring in place that will remove real servers from the LVS configuration if they are no longer able to accept connections.

There are many tools that do just that, but in this example I am going to use mon. I won't go into great detail about how mon works. In a nutshell, it is a daemon that runs tests (here, an HTTP test) and, depending on whether a test fails or passes, executes a script that takes some action. It is extremely extensible, and you can write your own monitoring or alert scripts.
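
The core loop such a tool runs is simple: probe each real server over HTTP, then add it to or remove it from the LVS table. The bare-bones sketch below shows that loop. It is only an illustration and not a replacement for mon: it has no retries, no flap damping and no notifications. The function names and variables are my own and mirror the example above.

```shell
#!/bin/sh
# Sketch of the idea mon automates (illustrative only, not mon itself).
# VIP, PORT and REAL_SERVERS are assumptions mirroring the example above.
VIP=192.168.122.53
PORT=80
REAL_SERVERS="192.168.122.50 192.168.122.51"

in_table() {
    # True if real server $1 is listed under the virtual service.
    ipvsadm -L -n -t "$VIP:$PORT" | grep -qF -- "-> $1:$PORT "
}

check_once() {
    for rs in $REAL_SERVERS; do
        # -f: count HTTP errors (>= 400) as failures; -m 2: 2-second timeout.
        if curl -fs -o /dev/null -m 2 "http://$rs:$PORT/"; then
            in_table "$rs" || ipvsadm -a -t "$VIP:$PORT" -r "$rs:$PORT" -m
        else
            if in_table "$rs"; then
                ipvsadm -d -t "$VIP:$PORT" -r "$rs:$PORT"
            fi
        fi
    done
}
```

Run it periodically as root on the Director, e.g. `while :; do check_once; sleep 5; done`. A single failed probe removes a server, which is one reason to prefer a proper monitor such as mon.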

Let's first install it:

[root@host1 ~]# yum install -y mon

The configuration file is /etc/mon/mon.cf. Here's an example using the two real servers configured earlier:

	[root@host1 ~]# cat /etc/mon/mon.cf

	### global options
	cfbasedir = /etc/mon
	pidfile = /var/run/mon.pid
	statedir = /var/lib/mon/state.d
	logdir = /var/lib/mon/log.d
	dtlogfile = /var/lib/mon/log.d/downtime.log
	alertdir = /usr/lib64/mon/alert.d
	mondir = /usr/lib64/mon/mon.d
	maxprocs = 20
	histlength = 100
	randstart = 60s
	authtype = pam
	userfile = /etc/mon/userfile

	### group definitions (hostnames or IP addresses)
	hostgroup HTTP1 192.168.122.50

	watch HTTP1
	    service http
	        interval 5s
	        monitor http.monitor
	        allow_empty_group
	        period wd {Sun-Sat}
	            alert test.alert
	            upalert test.alert

	hostgroup HTTP2 192.168.122.51

	watch HTTP2
	    service http
	        interval 5s
	        monitor http.monitor
	        allow_empty_group
	        period wd {Sun-Sat}
	            alert test.alert
	            upalert test.alert

	### See /usr/share/doc for the original example...

The hostgroup lines define the groups of real servers to be monitored. Each group holds a single real server, so the alert script always receives exactly one host.
The interval directive sets how often the monitor runs.
The monitor directive selects the monitor type. You can see all monitors that come with the mon package in /usr/lib64/mon/mon.d/
The alert directive specifies the script to execute when the test fails.
The upalert directive defines the script to run when the test succeeds again after a failure. You can see all alert scripts that come with the mon package in /usr/lib64/mon/alert.d/
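
With more real servers, the per-host stanzas become repetitive. The sketch below prints one hostgroup and watch stanza per address, in the same shape as the mon.cf above. It keeps one host per hostgroup, as the original config does. gen_mon_watches is a hypothetical helper, and the HTTP1, HTTP2, ... group names follow the example.

```shell
#!/bin/sh
# Sketch: print one hostgroup/watch stanza per real server, matching the
# mon.cf layout above. Usage: gen_mon_watches IP [IP ...]
gen_mon_watches() {
    n=1
    for rs in "$@"; do
        cat <<EOF
hostgroup HTTP$n $rs

watch HTTP$n
    service http
        interval 5s
        monitor http.monitor
        allow_empty_group
        period wd {Sun-Sat}
            alert test.alert
            upalert test.alert

EOF
        n=$((n + 1))
    done
}
```

Review the output before appending it below the global options of a fresh mon.cf, e.g. `gen_mon_watches 192.168.122.50 192.168.122.51`.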

Let's create our own test.alert script that will add and remove real servers from LVS:

	[root@host1 ~]# vi /usr/lib64/mon/alert.d/test.alert

	#!/bin/sh
	#
	# $Id: test.alert,v 1.1.1.1 2004/06/09 05:18:07 trockij Exp $
	#echo "`date` $*" >> /tmp/test.alert.log

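	# Assumes mon calls alerts as: -s service -g group -h host -t time [-u],
	# so $6 is the host and $9 is "-u" for an upalert.
	if [ "$9" = "-u" ]
	then
	    echo "`date` Real Server $6 is UP" >> /tmp/test.alert.log
	    ipvsadm -a -t 192.168.122.53:80 -r "$6:80" -m
	else
	    echo "`date` Real Server $6 is DOWN" >> /tmp/test.alert.log
	    ipvsadm -d -t 192.168.122.53:80 -r "$6:80"
	fi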
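
Relying on $6 and $9 ties the script to the exact position of each argument mon passes. The sketch below is the same alert, but it parses the options by name with getopts. It assumes mon's calling convention is -s service -g group -h hosts -t time, plus -u for an upalert, which is what the positional $6/$9 above imply. It also loops over the -h value in case a hostgroup holds more than one host. LVS_ALERT_LOG is a hypothetical override I added for testing.

```shell
#!/bin/sh
# Sketch of test.alert using getopts instead of positional arguments.
# Assumed mon calling convention: -s service -g group -h "hosts" -t time [-u]
VIP_PORT=192.168.122.53:80

lvs_alert() {
    log=${LVS_ALERT_LOG:-/tmp/test.alert.log}  # LVS_ALERT_LOG: hypothetical override
    hosts="" up=0
    OPTIND=1
    # Leading ':' keeps getopts quiet about options this script does not use.
    while getopts ":s:g:h:t:u" opt; do
        case $opt in
            h) hosts=$OPTARG ;;
            u) up=1 ;;
            *) ;;  # -s, -g, -t and unknown options are ignored here
        esac
    done
    [ -n "$hosts" ] || return 1
    for h in $hosts; do  # unquoted on purpose: split a space-separated host list
        if [ "$up" -eq 1 ]; then
            echo "$(date) Real Server $h is UP" >> "$log"
            ipvsadm -a -t "$VIP_PORT" -r "$h:80" -m
        else
            echo "$(date) Real Server $h is DOWN" >> "$log"
            ipvsadm -d -t "$VIP_PORT" -r "$h:80"
        fi
    done
}

# Only act when mon actually passed arguments.
if [ $# -gt 0 ]; then
    lvs_alert "$@"
fi
```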

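With everything in place, make the alert script executable (mon runs it directly) and start the service:

[root@host1 ~]# chmod +x /usr/lib64/mon/alert.d/test.alert
[root@host1 ~]# service mon start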

When Apache on the first real server stops answering on port 80, mon writes the following message to /tmp/test.alert.log and removes the node from LVS:

Wed Jan 9 15:44:39 UTC 2013 Real Server 192.168.122.50 is DOWN

When Apache is reachable again, mon calls test.alert as an upalert, passing -u as the ninth argument ($9), and the script adds the node back to LVS.
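
You can check the alert script without taking Apache down by calling it by hand with the same arguments mon passes. The sketch below assumes mon's argument order is -s service -g group -h host -t time [-u]. That order is also why the host lands in $6 and the up flag in $9. fire_alert is a hypothetical helper; the service and group values and the epoch-time -t value are illustrative only.

```shell
#!/bin/sh
# Sketch: invoke an alert script the way mon is assumed to call it.
# $1: alert script, $2: host, $3: "up" to simulate an upalert
fire_alert() {
    script=$1 host=$2
    if [ "${3:-}" = up ]; then
        "$script" -s http -g HTTP1 -h "$host" -t "$(date +%s)" -u
    else
        "$script" -s http -g HTTP1 -h "$host" -t "$(date +%s)"
    fi
}
```

For example, `fire_alert /usr/lib64/mon/alert.d/test.alert 192.168.122.50` removes the node, and the same call with `up` appended adds it back. Note that this really changes the LVS table, so check `ipvsadm -L -n` and /tmp/test.alert.log afterwards.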

Resources:
[1] http://www.linuxvirtualserver.org/
