What and Why HAProxy?

So, what is HAProxy?

An internet search defines HAProxy as free, open-source software that works as a reverse proxy server and load balancer for TCP and HTTP-based applications, spreading requests across multiple servers. Before diving deeper, let me explain the terms “Reverse Proxy” and “Load Balancer”.

What is a “Reverse proxy”, and what does the word “Reverse” stand for?

There are two kinds of proxy: the forward proxy and the reverse proxy.
The word “proxy” describes someone or something acting on behalf of someone else. In the computer world, it means one server acting on behalf of another machine, which can be either a client or a web server.
In one line, the difference is this: a proxy that acts on behalf of the client is called a forward proxy, or just a ‘proxy’, whereas a proxy that acts on behalf of the servers is called a reverse proxy.

The forward proxy retrieves data from another website on behalf of the original requester. Why would anyone want to do that? Why would anyone retrieve data in someone else’s name?

There can be multiple reasons. Let us understand this with an example. Suppose we have three machines: Alpha (our client computer, or simply the client), Beta (our proxy server, the forward proxy) and Gamma (the website that Alpha wants to access). Normally, Alpha would connect to Gamma directly, but in some cases Alpha would like Beta to access Gamma on its behalf. The first reason can be that someone with administrative authority over Alpha’s internet connection has decided to block all access to Gamma. Colleges restrict some sites for students, for example, and a company whose employees have been wasting too much time on social websites may have its admin block access to Gamma for all employees.

The second reason can be that Gamma’s administrator has blocked Alpha, perhaps because Alpha might try to spam or abuse Gamma’s content. In this case, Alpha would reach Gamma’s content through Beta, because Gamma has no problem with Beta’s access.

Similarly, a reverse proxy is one that is used by Gamma, not by Alpha, for various reasons. In the same example, Beta would now be the reverse proxy server. The first reason could be that Gamma’s administrator is worried about the content hosted on the server and does not want to expose the main server directly to the public. The second could be that Gamma is a large website with huge traffic that a single web server cannot handle. So Gamma sets up many servers and puts a reverse proxy on the internet that forwards each visitor’s request to whichever of those servers Gamma chooses. In this case, Alpha does not know whether the servers behind the proxy exist and does not care, because Alpha only sees that it is communicating with Beta.
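To make the reverse proxy idea a bit more concrete, here is a minimal Python sketch. It is a toy model, not real networking code: the class names OriginServer and ReverseProxy, the handle method and both addresses are made up for illustration. The only point is that Alpha talks to Beta and never learns where Gamma lives.

```python
class OriginServer:
    """Stands in for Gamma: the real web server, hidden from clients."""

    def __init__(self, address):
        self.address = address

    def handle(self, path):
        return f"content of {path}"


class ReverseProxy:
    """Stands in for Beta: the only address that clients ever see."""

    def __init__(self, public_address, origin):
        self.public_address = public_address
        self._origin = origin  # private: never revealed to the client

    def handle(self, path):
        # Forward the request to the hidden origin and relay its answer.
        body = self._origin.handle(path)
        return {"served_by": self.public_address, "body": body}


gamma = OriginServer("10.0.0.5:9010")          # hypothetical internal address
beta = ReverseProxy("beta.example:80", gamma)  # hypothetical public address

# Alpha only ever calls Beta.
response = beta.handle("/index.html")
print(response)
```

Nothing in the response mentions Gamma’s address, which is exactly what Gamma’s administrator wanted.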

Now comes the load balancer part. I already touched on it in the reverse proxy example, but I will go over it again to make it clearer.

As the name suggests, a load balancer balances the load by distributing it among different servers. Suppose googley.com has a lot of traffic: ten million users hit the website every hour, and one web server cannot handle all the requests. The “googley.com” team also doesn’t want to compromise on the speed of the site, that is, the time in which each request is served. So they set up a reverse proxy server. All requests go to this proxy server first, and it forwards them to the actual web servers according to whichever algorithm googley.com wants to use, such as “RoundRobin”, which sends an equal share of requests to every server irrespective of where the servers are located. Other algorithms might send each request to the closest server, as this reduces network latency.

Now you have some idea of what HAProxy is.
But the next question that comes to mind is: why HAProxy?

First of all, in simple words, it is open-source software that is easy to use. Secondly, it can be used as both a reverse proxy and a load balancer, so there is no need for two separate tools, and it works efficiently.

Let’s go through an example of how HAProxy works.

Let’s assume we have a web service (website.com) running on three different servers with the IPs 172.172.172.100, 172.172.172.200 and 172.172.172.250, all on port 9010.

HAProxy itself is running on a different server, with the IP 172.100.100.100.
A typical HAProxy block would look like this:

listen website-server #line 1
    bind *:1701 #line 2
    mode http
    timeout client 10800s
    timeout server 10800s
    balance roundrobin #line 3
    option tcp-check
    option allbackups
    server website-server-web-server-0 172.172.172.100:9010 check #line 4
    server website-server-web-server-1 172.172.172.200:9010 check #line 5
    server website-server-web-server-2 172.172.172.250:9010 check #line 6

Line 1: This line names the section. You can choose any name, as long as it has no spaces (a meaningful name is best).

Line 2: This is the port on which HAProxy listens for this service. Anyone who wants to reach the website servers has to hit the reverse proxy machine’s IP on this port (in this case, 172.100.100.100:1701).

Line 3: This is the algorithm by which requests are distributed across the servers. We will discuss these algorithms shortly.

Lines 4, 5 and 6: These are the IPs and ports of the machines where the actual service is running. They are not exposed to the public.
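If it helps to see which part of the block is the public entry point and which part is the hidden backend, here is a small Python sketch that pulls those pieces out of a shortened copy of the block. The summarize function is my own illustration, not anything HAProxy ships, the section name is assumed to be written without spaces, and the parsing is far simpler than HAProxy’s real configuration parser.

```python
CONFIG = """\
listen website-server
    bind *:1701
    mode http
    balance roundrobin
    server website-server-web-server-0 172.172.172.100:9010 check
    server website-server-web-server-1 172.172.172.200:9010 check
"""


def summarize(config_text):
    """Pull out the section name, bind address, algorithm and backend servers."""
    summary = {"name": None, "bind": None, "balance": None, "servers": {}}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "listen":
            summary["name"] = args[0]
        elif keyword in ("bind", "balance"):
            summary[keyword] = args[0]
        elif keyword == "server":
            # args[0] is the server name, args[1] its private address.
            summary["servers"][args[0]] = args[1]
    return summary


summary = summarize(CONFIG)
print("clients connect to port:", summary["bind"])
print("hidden backends:", summary["servers"])
```

The bind line is the only thing a client ever needs to know; the server lines stay private to HAProxy.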

Picture the setup like this: on the left-hand side are the clients, who all want to access our website. In the middle is the HAProxy server, and on the right-hand side are all the web servers.

Now, if a client wants to access the web servers, it cannot hit them directly, as they are not exposed to the public because of the risk to their content. The client has to hit the HAProxy server (in this case, 172.100.100.100:1701), and HAProxy then forwards the request to website server 0, 1 or 2, depending on the algorithm used.

Many algorithms can be used in HAProxy, depending on the requirement. Some of these are:

Round Robin (roundrobin): The algorithm picks servers sequentially from the list. Once it reaches the end of the list, it forwards the next request to the first server again.

Least Connection (leastconn): This algorithm selects the server with the fewest active connections and forwards the request to it.

Source (source): This algorithm hashes the client’s source IP address to select the server, so requests from the same IP address keep going to the same server as long as the set of servers does not change.
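The three ideas above can be sketched in a few lines of Python. This is only a rough illustration of the selection logic, not how HAProxy implements it: the real algorithms also take server weights and health into account, and the CRC32 hash here is simply a convenient deterministic hash that I picked for the example. The server names and the client IP are placeholders.

```python
import itertools
import zlib

SERVERS = ["server-0", "server-1", "server-2"]


def round_robin(servers):
    """Return an iterator that walks the list in order and wraps around."""
    return itertools.cycle(servers)


def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps server name -> current connection count.
    """
    return min(active, key=active.get)


def source_hash(client_ip, servers):
    """Map a client IP to a server so the same IP always gets the same one.

    CRC32 is an assumption for this sketch; HAProxy uses its own hash.
    """
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


rr = round_robin(SERVERS)
first_four = [next(rr) for _ in range(4)]
print(first_four)  # server-0, server-1, server-2, then server-0 again

print(least_connections({"server-0": 12, "server-1": 3, "server-2": 7}))

# The same client IP is mapped to the same server on every call.
print(source_hash("203.0.113.7", SERVERS) == source_hash("203.0.113.7", SERVERS))
```

Round robin ignores how busy each server is, least connection reacts to the current load, and source keeps a given client on one server, which helps when session data lives on that server.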

The next question that arises is: what if the load balancer, the HAProxy server itself, is getting crowded with requests?

To solve this, we can add another load balancer, which we can call the secondary load balancer. If the primary load balancer is getting crowded, the secondary load balancer comes to its rescue.

HAProxy can also be controlled while it is running, through a Unix socket that accepts commands. A handy tool for this is “Socat”. It is not part of HAProxy; it is a separate command-line utility that can send commands to that socket. This comes in handy if you want to disable a particular server and have its requests go to the other servers.

In the above example, you have three website servers: server-0, 1 and 2. Now the requirement is to deploy some new changes to these servers. Let’s assume we deploy sequentially: first to server-0, then 1, and then 2. While we are deploying to server-0, that server will go down for a few seconds or milliseconds, and in the meantime all the requests being sent to it would get killed, which would be a bad user experience. To overcome this problem, we can disable server-0 so that all requests go to server-1 and server-2, deploy, and then enable it again. Once it is up, we disable the next one, deploy, and continue. This makes for a smooth deployment, and users won’t notice any flicker or fluctuation even during the deployment.
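Here is a small Python simulation of that one-at-a-time deployment. It does not touch HAProxy at all; the rolling_deploy function and the in-memory set of enabled servers are stand-ins I made up to show the order of operations and to confirm that two servers stay in rotation at every step.

```python
def rolling_deploy(servers, deploy):
    """Deploy to one server at a time, keeping the others in rotation.

    Returns, for each step, the servers still receiving traffic.
    """
    enabled = set(servers)
    in_rotation_log = []
    for name in servers:
        enabled.discard(name)                    # take the server out of rotation
        in_rotation_log.append(sorted(enabled))  # who serves traffic meanwhile
        deploy(name)                             # the server may restart here
        enabled.add(name)                        # back in rotation before the next one
    return in_rotation_log


deployed = []
log = rolling_deploy(["server-0", "server-1", "server-2"], deployed.append)
for step in log:
    print(step)
```

Each printed line lists the two servers that keep serving requests while the third one is being updated.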

Now the question remains: how do we disable a particular server?

There are two ways. The first one is a bit hard: you change the HAProxy configuration by removing the server where the service is being deployed, and then restart the HAProxy server. Afterwards you add it back, remove the next one, and continue. This is strongly discouraged, because changing the configuration again and again while deploying increases the chance of errors. After every configuration change you also have to reload the HAProxy server, which is itself not recommended: a reload takes only a few milliseconds, but hundreds of requests might arrive in those milliseconds.

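Here “Socat” comes to the rescue. With its help, we can disable and enable any server without restarting the complete HAProxy server. HAProxy needs its command socket enabled for this, through a “stats socket” line with admin level in the global section of the configuration; the commands below assume the socket is at /run/haproxy/haproxy.sock.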

listen website-server #line 1
    bind *:1701
    mode http
    timeout client 10800s
    timeout server 10800s
    balance roundrobin
    option tcp-check
    option allbackups
    server website-server-web-server-0 172.172.172.100:9010 check #line 4
    server website-server-web-server-1 172.172.172.200:9010 check #line 5
    server website-server-web-server-2 172.172.172.250:9010 check #line 6

In this case, disabling a server with a Socat command would look like this:

echo "disable server website-server/website-server-web-server-0" | sudo socat stdio /run/haproxy/haproxy.sock

The command is structured as follows:

  1. “disable” (or “enable”), the action to perform
  2. The keyword “server”, then the section name from line 1, followed by “/”
  3. The server name from line 4, 5 or 6 (the server we want to disable)

Server-0 is now disabled, and all incoming requests go to the remaining servers. Deploy to server-0 and, once it is up again, enable it with the following command:

echo "enable server website-server/website-server-web-server-0" | sudo socat stdio /run/haproxy/haproxy.sock

Now go ahead and disable server-1, deploy there and enable it again, and then do the same for server-2.
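If you would rather script this than type the commands by hand, the same thing can be done from Python with the standard socket module, no Socat needed. Treat this as a sketch under a few assumptions: the socket lives at /run/haproxy/haproxy.sock as in the commands above, the section is named “website-server” without spaces, and the script runs as a user allowed to use the socket. The helper names build_command and send_command are my own.

```python
import socket

SOCKET_PATH = "/run/haproxy/haproxy.sock"  # same path as in the Socat commands


def build_command(action, section, server):
    """Build a command line such as 'disable server <section>/<server>'."""
    if action not in ("enable", "disable"):
        raise ValueError(f"unsupported action: {action}")
    return f"{action} server {section}/{server}\n"


def send_command(command, path=SOCKET_PATH):
    """Send one command to the HAProxy socket and return the reply text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    # Only print the commands here; calling send_command needs a running HAProxy.
    for action in ("disable", "enable"):
        print(build_command(action, "website-server", "website-server-web-server-0").strip())
```

On the HAProxy machine, you would call send_command(build_command("disable", "website-server", "website-server-web-server-0")) before deploying and the matching “enable” command afterwards.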

Written by-
Ankur Garg


Originally published at medium.com on June 28, 2018.

