Fail-over

We will now test our fail-over.

1. Before breaking

The following command shows the IP addresses of the AWS Load-Balancers: these are the addresses the Cloudflare Load-Balancer currently hands out to clients.

You can check in the AWS Console that these are the IP addresses of the LB in region1: Terraform configured Cloudflare with this behaviour.

[workstation] ~/
$ dig echo.terror.ninja
echo.terror.ninja.	1666	IN	A	35.181.88.3
echo.terror.ninja.	1666	IN	A	15.188.7.176
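The post doesn't show the Cloudflare side of the Terraform code, but a sketch of the behaviour described above might look like this. Resource names, `var.cloudflare_zone_id`, the region2 pool and the health-check monitor are assumptions; only the origin addresses come from the `dig` output above:

```hcl
# Sketch only: a Cloudflare Load-Balancer pool pointing at the region1 AWS LB.
# The region2 pool and the monitor would be defined the same way.
resource "cloudflare_load_balancer_pool" "region1" {
  name = "echo-region1"

  origins {
    name    = "aws-lb-region1-a"
    address = "35.181.88.3"
  }
  origins {
    name    = "aws-lb-region1-b"
    address = "15.188.7.176"
  }
}

resource "cloudflare_load_balancer" "echo" {
  zone_id          = var.cloudflare_zone_id
  name             = "echo.terror.ninja"
  default_pool_ids = [cloudflare_load_balancer_pool.region1.id]
  fallback_pool_id = cloudflare_load_balancer_pool.region2.id
}
```

With `default_pool_ids` on region1 and `fallback_pool_id` on region2, Cloudflare only serves the region2 addresses once region1's pool is marked unhealthy.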

2. Breaking socat hosts in region1

Now, stop the two socat hosts in region1.

Go to the AWS console in region1 and stop the hosts named echo-socat-<ip>.
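If you prefer the CLI to the console, a sketch of the same operation (this assumes the AWS CLI is installed and configured for region1; the Name-tag pattern matches the host naming above):

```shell
# Sketch: stop every running instance whose Name tag matches a pattern.
# Assumes the AWS CLI has credentials and defaults to the region1 region.
stop_by_name() {
  pattern="$1"
  ids="$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${pattern}" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)"
  # Only call stop-instances if something matched.
  [ -n "$ids" ] && aws ec2 stop-instances --instance-ids $ids
}

# stop_by_name 'echo-socat-*'
```

The unquoted `$ids` is intentional: `--output text` returns the instance IDs separated by whitespace, and word splitting passes them as separate arguments.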

Reconnect to the echo service a few times and you should see the two different IP addresses of the socat hosts from region2:

[workstation] ~/
$ nc echo.terror.ninja 8181
ip-10-1-1-6+v1.0.0
test4
test4

The fail-over thus works when the socat servers fail.
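As an aside, the banner line the service prints (here `ip-10-1-1-6+v1.0.0`) can be split by a check script to assert which backend host and version answered; a minimal sketch, with the `<host>+<version>` layout taken from the output above:

```shell
# Sketch: split the "<host>+<version>" banner printed by the echo service.
banner="ip-10-1-1-6+v1.0.0"   # first line returned by `nc` above
host="${banner%%+*}"          # strip everything from the first '+'
version="${banner##*+}"       # strip everything up to the last '+'
echo "host=${host} version=${version}"
```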

3. Breaking HAProxy hosts in region1

  1. Validate that we are still using the HAProxy hosts from region1.

    If you look at the diagram of the infrastructure, you should see that we use Consul Connect to let the HAProxy hosts reach the local socat hosts, but also those from region2.

    [workstation] ~/
    $ dig echo.terror.ninja
    echo.terror.ninja.	1666	IN	A	35.181.88.3
    echo.terror.ninja.	1666	IN	A	15.188.7.176

    Consul Prepared Queries take care of finding hosts in other datacenters in case of failure.
    Thus the end-to-end tests Cloudflare uses to check whether a datacenter failed still get an OK response.
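    The post doesn't show the query definition, but a prepared query with cross-datacenter failover looks roughly like this (the service name "echo" is an assumption); it would be registered with a POST to the Consul agent's /v1/query endpoint:

```json
{
  "Name": "echo",
  "Service": {
    "Service": "echo",
    "Failover": {
      "NearestN": 1
    }
  }
}
```

    With `NearestN: 1`, a lookup that finds no healthy instances locally is retried against the nearest other datacenter, which is exactly the behaviour observed here.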

  2. Now, stop the two HAProxy hosts in region1.

    Go to the AWS console in region1 and stop the hosts named echo-haproxy-<ip>.

    Reconnect and you should still see the two different IP addresses of the socat hosts from region2:

    [workstation] ~/
    $ nc echo.terror.ninja 8181
    ip-10-1-1-29+v1.0.0
    test5
    test5

    The DNS update to the IP addresses of region2 is not immediate:

    • the Load-Balancer has to detect the failure
    • the DNS must be updated and updates need to be propagated
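    While waiting for propagation, a small loop can report the moment the A records change; a minimal sketch, assuming `dig` is installed:

```shell
# Sketch: poll the A records until the record set changes, i.e. until the
# Cloudflare fail-over to region2 becomes visible in DNS.
resolve() {
  dig +short "$1" A | sort
}

watch_failover() {
  domain="$1"
  before="$(resolve "$domain")"
  until [ "$(resolve "$domain")" != "$before" ]; do
    sleep 5
  done
  echo "DNS changed, now resolving to:"
  resolve "$domain"
}

# watch_failover echo.terror.ninja
```

    The `sort` matters: DNS round-robin shuffles the answer order, so the records are normalized before comparing the two snapshots.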
  3. Validate that we are now using the HAProxy hosts from region2.

    This time, the end-to-end tests used by Cloudflare detected that something was broken in region1, and the Load-Balancer failed over to region2.

    [workstation] ~/
    $ dig echo.terror.ninja
    echo.terror.ninja.	30	IN	A	34.254.30.193
    echo.terror.ninja.	30	IN	A	3.248.174.184

    The fail-over thus also works when the HAProxy servers fail.

This architecture may not be optimal.
We probably don’t want the HAProxy hosts in region1 to cross over to region2 to query our echo service: it could be faster for the client to switch to region2 directly.
Of course, it depends on your use case and your own tests.


But my main goal was to test Consul Connect, and it worked perfectly.

4. Next page!

Restart the four hosts that were brought down.

And we are done: let’s upgrade our “socat” services and try the blue/green deployment.