February 1, 2019February 1, 2019 Srinivasa Reddy

Replication, High Availability & Load Balancing – PostgreSQL

Replication in PostgreSQL is an internal DB activity that is used to maintain high availability, disaster recovery and also for load balancing as all the reads can go to the slave. There are different ways to set it up that are explained below.

Streaming Replication

Streaming replication allows a standby server to stay more up-to-date than is possible with file-based log shipping. The standby connects to the primary, which streams WAL records to the standby as they’re generated, without waiting for the WAL file to be filled.

In recent years, streaming replication has been one of the major features introduced in PostgreSQL. The idea is to use the PostgreSQL transaction log (WAL) to synchronize an arbitrary number of services and to replicate data inside a cluster.In recent years, streaming replication has been one of the major features introduced in PostgreSQL. The idea is to use the PostgreSQL transaction log (WAL) to synchronize an arbitrary number of services and to replicate data inside a cluster.

In PostgreSQL replication can be made in 2 ways:

Asynchronous
Synchronous

You can decide which method is more beneficial to you according to your needs.

Asynchronous replication in PostgreSQL

In the event of server failure and disaster, it is a good idea to have a standby / replica of your master server within reach. Fortunately, PostgreSQL offers the means to achieve exactly that. Administrators can create read-only replicas of a master server easily and use those replicas for various purposes such as:

Continuous backups
Automatic Failovers
Scaling out read-only workloads
Achieving geo-redundancies

Asynchronous is the standard way to replicate data in the PostgreSQL world and it offers a reliable and easy way to distribute data and make your setups more failsafe.

The main advantages are a low overhead, simplicity, and robustness. As a result, it is the ideal solution for automatic fail overs and enterprise-grade redundancies.

Synchronous replication in PostgreSQL

If you are not able to take the risk of losing a single COMMIT, synchronous might be what you are looking for. In PostgreSQL, you can replicate synchronously to as many slaves as you want to ensure that a COMMIT is only valid once it has been confirmed by the desired number of PostgreSQL servers.

It ensures the highest possible security for your transactions because a single crashing server can no longer cause data loss.

Image result for synchronous replication postgresql — Image Source

Cascading Replication

The cascading replication feature allows a standby server to accept the connections and stream WAL records to other standbys, acting as a relay. This can be used to reduce the number of direct connections to the master and also to minimize inter-site bandwidth overheads.

A standby acting as both a receiver and a sender is known as a cascading standby. Standbys that are more directly connected to the master are known as upstream servers, while those standby servers further away are downstream servers. This does not place limits on the number or arrangement of downstream servers, though each standby connects to only one upstream server which eventually links to a single master/primary server.

A cascading standby sends not only WAL records received from the master but also those restored from the archive. So even if the replication connection in some upstream connection is terminated, streaming replication continues downstream for as long as new WAL records are available.

Cascading replication is currently asynchronous. Synchronous settings have no effect on cascading at present.

Read our other Blog – Things to know before starting with DynamoDB

Logical Replication

Let me start with briefly mentioning what logical replication is and what’s it good for. I expect that most people know the PostgreSQL streaming master-standby one that has been part of PostgreSQL for years and is commonly used both for high availability and read scaling.

Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). We use the term logical in contrast to the physical one, which uses exact block addresses and byte-by-byte replication. PostgreSQL supports both mechanisms concurrently. Logical replication allows fine-grained control over both data replication and security.

Logical Replication can replay, logically, the changes (as in insert, update, delete of rows) happening to one or more persistent tables in a database.

The source server must create a named object called a Publication. A publication serves as an endpoint from which a log of changes can be fetched by a Subscription. It creates the subscription on another server, which is the destination server. The subscription includes a standard connection string that tells how to connect to the source server.

The typical use-cases are:

Sending incremental changes in a single database or a subset of a database to subscribers as they occur.
Firing triggers for individual changes as they arrive on the subscriber.
Consolidating multiple databases into a single one (for example for analytical purposes).
Replicating between different major versions of PostgreSQL.
Replicating between PostgreSQL instances on different platforms (for example Linux to Windows)
Giving access to replicated data to different groups of users.
Sharing a subset of the database between multiple databases.

Image result for logical replication postgresql — Image Source

Third party Tools

There few third-party tools can be use for postgresql replication, developed by few companies like EDB, OpenSCG, 2^nd Quadrant etc. Below are the tools and the links given.

Repmgr

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL’s built-in hot-standby capabilities with tools to set up standby servers, monitor and perform administrative tasks such as failover or manual switchover operations.

https://repmgr.org/

EFM

EDB Postgres Failover Manager (EFM) provides automatic failover and high availability—ensuring your data is safe to support mission-critical applications.

https://www.enterprisedb.com/products/edb-postgres-platform/edb-postgres-failover-manager

BDR

Postgres-BDR (Bi-Directional Replication for PostgreSQL) is a ground-breaking multi-master tool for PostgreSQL databases that has been in full production status.

https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/

Bucardo

Bucardo supports any number of sources and targets (aka masters and slaves). It is asynchronous and trigger based.

https://bucardo.org/

Londiste

https://wiki.postgresql.org/wiki/Londiste_Tutorial_(Skytools_2)

pglogical

The pglogical extension provides logical streaming replication for PostgreSQL, using a publish/subscribe module

https://www.2ndquadrant.com/en/resources/pglogical/pglogical-docs/

slony

https://severalnines.com/blog/experts-guide-slony-postgresql

Reference links-