postgres streaming replication

master db

1. set up pg_hba.conf so remote replication connections are allowed (a config reload is enough, no restart needed)

add:

host    replication     replication_user         S.OM.E.IP/32                md5

notice the name of the db - replication:

The value replication specifies that the record matches if a physical replication connection is requested (note that replication connections do not specify any particular database). More in postgres docs.
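
pg_hba.conf changes are picked up on a config reload, which can be triggered straight from a superuser session:

SELECT pg_reload_conf();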


2. configure the firewall to allow incoming traffic from the replica server on each port the dbs are listening on; add the replica ip to the whitelist


3. adjust postgresql.conf (requires db restart)

wal_level = replica			# minimum level required for physical replication
max_wal_senders = 10			# max concurrent walsender processes
wal_keep_segments = 10			# WAL segments kept around for standbys (replaced by wal_keep_size in pg 13+)
hot_standby = on			# allow read-only queries during recovery; only read on a standby
primary_conninfo = 'host=S.OM.E.IP port=some_port user=replication_user password=some_pass'
primary_slot_name = 'physical_slot_1'

note: primary_conninfo and primary_slot_name are only read on a standby, so setting them on the master is harmless - and pg_basebackup will carry them over to the replica along with the rest of the config.
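
After the restart, the effective values can be sanity-checked from any session:

show wal_level;
show max_wal_senders;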


4. Create replication role

create role replication_user with login password 'some_pass' replication;


5. Create replication slot

SELECT * FROM pg_create_physical_replication_slot('physical_slot_1');
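
The slot should now be visible (it stays inactive until the replica connects):

select slot_name, slot_type, active from pg_replication_slots;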


replica db

1. backup and restore the dbs on the replica server (requires the data directories to be empty, as pg_basebackup brings everything over)

pg_basebackup --host=S.OM.E.IP --port=some_port --username=replication_user --wal-method=fetch --pgdata=destination_dir

create a standby.signal file in the data dir of each instance (this is what makes postgres 12+ start in standby mode)
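
Alternatively, on pg 12+ pg_basebackup can produce the standby configuration by itself: the --write-recovery-conf (-R) switch creates standby.signal and appends the connection settings to postgresql.auto.conf:

pg_basebackup --host=S.OM.E.IP --port=some_port --username=replication_user --wal-method=fetch --write-recovery-conf --pgdata=destination_dir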


2. start replica; logs should show the replication is working:

2020-04-17 20:16:18.224 CEST [9276] LOG:  entering standby mode
2020-04-17 20:16:18.290 CEST [9276] LOG:  redo starts at 0/2000028
2020-04-17 20:16:18.323 CEST [9276] LOG:  consistent recovery state reached at 0/2000138
2020-04-17 20:16:18.326 CEST [19868] LOG:  database system is ready to accept read only connections
2020-04-17 20:16:18.416 CEST [15924] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
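
The master side can be checked as well - every connected standby shows up as a row in pg_stat_replication:

select client_addr, state, sent_lsn, replay_lsn, sync_state from pg_stat_replication;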

expose postgres to the outside world

firewall

Whatever your setup requires for the firewall...

pg_hba.conf

host    all             all             XXX.XXX.XXX.XXX/32          md5

(the /32 mask matches a single client ip; a /0 mask would match any address)

postgresql.conf

listen_addresses = 'localhost,XXX.XXX.XXX.XXX'
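
A quick connectivity check from a remote machine (some_user/some_db being whatever exists on the server):

psql "host=XXX.XXX.XXX.XXX port=5432 user=some_user dbname=some_db" --command "select 1"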

import osmnames to postgres

download links & more info on osmnames: https://github.com/osmnames/osmnames

Data import:

create table if not exists geonames (
	name text,
	alternative_names text,
	osm_type text,
	osm_id bigint,
	class text,
	type text,
	lon numeric,
	lat numeric,
	place_rank integer,
	importance numeric,
	street text,
	city text,
	county text,
	state text,
	country text,
	country_code text,
	display_name text,
	west numeric,
	south numeric,
	east numeric,
	north numeric,
	wikidata text,
	wikipedia text,
	housenumbers text
);
delete from geonames; -- clear leftovers when re-running the import

create table if not exists housenumbers (
	osm_id bigint,
	street_id bigint,
	street text,
	housenumber text,
	lon numeric,
	lat numeric
);
delete from housenumbers; -- clear leftovers when re-running the import

COPY geonames FROM 'S:/ome/location/planet-latest_geonames.tsv' DELIMITER E'\t' CSV HEADER;
COPY housenumbers FROM 'S:/ome/location/planet-latest_housenumbers.tsv' DELIMITER E'\t' CSV HEADER;

Note that with v2 of osmnames there may be duplicates (https://github.com/OSMNames/OSMNames/issues/162), therefore it is worth cleaning them up:

-- note: distinct on without an order by keeps an arbitrary row per osm_id
select distinct on (osm_id) * into geonames_clean from geonames;
select distinct on (osm_id) * into housenumbers_clean from housenumbers;

Add some pks & indexes, so we can speed stuff up a bit:

ALTER TABLE public.geonames_clean
    ADD PRIMARY KEY (osm_id);
ALTER TABLE public.housenumbers_clean
    ADD PRIMARY KEY (osm_id);
CREATE INDEX geonames_clean_country_code_idx
    ON public.geonames_clean USING btree
    (country_code COLLATE pg_catalog."default");
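
The country extract below joins housenumbers_clean on street_id, so an index on that column helps too (not part of the original import, just an assumption about the query that follows):

CREATE INDEX housenumbers_clean_street_id_idx
    ON public.housenumbers_clean USING btree
    (street_id);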

Extract the data for a country & glue the house numbers on:

select
	gn.*,
	hn.osm_id as hn_osm_id,
	hn.street_id as hn_street_id,
	hn.street as hn_street,
	hn.housenumber as hn_housenumber,
	hn.lon as hn_lon,
	hn.lat as hn_lat
into
	nl_geonames_complete
from
	geonames_clean gn 
	left outer join housenumbers_clean hn on gn.osm_id = hn.street_id
where
	gn.country_code = 'nl';

Finally output the extract to a tab separated file:

copy (select * from nl_geonames_complete) to 'S:/ome/location/nl_geonames_complete.tsv' with CSV HEADER DELIMITER E'\t';

pgsql - update jsonb with behavior similar to mongodb $inc operator

I was looking into upserting data into a pgsql jsonb field with behavior similar to mongodb's $inc operator.

upsert itself was solved by using:

insert ...  
on conflict(cols) do update set ...


this snippet is just a dirty reminder on how jsonb can be updated with new keys:

select
  d || jsonb_build_object(
    'x', coalesce(cast(d->>'x' as int), 0) + 40,
    'y', coalesce(cast(d->>'y' as int), 0) + 40
  )
from (
  select '{"x":5000, "z":100}'::jsonb as d
) as data
-- yields {"x": 5040, "y": 40, "z": 100}

And a more complete example:

CREATE TABLE monitoring.usage_stats
(
  id serial NOT NULL,
  date timestamp without time zone NOT NULL,
  application_id uuid NOT NULL,
  organization_id uuid NOT NULL,
  user_id uuid,
  token_id uuid,
  referrer text,
  id_1 text,
  id_2 text,
  id_3 text,
  id_4 text,
  id_5 text,
  counters jsonb NOT NULL DEFAULT '{}'::jsonb,
  CONSTRAINT "PK_monitoring.usage_stats" PRIMARY KEY (id)
);

CREATE UNIQUE INDEX usage_stats_uq_usage_stats_composite_key
  ON monitoring.usage_stats
  USING btree
  (date, application_id, organization_id, user_id, token_id, referrer COLLATE pg_catalog."default", id_1 COLLATE pg_catalog."default", id_2 COLLATE pg_catalog."default", id_3 COLLATE pg_catalog."default", id_4 COLLATE pg_catalog."default", id_5 COLLATE pg_catalog."default");
INSERT INTO monitoring.usage_stats (
    date,
    application_id,
    organization_id,
    user_id,
    token_id,
    referrer,
    id_1,
    id_2,
    id_3,
    id_4,
    id_5,
    counters
)
VALUES (
    @date,
    @applicationId,
    @organizationId,
    @userId,
    @tokenId,
    @referrer,
    @id1,
    @id2,
    @id3,
    @id4,
    @id5,
    @counters::jsonb
)
ON CONFLICT (date,application_id,organization_id,user_id,token_id,referrer,id_1,id_2,id_3,id_4,id_5)
DO UPDATE SET
    counters = usage_stats.counters || jsonb_build_object('test',coalesce(cast(usage_stats.counters->>'test' as int), 0) + 2);
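
The 'test' key is hardcoded above; when the set of counters is open-ended the same idea generalises into a small helper (a sketch - jsonb_inc is a made-up name, and it assumes every value in both objects is numeric):

create or replace function jsonb_inc(base jsonb, delta jsonb)
returns jsonb
language sql
as $$
  -- keep every key of base; for each key of delta store base + delta,
  -- treating a key missing from base as 0
  select base || coalesce(
    (select jsonb_object_agg(key, coalesce((base->>key)::numeric, 0) + value::numeric)
     from jsonb_each_text(delta)),
    '{}'::jsonb
  )
$$;

With that in place the conflict branch becomes counters = jsonb_inc(usage_stats.counters, excluded.counters). One gotcha worth remembering: the unique index above contains nullable columns, and since NULLs never compare equal, rows with NULL user_id/token_id/etc. will not trigger ON CONFLICT at all (pg 15's NULLS NOT DISTINCT option addresses exactly this).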

Postgres - logical replication

Master DB:

1. Create a publication

CREATE PUBLICATION alltables FOR ALL TABLES;
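
FOR ALL TABLES requires superuser; a publication can also be restricted to selected tables, e.g.:

CREATE PUBLICATION sometables FOR TABLE some_schema.table_1, some_schema.table_2;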

2. Create a replication user

create role replication_user with login password 'some-pass' replication; 
--GRANT whatever permissions are required
grant usage on schema some_schema to replication_user;
grant select on all tables in schema some_schema to replication_user;

Note: logical replication appeared to work here even without the replication attribute on the role, but the docs state the role used for the replication connection must have it, so it is safer to grant it anyway; with std streaming replication it is definitely required

3. Modify postgresql.conf to allow connections to the machine via its ip (requires db restart)

listen_addresses = 'localhost,server_ip'

and set the proper WAL level

wal_level = logical			# minimal, replica, or logical

4. Modify pg_hba.conf to allow connections from specified slaves

host    all     replication_user         slave_ip/32                md5

5. Dump db structure

pg_dump.exe --host localhost --port 5432 --username "some-user-name" --no-password  --format plain --schema-only --verbose --file "C:\temp\db_schema.sql" --schema "schema-name" "db-name"

6. Review publication and replication slots (when a subscription is created)

select * from pg_publication;
select * from pg_replication_slots; --populated once a subscription is created
--to drop a replication slot that is no longer needed:
select pg_drop_replication_slot('slot_name');

7. Check replication status

select * from pg_publication;      --defined publications
select * from pg_stat_replication; --one row per connected subscriber


Replica DB:

1. recreate the db structure - logical replication does not replicate DDL, so the tables must already exist for the subscription to work

createdb --host localhost --port 5432 --username user-name  db-name
psql --host localhost --port 5432 --username user-name -w --dbname db-name --command "create extension if not exists postgis"
psql --host localhost --port 5434 --username postgres -w --dbname db-name --file "c:\temp\db_schema.sql"

Note: installing postgis is of course only needed when the source db uses it

2. Create a subscription

create subscription my_sub 
connection 'host=master-ip dbname=db-to-replicate user=replication_user password=some-password port=5432' 
publication alltables;

3. Review subscription

select * from pg_subscription;

4. Check replication status

select * from pg_stat_subscription;
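
Per-table initial sync progress can be inspected too (srsubstate: i = initialize, d = data copy, s = synchronized, r = ready):

select srrelid::regclass, srsubstate from pg_subscription_rel;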

More info here: https://www.postgresql.org/docs/10/logical-replication.html