This document explains the installation and configuration of the PostgreSQL database engine.

Install PostgreSQL

Install PostgreSQL

sudo apt install postgresql postgresql-client

Debian wiki explains how to install and set up PostgreSQL.

Change the listening port

In case you need to change the listening port (for example when I had to test postgresql on docker containers while keeping postgresql running locally) Edit the configuration file /etc/postgresql/13/main/postgresql.conf and change the port definition as follows

port = 5433

Start and stop the server

sudo service postgresql start
sudo service postgresql stop

Create a user and a database

Add system users at the regular bash shell before you can add them in postgresql. I call this user “rdb” because I plan to use R to connect to the database and “R” cannot be used as a user name according to system policy.

sudo adduser rdb --disabled-password 

Login as the postgres user!

sudo -i -u postgres
# In case postgresql is not on the default port 5432
export PGPORT=5433

Now logged in as postgres@machine_name in the regular bash shell:

  1. Create a user with the same name as the system user created above

    createuser –pwprompt rdb # Create a user for myself createuser paul

See also ~/.pgpass to store the password.

  1. Create a database owned by that user

    createdb -O rdb tradeflows createdb -O rdb biotrade createdb -O paul tradeflows_migrated

Note: createdb is a wrapper around the SQL command CREATE DATABASE. At the postgreSQL command prompt, you can use

create database biotrade;
  1. Grant rights to another user. First connect to the database

     psql -d biotrade -h localhost -U rdb

    Then grand privileges

     GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA raw_comtrade TO paul;

Connect to a database

Configuration file ~/.pgpass

This is the preferred way to connect to the database for scripts. Create a ~/.pgpass file to store the connection details for that user:

echo 'localhost:*:tradeflows:rdb:localhost' >> ~/.pgpass

Secure the file:

chmod 600 ~/.pgpass

The ~/.pgpass file should contain lines of the following format:

hostname:port:database:username:password

“Each of the first four fields can be a literal value, or *, which matches anything.”

Therefore I use * for the port so that it matches any ports.

Service file ~/.pg_service.conf

The postgresql service file enables “connection parameters to be associated with a single service name”. You can specify the database connection parameters in a file called ~/.pg_service.conf in you home.

# Trade database
[biotrade]
host=localhost
port=5432
user=rdb
dbname=biotrade
password=localhost

And you would then connect to the database with:

psql service=biotrade

Connect to the database

Using the credentials stored in ~/.pgpass

psql -d tradeflows -h localhost -U rdb
psql -d biotrade -h localhost -U rdb 

Specifying the port (normally not necessary if it’s specified in ~/.pgpass)

psql -d biotrade -h localhost -U rdb -p 5433

It’s possible to specify a connection URI of the form:

psql postgresql://rdb@localhost:5432/biotrade
# And since I have a URL set as an environmental variable
psql $BIOTRADE_DATABASE_URL

Postgresql connection URIs

“It is possible to specify multiple host components, each with an optional port component, in a single URI. A URI of the form postgresql://host1:port1,host2:port2,host3:port3/ is equivalent to a connection string of the form host=host1,host2,host3 port=port1,port2,port3. As further described below, each host will be tried in turn until a connection is successfully established.”

Connect by changing the user at the system level

Login as the user who is the owner of that database

sudo -i -u rdb

Connect to the database

psql tradeflows

Data types

Character

Datatype character:

Name Description
character varying(n), varchar(n) variable-length with limit
character(n), char(n) fixed-length, blank padded
text variable unlimited length

Tip:

“There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.”

SO answer

“Generally, there is no downside to using text in terms of performance/memory. On the contrary: text is the optimum. Other types have more or less relevant downsides. text is literally the”preferred” type among string types in the Postgres type system, which can affect function or operator type resolution.”

differences between text and varchar

To sum it all up: - char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit - varchar(n) – it’s problematic to change the limit in live environment (requires exclusive lock while altering table) - varchar – just like text - text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name

In SQLite as well, character(n) doesn’t seem to have any performance advantage.

SQLite data types

Note that numeric arguments in parentheses that following the type name (ex: “VARCHAR(255)”) are ignored by SQLite - SQLite does not impose any length restrictions (other than the large global SQLITE_MAX_LENGTH limit) on the length of strings, BLOBs or numeric values.

Delete data

Delete content from a table

Keep the table structure but delete it’s content. Delete oil products from the Comtrade monthly table:

# Check results
select distinct(product_code) from raw_comtrade.monthly where product_code like '15%';
# Delete
DELETE FROM raw_comtrade.monthly WHERE product_code like '15%';
DELETE FROM raw_comtrade.monthly WHERE product_code like '44%';

Drop a db, schema or table

Drop a database

postgres=# drop database databasename;

Drop a schema

psql -d tradeflows -h localhost -U rdb -c "DROP SCHEMA raw_comext"
ERROR:  cannot drop schema raw_comext because other objects depend on it
DETAIL:  table raw_comext.monthly depends on schema raw_comext
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Drop a schema and all its table with a drop in cascade

psql -d tradeflows -h localhost -U rdb -c "DROP SCHEMA raw_comext CASCADE"

Drop a table

drop table raw_comtrade.yearly;

Optimize

Indexes

Partial indexes

Create a partial index on the first 4 characters of the product code

CREATE INDEX ix_raw_comtrade_yearly_product_code_prefix_4
ON raw_comtrade.yearly (LEFT(product_code, 4));

The database structure is defined as SQLAlchemy objects, then later translated in either SQLite or PostGreSQL dialect. Show the database structure as defined in PostGreSQL:

pg_dump --schema-only -t raw_comtrade.yearly $BIOTRADE_DATABASE_URL > table_structure_dump.sql

Create an additional index to speed up query on the first 4 digits of the product code (it doesn’t actually speed things up, left here for information purposes).

psql $BIOTRADE_DATABASE_URL
CREATE INDEX ix_raw_comtrade_yearly_product_code_prefix_4
ON raw_comtrade.yearly (LEFT(product_code, 4));

Remove an index

SET search_path TO raw_comtrade, public;
DROP INDEX ix_raw_comtrade_yearly_product_code_prefix_4;

Rename

Rename a table

Rename a table to another name with _backup at the end

ALTER TABLE raw_comtrade.yearly RENAME TO yearly_backup;
ALTER TABLE raw_comtrade.monthly RENAME TO monthly_backup;
ALTER TABLE raw_comtrade.product RENAME TO product_backup;

Rename a schema

Rename a schema

ALTER SCHEMA raw_comtrade RENAME TO raw_comtrade_backup;

Dump

Examples copied from man pg_dump. To dump a database called mydb into a SQL-script file:

$ pg_dump mydb > db.sql

To reload such a script into a (freshly created) database named newdb:

$ psql -d newdb -f db.sql

Remove statements

The dump contained the following two statements which I removed:

SET default_tablespace = '';

SET default_with_oids = false;

I removed default table space because the official documentation for table spaces says it doesn’t make sense to have more than one table space per system.

I removed default_with_oids because this posrt explains that it’s used to enable a legacy feature.

Duplicate a table

Duplicate a table:

create table raw_comtrade.monthly_copy as (select * from raw_comtrade.monthly);

Help

Help about psql commands:

?

Help about SQL commands:

\h
\h select

Information

List schemas, tables, columns

List databases

\l

Connect to a database

\c db_name

List schemas

\dn

List tables

\dt

List tables for a particular schema ilist all tables in schema

\dt raw_comext.*;
\dt raw_comtrade.*;
\dt raw_faostat*;

Describe data types

Describe a table to show the table structure, including column names and data types

\d raw_comtrade.yearly;
\d raw_comtrade.monthly;

Load data into the database

SQL files

Read commands from a file with the -f argument:

psql -d biotrade -h localhost -U rdb -f ~/rp/biotrade/biotrade/config_data/comtrade.sql

MariaDB/MySQL and PostgreSQL

Compare commands

Compare common commands between MariaDB and PostgreSQL.

MariaDB/MySQL PostgreSQL
show databases; \l
connect db_name; \c db_name
show tables; \dt

Migrate from MySQL into PostgreSQL

Transfer data from MySQL to PostgreSQL:

pgloader mysql://user@localhost/sakila postgresql:///pagila

For my own database I used:

sudo su postgres
createdb -O paul tradeflows_migrated
exit # Go back to user "paul"
pgloader mysql://paul@localhost/tradeflows postgresql:///tradeflows_migrated

SELECT

Connect to the database (with the credentials defined in ~/.pgpass)

psql biotrade

postgresql.org the WHERE Clause

Here are some examples of WHERE clauses:

SELECT ... FROM fdt WHERE c1 > 5

SELECT ... FROM fdt WHERE c1 IN (1, 2, 3)

SELECT ... FROM fdt WHERE c1 IN (SELECT c1 FROM t2)

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

SELECT ... FROM fdt WHERE c1 BETWEEN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10) AND 100

SELECT ... FROM fdt WHERE EXISTS (SELECT c1 FROM t2 WHERE c2 > fdt.c1)

When using string values, single quotes are required (otherwise the ERROR: column "02" does not exist is returned)

select distinct(reporter) from raw_comtrade.yearly_hs2 where commodity_code = '02';

Like

Select product codes that start with "15"

select distinct(product_code) from raw_comtrade.monthly where product_code like '15%';

In

Select product codes within a list, then group by and count

biotrade=> SELECT product_code, count(*) FROM raw_comtrade.yearly WHERE 
product_code IN ('230400', '150710', '120190') group by product_code;
 product_code | count
--------------+-------
 120190       | 29727
 150710       | 33790
 230400       | 49398

Road map

postgresql.org Roadmap

Schemas

Create a schema

create schema raw_comtrade;

List schemas

\dn

Official documentation postgresql.org schemas

“A database contains one or more named schemas, which in turn contain tables. Schemas also contain other kinds of named objects, including data types, functions, and operators. The same object name can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable. Unlike databases, schemas are not rigidly separated: a user can access objects in any of the schemas in the database they are connected to, if they have privileges to do so.”

“There are several reasons why one might want to use schemas:”

  • “To allow many users to use one database without interfering with each other.”
  • “To organize database objects into logical groups to make them more manageable.”
  • “Third-party applications can be put into separate schemas so they do not collide with the names of other objects.”

“Schemas are analogous to directories at the operating system level, except that schemas cannot be nested.”

Tables

List tables for a particular schema:

\dt raw_comtrade.*;

Export a table structure

Export the create statement for an existing table

pg_dump -t 'raw_comtrade.yearly_2_digit' --schema-only biotrade  >> /tmp/comtrade_yearly.sql

Blogs and other resources

Dimitri Fontaine 2014 Why is pgloader faster ?

“In searching for a modern programming language the best candidate I found was actually Common Lisp.”

See also a more detailed quote under the Lisp page.

Wikipedia paragraph on the pronunciation

“The original standard declared that the official pronunciation for”SQL” was an initialism: /ˌɛsˌkjuːˈɛl/ (“ess cue el”). Regardless, many English-speaking database professionals (including Donald Chamberlin himself) use the acronym-like pronunciation of /ˈsiːkwəl/ (“sequel”), mirroring the language’s prerelease development name, “SEQUEL”.”