Blog > Big Data

Connecting PostgreSQL to your script (Python)

Introduction

Nowadays PostgreSQL is probably one of the most powerful relational databases among the open-source solutions. Its functional capacities are no worse than Oracle’s and definitely way ahead of the MySQL.

So if you are working on apps using Python, someday you will face the need of working with databases. Luckily, Python has quite a wide amount of packages that provide an easy way of connecting and using databases. In this article, we will go through the most common packages for dealing with PostgreSQL and show the way of using one of them.

Python modules for working with PostgreSQL

The way you access the database via Python environment usually depends on personal choices and development specificities. One of the most common and easiest ways is to establish connections using special Python drivers among which the most popular are psycopg2, PyGreSQL, py-postgresql, and pg8000.

psycopg2

psycopg2 is probably one of the most popular packages for interaction with PostgreSQL from the Python environment. Written on C programming language with libpq wrapper, it provides a wide range of operations for database manipulations.

psycopg2 provides both client-side and server-side cursors as well as asynchronous communication and notifications and a “COPY TO/ FROM” support. It should be noted that the package provides most of Python data types support with their adaptation to match PostgreSQL data types.

Among the key features of this package are the support of multiple connections and connection objects, various methods of transaction and its management, pool of connections, auto filtering and async queries and cursors objects. Plus, columns from the database are returned via Python dictionary with their names.

Among the possible weaknesses of the package, we can note a lack of documentation and mostly outdated code examples.

PyGreSQL

PyGreSQL is the first PostgreSQL adapter and one of the open-source Python modules for working with PostgreSQL databases that is actively developing. This package embeds the PostgreSQL query library to provide easy use of all of the PostgreSQL database manipulation features from Python environment.

Despite the fact that this module provides great opportunities for working with databases, some users found its problems with working with cursors objects.

py-postgresql

py-postgresql is a Python3 module that provides wide abilities to interact with PostgreSQL including a high-level driver to make working with databases deeper.

Among the main disadvantages of this package are the fact that it works only on the Python3 environment and doesn’t have direct support for high-level async interfaces.

pg8000

The pg8000 is a pure Python module for dealing with PostgreSQL interactions that has wide documentation and is actively developing. It should be noted that this module complies with Python Database API 2.0 that basically provides the user a broader reach of DB connectivity from Python environment. Being pure Python, the module allows users using it in AWS Lambda functions without any extra work.

For now, this module is actively reviewing and developing and therefore some bugs can be faced while working with it.

Despite the fact that many more drivers for PostgreSQL interacting are available, some of them are outdated or have limits in usage. Still, the listed above provide users great opportunities for various databases manipulations.

Working with queries

We covered some of the most common packages for connecting PostgreSQL to the Python app and now it’s time to see it in use. We chose for working the psycopg2 module and continue reviewing connecting and working with databases with its operations as an example. It is assumed that PostgreSQL is already installed and run on your machine.

1. Installing package.

Run

conda install psycopg2

if you are using anaconda or

pip install psycopg2 

via pip package manager.

2. Once the package was downloaded, let’s check the accessibility to the database. 

To check the connection, we will write a simple script.

Be sure, that you provided own specific database name, username, and password.

Save the script and run it with Python test.py command. As a result, we will get an empty array ([]) symbolizing that connection works properly.

Now it’s time to explore some basic operations with databases. First, let’s add to our database some tables we will work with. Usually data stores in tables in .csv format so why not start exploring PostgreSQL interactions from adding some data to database from .csv table?

For the next examples, we will use parts of the datasets that can be found on Kaggle: 

  1. Chicago census data;
  2. Chicago public schools.

First, we establish connection and create tables with the corresponding table names.

connect_str = "dbname=yourdbname user=yourname host='localhost' " + "password='yourpwd'"
conn = psycopg2.connect(connect_str)
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE census(id integer PRIMARY KEY, 
COMMUNITY_AREA_NUMBER float, 
COMMUNITY_AREA_NAME text, 
PERCENT_OF_HOUSING_CROWDED float, 
PERCENT_HOUSEHOLDS_BELOW_POVERTY float, 
PERCENT_AGED_16+_UNEMPLOYED float, 
PERCENT_AGED_25+_WITHOUT_HIGH_SCHOOL_DIPLOMA float, 
PERCENT_AGED_UNDER_18_OR_OVER_64 float, 
PER_CAPITA_INCOME integer, 
HARDSHIP_INDEX float)
""")

In such a way another table will be created.

After this, we can copy data from corresponding .csv files to our tables.

with open('CENSUS.csv', 'r') as f:
next(f)
cursor.copy_from(f, census, sep=',')
conn.commit()

The commit() method is needed to make changes to our database persistent.

It should be noted that while copying data from .csv tables, we do not need a header, so we skip it.

The same operation will be done to another table. And now we can write some simple queries.

  1. Let’s find out which Community Areas in CENSUS table start with letter ‘B’.
sql = ‘SELECT community_area_name 
FROM census
WHERE community_area_name LIKE %s’
community_area_name = ‘B%’
cursor.execute(“sql, (community_area_name)”)
cursor.fetchall()


  1. Let’s list the top 5 Community Areas by average College Enrollment.
sql =  ‘SELECT community_area_name, AVG(college_enrollment) AS avg_enrollment 
FROM schools GROUP BY community_area_name limit 5’
cursor.execute(“sql”)
cursor.fetchall()


Now, let’s find the Per Capita Income of the Community Area which has a school Safety Score of 1.

sql = SELECT community_area_number, community_area_name, per_capita_income 
FROM census
WHERE community_area_number IN ( select community_area_number
FROM schools
WHERE safety_score = '1' )’ cursor.execute(“sql”)

As you have already noticed, we used fetchall() method. The package has some quite useful - among many other useful methods - options to retrieve a single row from a table (fetchone()) or all the rows according to query (fetchall()).

As soon as all the manipulations with the database are finished, the connection must be closed. It can be done with two methods: cursor.close() and conn.close().

Conclusion

Establishing connection and interaction with PostgreSQL from the Python environment with the help of various drivers becomes easy and quick. Every module has its own pros and cons that basically allows each user to choose the suitable one. The psycopg2 module used for the example provides an easy and clear way to establish a connection and set queries giving opportunities for using flexible queries.