Access PostgreSQL with Python
http://wiki.postgresql.org/wiki/Psycopg2_Tutorial
There are any number of programming languages available for you to use with PostgreSQL. One could argue that PostgreSQL as an Open Source database has one of the largest libraries of Application Programming Interfaces (APIs) available for various languages.
One such language is Python, and it happens to be one of my favorite languages. I use it for almost all the hacking I do. Why? Well, to be honest, it is because I am not that great a programmer. I am a database administrator and operating system consultant by trade. Python helps ensure that the code I write is still readable by other, more talented programmers six months after I stop working on it.
Nine times out of ten, when I am using Python, I am using the language to communicate with a PostgreSQL database. My driver of choice when doing so is called Psycopg. Recently Psycopg2 has been under heavy development and is currently in Beta 4. It is said that this will be the last Beta. Like the first release of Psycopg, the driver is designed to be lightweight and fast.
The following article discusses how to connect to PostgreSQL with Psycopg2 and also illustrates some of the nice features that come with the driver. The test platform for this article is Psycopg2, Python 2.4, and PostgreSQL 8.1dev.
Psycopg2 is a DB API 2.0 compliant PostgreSQL driver that is actively developed. It is designed for multi-threaded applications and manages its own connection pool. Another interesting feature of the adapter is that if you use the PostgreSQL array data type, Psycopg automatically converts results of that type to Python lists.
The following discusses specific use of Psycopg. It does not try to implement a lot of Object Oriented goodness; instead it provides clear and concise examples of using the driver with PostgreSQL.
Making the initial connection:
#!/usr/bin/python2.4
#
# Small script to show PostgreSQL and Psycopg together
#

import psycopg2

try:
    conn = psycopg2.connect("dbname='template1' user='dbuser' host='localhost' password='dbpass'")
except psycopg2.Error:
    print "I am unable to connect to the database"
The above imports the adapter and tries to connect to the database. If the connection fails, a message is printed to STDOUT. You could also use the exception to retry the connection with different parameters if you like.
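A minimal sketch of that retry idea, written for Python 3, follows. The helper name connect_with_fallback and the list of candidate connection strings are hypothetical. The connect function and the exception class are passed in so the sketch runs without a server; with Psycopg2 you would pass psycopg2.connect and psycopg2.OperationalError, the DB API exception raised when a connection cannot be made.

```python
def connect_with_fallback(dsns, connect, error_class):
    """Return a connection from the first connection string that works.

    Hypothetical helper: connect and error_class are parameters so the
    sketch runs without a server; with Psycopg2 pass psycopg2.connect and
    psycopg2.OperationalError.
    """
    last_error = None
    for dsn in dsns:
        try:
            return connect(dsn)
        except error_class as exc:
            # Remember why this attempt failed and try the next candidate.
            last_error = exc
    raise RuntimeError("all connection attempts failed: %s" % last_error)
```

With Psycopg2 the call would look like connect_with_fallback([primary_dsn, fallback_dsn], psycopg2.connect, psycopg2.OperationalError), where both connection strings are your own.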
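The next step is to define a cursor to work with. It is important to note that Python/Psycopg cursors are not cursors as defined by PostgreSQL: by default a Psycopg cursor is a client-side object for sending queries and fetching their results, not a cursor declared on the server with DECLARE.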
cur = conn.cursor()
Now that we have the cursor defined we can execute a query.
cur.execute("""SELECT datname from pg_database""")
Once the query has executed, you need a variable to hold the results. The fetchall() method returns all remaining rows as a list of tuples, one tuple per row.
rows = cur.fetchall()
Now all the results from our query are in the variable named rows, and you can start processing them. To print them to the screen you could do the following.
print "\nShow me the databases:\n"
for row in rows:
    print " ", row[0]
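Because cursor(), execute() and fetchall() all come from the DB API 2.0 specification, the fetch-and-print steps can be wrapped in a function that accepts any DB API connection. The sketch below is written for Python 3; the helper name first_column is hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL (with a made-up table imitating pg_database) so it runs without a server.

```python
import sqlite3


def first_column(conn, query):
    """Run query on any DB API 2.0 connection and return column 0 of every row."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        rows = cur.fetchall()  # a list of row tuples
    finally:
        cur.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # sqlite3 stands in for PostgreSQL; this table only imitates pg_database.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE fake_pg_database (datname TEXT)")
    cur.executemany("INSERT INTO fake_pg_database VALUES (?)",
                    [("template0",), ("template1",), ("postgres",)])
    print("\nShow me the databases:\n")
    for name in first_column(conn, "SELECT datname FROM fake_pg_database"):
        print("   " + name)
```

With Psycopg2 you would call first_column(conn, "SELECT datname FROM pg_database") on the connection from the earlier script.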
Everything we just covered should work with any database that has a DB API 2.0 driver for Python. Now let's review some of the finer points. Following the DB API, Psycopg2 does not autocommit by default: the first query you execute implicitly opens a transaction, and every later query runs inside it until you call conn.commit() or conn.rollback().
Execution within a transaction is a very good thing: it ensures data integrity and allows for appropriate error handling. However, some statements cannot be run inside a transaction. Take the following example.
#!/usr/bin/python2.4

import sys
import psycopg2

# Try to connect
try:
    conn = psycopg2.connect("dbname='template1' user='dbuser' password='mypass'")
except psycopg2.Error:
    print "I am unable to connect to the database."
    sys.exit(1)

cur = conn.cursor()
try:
    cur.execute("""DROP DATABASE foo_test""")
except psycopg2.Error:
    print "I can't drop our test database!"
This code would actually fail with the printed message "I can't drop our test database!" PostgreSQL cannot run DROP DATABASE inside a transaction block, and Psycopg2 has implicitly opened one. To drop the database you need to switch the isolation level of the connection to autocommit, which is done using the following.
conn.set_isolation_level(0)  # 0 is psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT
You would place the above immediately preceding the DROP DATABASE cursor execution.
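Newer Psycopg2 releases (2.4.2 and later) also expose a connection attribute, autocommit, that does the same job as set_isolation_level(0). Below is a minimal sketch (Python 3) of a helper built on it. The names drop_database and quote_ident are hypothetical, and the connection is assumed not to be in the middle of a transaction when the helper is called, since psycopg2 refuses to change autocommit inside one. The database name is quoted by hand following PostgreSQL's identifier rules; psycopg2 2.7 and later offer psycopg2.sql.Identifier for the same purpose.

```python
def quote_ident(name):
    """Quote an identifier the way PostgreSQL expects: wrap it in double
    quotes and double any double quote it contains."""
    return '"' + name.replace('"', '""') + '"'


def drop_database(conn, dbname):
    """Drop dbname through a psycopg2-style connection (hypothetical helper).

    DROP DATABASE cannot run inside a transaction block, so autocommit is
    switched on for the duration of the statement and restored afterwards.
    """
    previous = conn.autocommit
    conn.autocommit = True
    try:
        cur = conn.cursor()
        try:
            cur.execute("DROP DATABASE " + quote_ident(dbname))
        finally:
            cur.close()
    finally:
        conn.autocommit = previous
```

Restoring the previous setting means later statements on the same connection go back to running inside transactions.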
The psycopg2 adapter can also handle some of the special data types that PostgreSQL has available. One such example is arrays. Let's review the table below:
                           Table "public.bar"
 Column |  Type  |                      Modifiers
--------+--------+-----------------------------------------------------
 id     | bigint | not null default nextval('public.bar_id_seq'::text)
 notes  | text[] |
Indexes:
    "bar_pkey" PRIMARY KEY, btree (id)
The notes column in the bar table is of type text[]. In PostgreSQL the [] suffix means that the column holds not a single text value but an array of text. To insert values into this table you would use a statement like the following.
foo=# insert into bar(notes) values ('{An array of text, Another array of text}');
Selecting from the table then gives the following representation.
foo=# select * from bar;
 id |                    notes
----+----------------------------------------------
  2 | {"An array of text","Another array of text"}
(1 row)
Some languages and database drivers would insist that you manually create a routine to parse the above array output. Psycopg2 does not force you to do that. Instead it converts the array into a Python list.
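To give a sense of the routine Psycopg2 spares you, here is a minimal, hypothetical parser (parse_text_array, written for Python 3) for one-dimensional text[] values in the format the server prints them. It handles double-quoted elements with backslash escapes and the unquoted NULL marker, and assumes well-formed input; multi-dimensional arrays and explicit bounds are out of scope.

```python
def parse_text_array(literal):
    """Parse a one-dimensional PostgreSQL text[] value, as printed by the
    server, into a Python list. Simplified sketch: assumes well-formed input
    and does not handle nested arrays or explicit bounds such as [0:1]={...}.
    """
    if not (literal.startswith("{") and literal.endswith("}")):
        raise ValueError("not an array literal: %r" % literal)
    body = literal[1:-1]
    items = []
    if not body:
        return items
    i, n = 0, len(body)
    while True:
        if body[i] == '"':
            # Quoted element: read up to the closing quote; a backslash
            # escapes the character that follows it.
            i += 1
            chars = []
            while body[i] != '"':
                if body[i] == "\\":
                    i += 1
                chars.append(body[i])
                i += 1
            i += 1  # step past the closing quote
            items.append("".join(chars))
        else:
            # Unquoted element: runs to the next comma; bare NULL is SQL NULL.
            end = body.find(",", i)
            if end == -1:
                end = n
            token = body[i:end].strip()
            items.append(None if token.upper() == "NULL" else token)
            i = end
        if i >= n:
            return items
        if body[i] != ",":
            raise ValueError("expected ',' at offset %d" % i)
        i += 1  # step past the separating comma
```

For example, parse_text_array('{"An array of text","Another array of text"}') returns the same two-element list that Psycopg2 hands back.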
#!/usr/bin/python2.4

import sys
import psycopg2

# Try to connect
try:
    conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass'")
except psycopg2.Error:
    print "I am unable to connect to the database."
    sys.exit(1)

cur = conn.cursor()
try:
    cur.execute("""SELECT * from bar""")
except psycopg2.Error:
    print "I can't SELECT from bar"
    sys.exit(1)

rows = cur.fetchall()
print "\nRows: \n"
for row in rows:
    print " ", row[1]
Running the script produces the following output.
[jd@jd ~]$ python test.py

Rows:

  ['An array of text', 'Another array of text']
You could then access the list in Python with something similar to the following.
#!/usr/bin/python2.4

import sys
import psycopg2

# Try to connect
try:
    conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass'")
except psycopg2.Error:
    print "I am unable to connect to the database."
    sys.exit(1)

cur = conn.cursor()
try:
    cur.execute("""SELECT * from bar""")
except psycopg2.Error:
    print "I can't SELECT from bar"
    sys.exit(1)

rows = cur.fetchall()
print "\nRows: \n"
for row in rows:
    # row[1] is the notes list; [1] picks its second element
    print " ", row[1][1]
The above would output the following.
Rows:

  Another array of text
Some programmers would prefer not to use the numeric position of a column, as in row[1][1]; it can be easier to use a dictionary instead. Here is the example with a slight modification.
#!/usr/bin/python2.4

import sys
# load the adapter
import psycopg2
# load the psycopg extras module
import psycopg2.extras

# Try to connect
try:
    conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass'")
except psycopg2.Error:
    print "I am unable to connect to the database."
    sys.exit(1)

# If we are accessing the rows via column name instead of position we
# need to add the arguments to conn.cursor.
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
try:
    cur.execute("""SELECT * from bar""")
except psycopg2.Error:
    print "I can't SELECT from bar"
    sys.exit(1)

# Note that below we are accessing the row via the column name.
rows = cur.fetchall()
print "\nRows: \n"
for row in rows:
    print " ", row['notes'][1]
The above would output the following.
Rows:

  Another array of text
Notice that we did not use row[1] but instead used row['notes'] which signifies the notes column within the bar table.
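DictCursor is Psycopg2's own way of doing this. For comparison, here is a minimal sketch (Python 3) of the same idea using only what the DB API 2.0 specification guarantees, the cursor.description attribute. The helper name fetch_dicts is hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so it runs without a server. Unlike a DictCursor row, each result is a plain dict, so positional access such as row[1] is no longer available.

```python
import sqlite3


def fetch_dicts(cur):
    """Fetch the remaining rows of an executed DB API 2.0 cursor as dicts.

    cursor.description holds one sequence per result column; item 0 of each
    is the column name, which becomes the dictionary key.
    """
    names = [column[0] for column in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


if __name__ == "__main__":
    # sqlite3 stands in for PostgreSQL; notes is plain TEXT here because
    # sqlite3 has no array type.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE bar (id INTEGER, notes TEXT)")
    cur.execute("INSERT INTO bar VALUES (2, 'An array of text')")
    cur.execute("SELECT * FROM bar")
    for row in fetch_dicts(cur):
        print(row["notes"])
```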
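A last item I would like to show you is how to insert multiple rows from a sequence of dictionaries. This example assumes a bar table with first_name and last_name columns, a different layout from the bar table used above. If you had the following: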
namedict = ({"first_name":"Joshua", "last_name":"Drake"}, {"first_name":"Steven", "last_name":"Foo"}, {"first_name":"David", "last_name":"Bar"})
You could insert all three rows with a single call:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
conn.commit()  # make the inserts permanent; Psycopg2 does not autocommit by default
The cur.executemany() call iterates through the tuple and executes the INSERT once for each dictionary, filling the %(first_name)s and %(last_name)s placeholders from that dictionary's keys.
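The same executemany() pattern can be tried without a PostgreSQL server. In this sketch (Python 3) Python's built-in sqlite3 module stands in for Psycopg2, the helper name insert_names is hypothetical, and the bar table is created with just the two name columns. sqlite3 uses the named placeholder style (:first_name) where Psycopg2 uses %(first_name)s; the commit() call matters for both drivers, because neither commits automatically by default.

```python
import sqlite3

namedict = ({"first_name": "Joshua", "last_name": "Drake"},
            {"first_name": "Steven", "last_name": "Foo"},
            {"first_name": "David", "last_name": "Bar"})


def insert_names(conn, people):
    """Insert every mapping in people with a single executemany() call."""
    cur = conn.cursor()
    try:
        # sqlite3's placeholder style is :name; Psycopg2's is %(name)s.
        cur.executemany(
            "INSERT INTO bar (first_name, last_name) "
            "VALUES (:first_name, :last_name)",
            people)
    finally:
        cur.close()
    # Neither driver autocommits by default, so make the inserts permanent.
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bar (first_name TEXT, last_name TEXT)")
    insert_names(conn, namedict)
    print(conn.execute("SELECT count(*) FROM bar").fetchone()[0])
```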
The only downside I run into with Psycopg2 and PostgreSQL is that the driver is a little behind in server-side support, for example server-side prepared statements, but it is said that the author expects to implement these features in the near future.