Visual Analytics Primer Part 3: Local Databases

Alright, so you feel comfortable with relational database theory. Now you actually need to set up a database. What will that look like? In this tutorial, we’ll walk through setting up a PostgreSQL database on your computer, why it’s useful, and how to populate it with data.

What’s a local database?

A local database is one which is hosted on the same computer as the one that’s trying to access the database. They are not and should not be made to be publicly accessible for security purposes. However, they’re very useful for people who are trying to work with more data than can be loaded into R, Python, or other similar tools at one time. The kind of database that we’ll be setting up is a SQL database, meaning that it follows the relational model of database theory. We’re using the PostGres variety of SQL, out of the personal preference of the author. Other kinds of SQL databases are just as viable. PostGreSQL can be downloaded here. Download the appropriate form of PostGres for your operating system.

Installation

Once you’ve downloaded PostGreSQL, click on the installer to start it up. It’ll take a little bit of time to get installed as databases take a lot of space to install. Important note — you don’t need Stack Builder and can uncheck the box to install it. Nothing bad will happen if you do install it, but it will open another installation window which might be confusing. Make sure to take note of your master password when you enter it into the database. It will be important for the rest of the tutorial.

There are two main ways to access your PostGres database outside of tools like Python and R. You can either use pgAdmin, the visual interface for working with PostGres, or psql, the terminal interface for working with it. My personal preference is psql because I prefer to write my SQL commands, but neither option is incorrect. One important note, though, is that the two have slightly different syntax from one another at times.

An important distinction to make note of is that Postgres provides servers, which will be host to one or multiple databases, not just the database itself. The database itself will hold multiple tables, A.K.A. relations. Multiple tables from a database can be joined to one another, and with specific database management systems, from a server.

Accessing the Database

Accessing the database is simple, but will depend on what your personal preference is. I like to work with it through psql, the command line interface. pgAdmin is the other option, a graphical user interface which will more readily give you access to information about the database. It’s better to start off learning pgAdmin. I primarily use psql because I learned SQL primarily from a command line interface, and old habits die hard. Whichever you choose, there will be an application shortcut for them which you can click on to open them up. psql will open up a window which looks like terminal, while pgAdmin will open up a window in your default browser.

Navigating psql

When you’ve opened the psql window on your computer, press enter until it prompts you for your password. Then, enter your password. You’re accessing a local database, so the default configurations that it suggests are perfect. The syntax of psql differs in several small, but significant ways from that of pgAdmin. When you’ve connected, you won’t be able to do anything just yet. Enter \l in order to see a list of open databases. You can also create one by writing “CREATE DATABASE name;”. Of note — in SQL you will almost always need to terminate your commands with a semicolon. There are a few notable exceptions in psql, which begin with a forward slash. Of note, “\connect name,” which allows you to connect to a specific database, does not require a semicolon.

Navigating pgAdmin

pgAdmin is a GUI intended for working with PostGreSQL, which means that it comes with a few more amenities than psql does. Thanks to this, all you need to do is open it up and log in using your master password. From there, you can navigate the databases using the menu on the left of the screen. Clicking on the small carrots next to words you see on the menu will open up their sub-menus.

Populating the database

If you’d like to read a CSV file into your database, you should do it in psql. This is because pgAdmin is running on the PostGres server on your computer, which while on your computer, is not “of” your computer. This is without specific permissions which are beyond the scope of this tutorial. Therefore, it does not have file access. psql, in contrast, is a wrapper on your computer which accesses the server. It’s a bit of a weird distinction, but long story short, it’s easier to do this part from psql.

You can create the table into which you’d like to copy the data from either psql or pgAdmin, depending on your preference. If you’d like to do it through psql, follow the first half of this tutorial. The way that we’ll execute the COPY command will be different. Otherwise, open up pgAdmin, click on the server you’d like, open the databases tab by clicking on it, click on the database that you’d like, and then right click on the tables drop down under schemas. Select the option to create a new table.

This will take you to the tool for editing tables. Here you can name your table, click on the columns tab, and then create columns for your table. Make sure that each column in your table matches up with a column in your CSV file! Both methods will make a table.

Now that you have a table, open up psql, log in, and connect to the database. Enter “\COPY table_name FROM ‘absolute file path’ DELIMITER ‘,’ CSV HEADER” all on one line. It must be on one line, otherwise you will run into issues. From there, if everything’s working, your data should load into the database!

Querying

Now here’s the fun part — querying the database is when you’re able to actually ask it for information. For the purposes of this tutorial, I’ll assume that you’re already familiar with basic SQL and I’ll just tell you how to query through psql or pgAdmin. Thankfully, both are easy. With psql, once you’ve connected to the right database, you can just go ahead and write a query. With pgAdmin, you’ll need to select the right database in the drop down, and then click the following icon.

From there, you’ll open the query tool and you can start writing up your queries!

In psql, you only need to log into the appropriate database and you can write queries from there.