Graph Databases, Neo4j, and Py2neo for the Absolute Beginner

Ananda Montoly
Published in Smith-HCV
12 min read · Jan 14, 2020


Panama Papers represented in neo4j

Are you interested in graph theory and NoSQL database design? Graph databases may offer you the tools that you need. Whether you’re working on pattern matching, recommender systems, fraud detection, social media, or more, graph databases are a great option. Neo4j is the most popular open-source graph database management system available to the public. In the following article, we’ll discuss graphs, relational databases, neo4j, and py2neo for beginners. If you’ve taken an introductory python course [CSC 111 at Smith College], you’ve got all the tools you need to get started.

So what is a graph?

If you’ve taken discrete math [MTH 153 at Smith College], a lot of this will be familiar to you already. Imagine that you are the owner of three dogs named Rex, Fido, and Spot. If you want to visually show the relationship between you and your dogs, you could first draw everyone as bubbles on a sheet of paper. You, Rex, Fido, and Spot would each be a bubble on the paper. However, you want to show more than just those bubbles. You want to represent your relationships with your dogs. You can draw those relationships as lines between the bubbles. For example, you can have lines which point from any of the dogs to you, representing the “Pet Of” relationship. So if a line were drawn from Rex to you, it would show “Rex has the ‘Pet Of’ relationship with You”, which means that Rex is your pet. Lines can be directed or not. If they’re directed, the relationship goes one way, like the “Pet Of” relationship, but if they’re not directed, the relationship goes both ways, like the “Loves” relationship between one of your pets and you.

You, Rex, Fido, and Spot

The image above is a very basic graph. You and your dogs are each nodes, entities which are stored in the database. The relationships between you and your pets are called edges. A graph is any structure made up of a set of nodes which may be connected by edges. Graphs are an incredibly useful structure in computer science. The web, for example, can be considered a graph, with different web pages as nodes and hyperlinks between them as edges. The first algorithm behind Google search ranking worked by constructing a graph of the web and then weighting pages based on how many hyperlinks pointed to them and how important the linking pages were. If you’ve ever wondered how Facebook suggests people you may know, one of the key parts of this algorithm is a graph. It adds everyone to a massive graph and suggests people based on your friends’ connections. Graphs have many applications, so there are many tools out there for using them.
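To make the vocabulary concrete, here is a minimal sketch of the dog graph in plain Python, with no database involved. The relationship name PET_OF and the helper names are made up for illustration, and counting incoming edges is only the simplest form of the link-counting idea, not Google's actual algorithm.

```python
# Nodes are plain names; each directed edge is (source, relationship, target).
nodes = {"You", "Rex", "Fido", "Spot"}
edges = [
    ("Rex", "PET_OF", "You"),
    ("Fido", "PET_OF", "You"),
    ("Spot", "PET_OF", "You"),
]


def incoming_count(node, edges):
    """Count how many edges point at the given node."""
    return sum(1 for _source, _label, target in edges if target == node)


def neighbors(node, edges):
    """Return the nodes reachable from `node` by following one outgoing edge."""
    return sorted(target for source, _label, target in edges if source == node)


print(incoming_count("You", edges))  # 3
print(neighbors("Rex", edges))       # ['You']
```

Because the edges are directed, Rex can reach You, but You has no outgoing edges in this sketch.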

A graph database management system is any tool which takes this kind of structure and plunks it into the computer. It also provides a query language to quickly search for information in it. The database itself is all of that info plunked on the computer. neo4j is a graph database management system (hereafter abbreviated to GDBMS), and the one we’re talking about in this tutorial.

How does this compare to relational databases?

If you’re coming to this article, you likely have some experience in relational database design. If not, or if you’ve forgotten most of it, here is a quick refresher: relational databases are databases designed using relations, tables of information which look like Excel sheets. The names for your columns are called your schema, and for every row of your “Excel” sheet, you must have something in each column, even if it’s just a null. Different columns are attributes. For example, if you had a person named “Alice” in your relation, her name and her age would each be an attribute, meaning each would be a column on the table. In order to query this table for information, you can use “Structured Query Language”, often abbreviated to “SQL.” SQL is a query language, not a programming language, but it is incredibly robust and gives one the ability to search across multiple tables and for specific attributes. If you’re familiar with sets, relational database tables are each sets, and we use operations like the Cartesian product or set difference in order to work with multiple tables. Where graph databases are built on graphs, relational databases are built on sets. There are many other concepts, from Boyce-Codd Normal Form to the basics of set theory, that I recommend you explore but that we won’t be covering in this tutorial.
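If the refresher feels abstract, a small sketch with Python's built-in sqlite3 module may help. The table name person, Alice's age, and the second row are all made up for illustration; the point is only to show a schema, a null, and a query for one attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: every row has a name column and an age column.
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Bob", None)],  # Bob's age is unknown, so it is stored as NULL
)

# Query for one attribute of the rows that match a condition.
ages = conn.execute("SELECT age FROM person WHERE name = ?", ("Alice",)).fetchall()
everyone = conn.execute("SELECT name, age FROM person ORDER BY name").fetchall()

print(ages)      # [(30,)]
print(everyone)  # [('Alice', 30), ('Bob', None)]
```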

Graph databases bear some similarities to relational databases, but they are also very different. A graph database doesn’t require you to adhere to a fixed schema, and it does not depend on set theory. Graph databases are also much better at modeling relationships. In a relational database, a relationship between two items is usually reconstructed at query time by joining tables, while a graph database stores the relationship directly as an edge between two nodes. With many relationships and millions of data points, the cost of those joins can become impractical, whereas following stored edges stays comparatively cheap.
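The difference shows up as soon as you ask for friends of friends. Below is a rough sketch under simple assumptions: the friend table and the adjacency dictionary are invented for illustration, and the dictionary is only a stand-in for how a graph database keeps each node's connections close at hand, not a description of neo4j's internals.

```python
import sqlite3

# Relational version: friendships live in a table, and each extra hop is a join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (person TEXT, friend TEXT)")
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)",
    [("Alice", "Bob"), ("Bob", "Cam")],
)
rows = conn.execute(
    """
    SELECT f2.friend
    FROM friend AS f1
    JOIN friend AS f2 ON f1.friend = f2.person
    WHERE f1.person = ?
    """,
    ("Alice",),
).fetchall()
friends_of_friends_sql = [row[0] for row in rows]

# Graph version: each node keeps its own neighbors, and each hop is a lookup.
adjacency = {"Alice": ["Bob"], "Bob": ["Cam"], "Cam": []}
friends_of_friends_graph = [
    fof for friend in adjacency["Alice"] for fof in adjacency[friend]
]

print(friends_of_friends_sql)    # ['Cam']
print(friends_of_friends_graph)  # ['Cam']
```

Both versions give the same answer here; the argument for graph databases is about how the work grows as the number of hops and rows increases.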

Installing neo4j

Installing neo4j is fairly simple, but just in case, we’ll go over the process. Go to this page, click download, and fill out the form that it asks for. After you submit the form, you’ll be given an activation key. Copy and paste that key somewhere, or leave the page open, and then download neo4j. Once you’ve downloaded it, open it up to launch the installation wizard, and when you’re asked for the activation key, paste it in. You’re now ready to start using neo4j!

Using neo4j

This screen should be similar to what you see. In my projects section, I have two different projects, but you’ll see just one called “My Project.” I’m going to be working in a project called “Example Project” throughout the course of this tutorial. Now, it’s time to get started on the more interesting stuff. In order to make your first graph, all you have to do is click the “Add Graph” button. Choose to make a local graph. You’ll then be prompted to name the graph and to provide a password. I made the graph’s name “Graph” and the password “password” since we won’t be working with any particularly sensitive information.

Your screen should now look like this

Click the start button in order to get the database running. The graph box should then look like this.

From there, click the manage button. It will lead you to the management menu, where you should see a button that says browser. Press that button and it will open up a new window which should look like this.

This is where you can experiment with neo4j, write code, and check visualizations. It’s an incredibly useful tool. You can type commands in the top box, next to the dollar sign. I highly recommend the tutorial series that the browser suggests; the tutorials are very helpful.

Creating a Graph

Now, we want to go through the process of making a graph. Let’s say that we have a group of friends named Alice, Bob, and Cam. Each of them will be a node connected by edges in our graph. The first thing we need to do is make nodes for each of them. We’ll start with Alice. The syntax for creating a node in neo4j goes as follows:

CREATE (n:label {property:"Value"})

If you want to have multiple labels or properties, you can do so like this:

CREATE (n:label:label2:label3 {property1:"Value1", property2:"Value2", property3:"Value3"})

So, if we’re interested in adding Alice first, we can create a node for her using the following syntax:

CREATE (n:person {name:"Alice"})

We can do the same thing for Bob and Cam. A quick tip for neo4j is that if you want to see all of the nodes in your graph, you can enter the following into the command line:

MATCH (n) RETURN (n)

Doing this and pressing enter, we can see Alice, Bob, and Cam as nodes. Neo4j provides tools for graph visualization, which we will be using throughout this guide. This graph is of the three friends. If you wish to close it, go to the top right corner of the box and press the x. Otherwise, you can pan around the screen, drag the nodes, and even go into full screen to explore the graph. This gets more interesting with more elements and with relationships.

Each friend is a node in this graph

Of course, this isn’t a very good graph because none of our nodes are connected to one another. We can rectify that right now by creating relationships. Let’s say that even though both Alice and Bob are friends with Cam, they’re best friends with one another. How would one model this relationship? Neo4j provides the tools to model it. Just like we can have nodes with labels and properties, we can have relationships between two nodes with labels and properties. The syntax is as follows:

MATCH (a:person), (b:person)
WHERE a.name = "Alice" AND b.name = "Bob"
CREATE (a)-[r:FRIEND {type:"best"}]->(b)
RETURN r

This creates a best friend relationship between Alice and Bob. The variable r represents the relationship between Alice and Bob, and similar to nodes, it has the label “FRIEND” (neo4j calls this the relationship’s type) and the properties within the curly brackets. We can make similar relationships between Alice and Cam or Bob and Cam, omitting the curly brackets containing type:"best" in order to make normal friendships. An interesting thing to note about this relationship is that it is a directed one, from Alice to Bob, which is why the arrow points at the variable b. This process is useful, but a bit tedious, which is why we’ll go over doing it in py2neo later on.
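Since the statement has the same shape for every pair of friends, the text can be generated. Here is a small sketch in plain Python that only builds the query text for you to paste into the browser. The function name friend_query is hypothetical, and it assumes the names contain no quotes or backslashes, so it does no escaping.

```python
def friend_query(a_name, b_name, kind=None):
    """Build the Cypher text that links two existing person nodes as friends.

    Assumes the names contain no quotes or backslashes, so no escaping is done.
    """
    # Only best friends get the {type: "best"} property; plain friendships omit it.
    properties = f' {{type: "{kind}"}}' if kind else ""
    return "\n".join([
        "MATCH (a:person), (b:person)",
        f'WHERE a.name = "{a_name}" AND b.name = "{b_name}"',
        f"CREATE (a)-[r:FRIEND{properties}]->(b)",
        "RETURN r",
    ])


print(friend_query("Alice", "Bob", kind="best"))
print(friend_query("Alice", "Cam"))
```

The second call leaves out the curly brackets entirely, which gives the normal friendship described above.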

Cypher

So far we’ve created nodes and we’ve created relationships, but we haven’t explored the language we’re using to do so. Cypher is an open-source graph query language used for neo4j, among other graph applications. It has many similarities to SQL, but the syntax is slightly different and the underlying mechanics are very different. Instead of querying across relations, we query across graphs, matching patterns of nodes and relationships rather than joining tables.

Let’s say that we have our friend graph and we want to return the name of every single one of our friends. Saying something along the lines of the following will return every node in the graph:

MATCH (n) RETURN n

However, this gives us too much information. To get just their names, we can say the following:

MATCH (n) RETURN n.name

This is a pretty simple query to make in neo4j, but it’s a good one for learning the syntax. If you’re familiar with SQL, this is roughly equivalent to “SELECT name FROM graph;”. If not, it’s still pretty simple. By saying MATCH (n), we introduce the variable n, and since nothing narrows it down, n matches every node in the graph. RETURN n.name gives us back the name property of n, and since n is every node, we get the name of everything in our graph.

If you want to delete everything from your graph instead, you can say the following:

MATCH (n) DETACH DELETE n

Now, if you’re looking to query a specific node, a new line is added to your query. Let’s say that we’re only looking to return the node for Alice in this graph. We can do as follows:

MATCH (a)
WHERE a.name = "Alice"
RETURN a

If you’re familiar with SQL, this WHERE clause is almost exactly the same as it is in SQL, except that you refer to a variable. Where you would have previously said “SELECT * FROM graph WHERE name = 'Alice';”, you now must first choose a variable to represent what you’re searching for.
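If it helps to think in python, the MATCH, WHERE, and RETURN steps can be mirrored with a list and a filter. This is only an analogy under made-up names (the match helper and the node dictionaries are hypothetical), not a description of how neo4j actually executes a query.

```python
nodes = [
    {"label": "person", "name": "Alice"},
    {"label": "person", "name": "Bob"},
    {"label": "person", "name": "Cam"},
]


def match(nodes, where=lambda node: True):
    """Return every node that satisfies the predicate (all nodes by default)."""
    return [node for node in nodes if where(node)]


# MATCH (n) RETURN n.name
all_names = [n["name"] for n in match(nodes)]

# MATCH (a) WHERE a.name = "Alice" RETURN a
alice = match(nodes, where=lambda a: a["name"] == "Alice")

print(all_names)  # ['Alice', 'Bob', 'Cam']
print(alice)      # [{'label': 'person', 'name': 'Alice'}]
```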

There are many more complex ways to query in Cypher, for which I recommend you check out the following tutorials. Otherwise, the information above will cover basic searches across a graph database.

Py2neo

Cypher is a great language, but I’m personally a big fan of python, especially for automating the process of creating nodes and relationships. In order to do this, I use py2neo, a python package which works with neo4j. You can install it with just the command “pip install py2neo” from the command line. From there, open up your preferred python editor (I will be using python’s built-in editor IDLE for simplicity’s sake), and we can get started! Before you begin, make sure that neo4j is open and that the graph you want to use is running.

The bolt port is circled in red

Go to the manage page for your graph (the one where you can open up your browser) in order to continue. Once you have it open, you’ll be able to see this screen. Note down the bolt port of your graph. We’ll be using this to access the graph from python. It’s also helpful to open the browser at this point, if you don’t already have it open. While we’ll be working primarily in python, seeing neo4j’s graph visualizations is only possible if you are using the browser. Now, open up a new python file and name it whatever you want. Next, we’re going to start importing things into our program.

These are the first three lines

Make a main function for your code. Now, the first thing to do is link to our neo4j graph. Since you have the bolt port ready, let’s start by connecting like so:
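A minimal sketch of that connection step, assuming py2neo is installed and the database has been started. The port 7687, the user name neo4j, and the helper names bolt_uri and connect are assumptions for illustration; use the bolt port from your own manage screen and the password you chose when creating the graph.

```python
def bolt_uri(port, host="localhost"):
    """Build the bolt address from the port shown on the manage screen."""
    return f"bolt://{host}:{port}"


def connect(port, password):
    """Open a connection to the running graph (needs py2neo and a started database)."""
    # Imported here so bolt_uri can be used even where py2neo is not installed.
    from py2neo import Graph

    # Assumption: the default user name "neo4j" and the password set for the graph.
    return Graph(bolt_uri(port), auth=("neo4j", password))


def main():
    # Assumption: 7687 is a common default bolt port; yours may differ.
    graph = connect(7687, "password")
    print(graph)


print(bolt_uri(7687))  # bolt://localhost:7687
```

Call main() once the database is running; if the port or password is wrong, py2neo will raise an error when it first tries to talk to the database.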

Put in bolt://localhost:(the bolt port listed on your manage screen) as the argument for Graph, and we can begin creating nodes and relationships. Say that we want to model our original graph: you and your dogs Rex, Fido, and Spot. For the sake of this graph, we’ll say that your name is Alex. We want four different nodes, then: Rex, Fido, Spot, and Alex. We also want two different kinds of nodes: person and dog. We’ll start by doing this in a simpler manner. First, we instantiate the nodes. The syntax for doing so is as follows.

variable_name = Node("Label in quotes", property1="value for property1")

So if we want to put you and your dogs into our database, we can write out the following.

Alex = Node("Person", name="Alex")
Rex = Node("Dog", name="Rex")
Fido = Node("Dog", name="Fido")
Spot = Node("Dog", name="Spot")

However, this just creates the nodes in python and doesn’t put them into the database. To actually push a node into the database, we can use graph.create(variable_name), for example graph.create(Alex), once for each node.

Now, we want to draw out the relationships between all of the nodes. We can do this by similarly creating a relationship. The syntax for a relationship is as follows.

Relationship(node1, "TYPE OF RELATIONSHIP", node2)

So if we wanted to say that Alex loves Rex, we could say the following.

Relationship(Alex, "LOVES", Rex)

We can put that straight into the graph.create() code to get the following.

graph.create(Relationship(Alex, "LOVES", Rex))

This code puts the relationship into our graph. The code below results in the following visualization:

The code
Resulting visualization
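Putting the pieces together, a complete program might look roughly like the sketch below. It is one possible arrangement rather than the exact code from the walkthrough, and it assumes py2neo is installed, the database is running on bolt port 7687, and the user name and password are neo4j and password. The names PEOPLE_AND_PETS and love_pairs are hypothetical helpers.

```python
# Who is in the graph: (label, name) pairs taken from the example above.
PEOPLE_AND_PETS = [
    ("Person", "Alex"),
    ("Dog", "Rex"),
    ("Dog", "Fido"),
    ("Dog", "Spot"),
]


def love_pairs(owner, pets):
    """Return (start, relationship type, end) triples for the owner loving each pet."""
    return [(owner, "LOVES", pet) for pet in pets]


def main():
    # Imported inside main() so the helpers above work without py2neo installed.
    from py2neo import Graph, Node, Relationship

    # Assumptions: bolt port 7687, user "neo4j", and the password "password".
    graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

    # Create one Node object per entry, keyed by name, and push each to the database.
    nodes = {name: Node(label, name=name) for label, name in PEOPLE_AND_PETS}
    for node in nodes.values():
        graph.create(node)

    # Connect Alex to each dog with a LOVES relationship.
    for start, rel_type, end in love_pairs("Alex", ["Rex", "Fido", "Spot"]):
        graph.create(Relationship(nodes[start], rel_type, nodes[end]))
```

After calling main(), running MATCH (n) RETURN n in the browser should show Alex connected to the three dogs.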

As you can see, neo4j is a great tool for working with graphs, and py2neo makes it even simpler to use. You can combine the python code above with a variety of other packages or tools to get an even more robust graph. There are also many more tools in py2neo that we haven’t explored. The documentation covers many more utilities, tools, and variations on the ideas that we’ve explored here. Good luck using neo4j!

https://xkcd.com/173/


Ananda Montoly
Smith-HCV

Software engineer at Google Cambridge and graduate from Smith College with a degree in computer/data science!