Tips for Building on Graph Databases (Neo4j)

I’ve been working on a graph database app with data ported from a relational database. There’s plenty online to help you decide if a graph database is right for your project, but this post begins on the premise that you’ve already decided to go with a graph database, and now you’re building an app on top of it.

Here are two important things to remember:

1. Neo4j IDs (also called Node IDs) are not universally unique ids.

Rule number 1 of working with a Neo4j database: don’t try to retrieve items by their ID attribute. That ID does not work the way IDs work in most SQL and NoSQL databases. In those databases, you can use the ID for item retrieval because they are universally unique—one ID will only ever point to one item, and if that item gets deleted, trying to retrieve with that ID will not return an item.

Not so with Neo4j. If you delete an item in Neo4j, its ID can later refer to a newly created item. Neo4j IDs do not simply increment, and they are not unique across time. Nor are they assigned by a random string generator, which would leave a possible but unlikely chance of ID collision. Stefan Armbruster explains how these IDs get assigned in a presentation he gave last year, which I quote here for your convenience:

Node IDs have a semantic [meaning] and give you the offset of that node or relationship within the store file. Consider the following example: Let’s say you delete a node that has a reference in MongoDB (or your other third-party database), only you forget [to delete] the reference in Mongo to that now-deleted node.

After the original node is deleted, there’s a free space in the node file, so when you create a new (likely unrelated) node, it now uses the same previously-used Node ID. The dangling reference now points to something semantically completely different, which can cause huge problems in your database.

This means that the likelihood of a new item taking on an old Node ID is not low at all. In fact, it’s practically guaranteed the first time someone misses dotting an i or crossing a t.
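To make the failure mode concrete, here is a toy sketch in plain Ruby. It is not Neo4j's actual allocator: ToyNodeStore is a hypothetical class, and the "lowest free slot wins" rule is an assumption chosen only to mimic the offset reuse described in the quote.

```ruby
# Toy in-memory stand-in for a store file. The internal id is just the slot
# index, and a freed slot is handed to the next node that gets created.
# This is an illustration only, not how Neo4j is implemented.
class ToyNodeStore
  def initialize
    @slots = [] # index = internal id, value = node properties (nil when free)
  end

  # Put the node in the first free slot and return that slot's index as its id.
  def create(props)
    id = @slots.index(nil) || @slots.length
    @slots[id] = props
    id
  end

  # Free the slot so a later create can reuse it.
  def delete(id)
    @slots[id] = nil
  end

  def find(id)
    @slots[id]
  end
end

store = ToyNodeStore.new
store.create({ name: "Alice" })
bob_id = store.create({ name: "Bob" })
external_reference = bob_id # e.g. an id saved in some other database

store.delete(bob_id) # the external reference is now dangling
invoice_id = store.create({ total: 42 }) # lands in the slot Bob used to occupy

# The stale reference resolves, but to a completely unrelated node.
dangling_lookup = store.find(external_reference)
```

Here invoice_id ends up equal to bob_id, so the lookup through the stale reference silently returns the invoice instead of failing.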

So what can you do instead? Much of the Neo4j team recommends making your own attribute to facilitate retrieval. Armbruster recommends calling it uuid. In the app I’m writing I call it relational_id (remember, we’re porting from a relational database).
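As a rough sketch of what that can look like, the plain-Ruby helpers below stamp each new node with an application-generated uuid and look it up by that property later. The helper names create_node_query and find_by_uuid_query are hypothetical, and the sketch only builds the Cypher text and its parameters. How you send them to the database depends on your driver or OGM.

```ruby
require "securerandom"

# Build a CREATE statement whose properties include a uuid we generate
# ourselves. The label is interpolated because Cypher cannot parameterize
# labels, so it should only ever come from trusted code.
def create_node_query(label, props)
  ["CREATE (n:#{label}) SET n = $props RETURN n",
   { props: props.merge(uuid: SecureRandom.uuid) }]
end

# Build a lookup that matches on our own uuid property, never the internal id.
def find_by_uuid_query(label, uuid)
  ["MATCH (n:#{label} {uuid: $uuid}) RETURN n", { uuid: uuid }]
end

create_cypher, create_params = create_node_query("Genus", { name: "Panthera" })
genus_uuid = create_params[:props][:uuid]

# This is the value that is safe to hand to other systems as a reference.
find_cypher, find_params = find_by_uuid_query("Genus", genus_uuid)
```

If you are porting rows from a relational database, as in my case, the existing primary key can play the same role in place of the generated value. Either way, pairing the property with a uniqueness constraint is worth considering.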

2. Neo4j does not conform to your ORM of choice.

This is because your ORM uses SQL, and Neo4j takes Cypher queries. Of course, Neo4j provides you with an OGM to translate between your models and the nodes in your database. For example, the neo4j gem for Rails contains a module called ActiveNode that attempts to approximate some of the most common ActiveRecord queries. Similarly, the Neo4j Object Graph Mapping library sort of approximates the JdbcRepository functionality for Spring.

That said, these libraries do not contain all of the functionality you’d find in your ORM of choice. They’re newer and they’re made by a smaller team, so it’s reasonable to expect them to do less. What this means is that, while ActiveRecord and JdbcRepository may well rescue you from ever having to write a line of SQL, ActiveNode and other Neo4j OGMs are not likely to save you from the need to write Cypher.

You’re in luck, though: Cypher has a more consistent syntax than SQL, which means you’re facing a gentler learning curve. Neo4j provides a helpful intro. If you’re looking for a small, well-circumscribed exercise in Cypher to write in your app, here is my recommendation:

EXERCISE: Suppose you have a node type with a relationship to several instances of some other node type: for the sake of this example, let’s say you have a Genus node that has a :CONTAINS relationship to many Specie nodes. Your OGM allows you, in most cases, to query for the Species of a Genus with instance_of_genus.species. However, that association only covers one direction, and going the other way is a good excuse to write some Cypher by hand.

Instead, make a method in your Specie model that allows you to fetch the genus it comes from. In the example, you might call the method like specie.derives_from.

Your implementation might look something like this example written in Ruby:

SOLUTION:

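def derives_from
  # Pass uuid as a query parameter instead of interpolating it into the string,
  # so string values are quoted correctly. Assumes nodes labeled Genus and Species.
  result = Neo4j::ActiveBase.current_session.query(
    "MATCH (g:Genus)-[:CONTAINS]->(s:Species {uuid: $uuid}) RETURN g", { uuid: uuid }
  )
  # The result's `rows` attribute holds the returned values as a Ruby Array of rows.
  result.rows.first&.first
end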
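If the double .first looks odd, the sketch below shows the shape being unwrapped. FakeResult is a hypothetical stand-in so the snippet runs without a database. The real result object comes from the neo4j gem and returns node objects, not plain hashes.

```ruby
# Stand-in for the query result, with only the two attributes used here.
FakeResult = Struct.new(:columns, :rows)

# One inner array per matched row, one element per RETURN column.
result = FakeResult.new([:g], [[{ name: "Panthera" }]])

first_row = result.rows.first # => [{ name: "Panthera" }]
genus = first_row.first       # => { name: "Panthera" }

# With no match, rows is empty, so guard before reaching for the first column.
empty = FakeResult.new([:g], [])
missing = empty.rows.first&.first # => nil
```

The safe navigation operator means a Specie with no containing Genus yields nil and does not raise NoMethodError.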

Conclusion

Graph databases can offer you a rich, expressive, flexible data structure that might better represent your data than a relational database in some cases. As you begin to build an app on top of that graph database, it’s important to keep a couple of things in mind. First, you’ll want to make your own unique id attribute if you intend to use ids for item identification or retrieval. Second, you’ll want to get comfortable with the graph query language, Cypher—but luckily, Cypher has a lot of internal consistency (more than SQL, if I do say so myself), so the learning curve won’t be so steep.
