Real-world application of Neo4j in a Python and Django project

Optimising graph database integration increases the usability and effectiveness of software solutions

Denis Germano, Jhon Rocha 21 Feb 2024

Integrating with graph databases can be hard, but optimising this process makes software solutions more robust, user-friendly and adaptable, which in turn increases their effectiveness, their usability and the value they bring to users.

There are tools that can help with this integration. Engineers who need to build APIs that lean heavily on a graph database can rely on Python and Neo4j to prototype and implement them quickly. You can access the examples and see the full implementation of this article here.

This article provides insights into using Neo4j to optimise the process of integrating with graph databases. It does so by using a real-world application of Neo4j in a Nearform project.

The application: Movie project

We are going to create a simple movie network app. Its data consists of many Movie and Person nodes, plus connections (edges) between them that record whether a person was an actor, director, producer, reviewer or writer of a movie. This is a perfect use case for a graph database.

Project setup

We have prepared a repo with an example application here, where we have a Docker container for a Neo4j database and the Neo4j drivers already set up in the codebase. To create a similar application from the ground up, first open a terminal application and create the folder for the project:

text

mkdir neo4j-python-example
cd neo4j-python-example

For this application, we are going to use Poetry, a popular package manager for Python. After installing it, run:

text

poetry init

This will start an interactive session where you should fill in the details.

Now, let's add the dependencies:

text

poetry add neo4j
poetry add neomodel

Neo4j setup

Neo4j is a graph database that stores data as nodes and relationships, which makes it well suited to knowledge graphs. To interact with it, developers normally use Cypher queries. Cypher is an open and fully specified query language for property graph databases.

Here are some of the most common options for using a Neo4j DB in a Python project:

neo4j-python-driver: The official Python driver for Neo4j
neomodel: An Object Graph Mapper (OGM) for Neo4j
django-neomodel: The Neomodel plugin for Django

The Neomodel OGM provides lots of quality-of-life features that help rapid development for simple tasks. The main focus of this article will be to show how to use it and its tradeoffs. You can find more details on Neo4j docs.
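Before any model can be saved or queried, Neomodel needs to know where the database lives. The following is a minimal sketch, assuming a local Neo4j instance reachable over Bolt. The environment variable names (NEO4J_USER, NEO4J_PASSWORD, NEO4J_HOST, NEO4J_PORT), their defaults and the build_database_url helper are illustrative assumptions, not taken from the example repo.

```python
import os
from urllib.parse import quote


def build_database_url(env=None):
    """Build a Bolt connection URL from environment-style settings.

    The variable names and default values are assumptions for illustration.
    """
    env = os.environ if env is None else env
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "password")
    host = env.get("NEO4J_HOST", "localhost")
    port = env.get("NEO4J_PORT", "7687")
    # Percent-encode the password so characters such as "@" or ":"
    # do not break the structure of the URL.
    return f"bolt://{user}:{quote(password, safe='')}@{host}:{port}"


try:
    # neomodel reads its connection string from config.DATABASE_URL.
    from neomodel import config

    config.DATABASE_URL = build_database_url()
except ImportError:
    # neomodel is not installed here; the helper still works on its own.
    pass
```

Keeping the credentials in the environment rather than in the source keeps them out of version control; the connection itself is only opened when the first query runs.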

Simple queries

The first step to using Neomodel is to define the node and relationship entity classes. Neomodel provides a set of base classes to derive them from. Here we will use StructuredNode and StructuredRel. The library also provides a set of property classes to define the types of the fields on those nodes and relationships.

Here is our list of entities:

python

from neomodel import (
    ArrayProperty,
    IntegerProperty,
    RelationshipFrom,
    RelationshipTo,
    StringProperty,
    StructuredNode,
    StructuredRel,
)

class ActedIn(StructuredRel):
    # A list of role names, e.g. ["Neo"]
    roles = ArrayProperty(StringProperty())

class Review(StructuredRel):
    summary = StringProperty()
    rating = IntegerProperty()

class Person(StructuredNode):
    name = StringProperty(unique_index=True)
    born = IntegerProperty()

    # Outgoing relationships from a person to the movies they worked on
    acted_in = RelationshipTo("Movie", "ACTED_IN", model=ActedIn)
    directed = RelationshipTo("Movie", "DIRECTED")
    produced = RelationshipTo("Movie", "PRODUCED")
    reviewed = RelationshipTo("Movie", "REVIEWED", model=Review)
    wrote = RelationshipTo("Movie", "WROTE")
    follows = RelationshipTo("Person", "FOLLOWS")
    followed_by = RelationshipFrom("Person", "FOLLOWS")

class Movie(StructuredNode):
    title = StringProperty(unique_index=True)
    released = IntegerProperty()
    tagline = StringProperty()

    # The same relationships, seen from the movie's side
    actors = RelationshipFrom("Person", "ACTED_IN", model=ActedIn)
    directors = RelationshipFrom("Person", "DIRECTED")
    producers = RelationshipFrom("Person", "PRODUCED")
    reviewers = RelationshipFrom("Person", "REVIEWED", model=Review)
    writers = RelationshipFrom("Person", "WROTE")

Create the node

Let’s now use the OGM provided by Neomodel to populate the graph database. That can be easily done by using the save method from the entities above:

python

from models import Person, Movie 

keanu = Person(name="Keanu Reeves", born=1964).save()
the_matrix = Movie(title="The Matrix", released=1999, tagline="Welcome to the Real World").save()

To add a relationship between nodes, we use the connect method on the entity instance:

python

keanu.acted_in.connect(the_matrix, {'roles': ['Neo']})

Check the file src/neomodel/1-Create.py to see all the data inserted into the graph.

Read the nodes

To find data in the graph, we use the .nodes class property, which gives access to the query methods. Here are some examples:

To find a single node:

python

# Find the actor named "Tom Hanks"...
tom_hanks = Person.nodes.get(name="Tom Hanks")

To find all nodes connected through a relationship:

python

# List all Tom Hanks movies...
# Fetch the 'Person' node for Tom Hanks
tom_hanks = Person.nodes.get(name="Tom Hanks")
# Fetch all the 'Movie' nodes related to Tom Hanks via the 'ACTED_IN' relationship
tom_hanks_movies = tom_hanks.acted_in.all()

To slice some nodes:

python

# Find 10 people...
people_names = [person.name for person in Person.nodes[:10]]

Run more advanced queries:

python

# Find movies released in the 1990s...
nineties_movies_titles = [
    movie.title for movie in Movie.nodes.filter(released__gte=1990, released__lt=2000)
]
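The double-underscore suffixes in filter follow the Django convention: released__gte=1990 means released >= 1990 and released__lt=2000 means released < 2000. The sketch below is only a pure-Python picture of that convention applied to plain dictionaries. It is not how Neomodel is implemented (Neomodel translates the lookups into a Cypher WHERE clause), and the matches helper, the LOOKUPS table and the sample data are hypothetical.

```python
import operator

# Lookup suffix -> the comparison it stands for (a subset, for illustration).
LOOKUPS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "exact": operator.eq,
}


def matches(record, **filters):
    """Return True if the dict `record` satisfies every Django-style filter."""
    for key, expected in filters.items():
        # "released__gte" -> field "released", lookup "gte";
        # a bare field name such as "title" falls back to an equality check.
        field, _, lookup = key.partition("__")
        compare = LOOKUPS[lookup or "exact"]
        if not compare(record[field], expected):
            return False
    return True


movies = [
    {"title": "The Matrix", "released": 1999},
    {"title": "The Da Vinci Code", "released": 2006},
    {"title": "Top Gun", "released": 1986},
]

nineties = [
    m["title"] for m in movies if matches(m, released__gte=1990, released__lt=2000)
]
print(nineties)  # ['The Matrix']
```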

Update the node

To update a single node we can simply get the instance, change the property and call save:

python

# Fetch the 'Movie' node for "The Da Vinci Code"
da_vinci = Movie.nodes.get(title="The Da Vinci Code")
# Update the release year
da_vinci.released = 2007
# Save
da_vinci.save()

Delete the node

Deleting a single node is also a simple task:

python

# Fetch the 'Movie' node for "The Da Vinci Code"
da_vinci = Movie.nodes.get(title="The Da Vinci Code")
# Delete it
da_vinci.delete()

Second-degree queries

As we have seen above, using the OGM for simple queries is straightforward. However, this landscape changes when we start to dive deeper into more complex node-relationship queries.

For example, to get the Co-Actors who have worked alongside Tom Hanks on any film, we would write this Cypher:

text

MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN coActors.name

Achieving the same result with the OGM, however, requires a nested loop with O(n²) complexity that also issues a separate lookup for every movie. This is a significant consideration in the trade-off between ease of use and performance.

python

# Tom Hanks' co-actors...
# MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name
print("Tom Hanks' co-actors...")
# Fetch the 'Person' node for Tom Hanks
tom_hanks = Person.nodes.get(name="Tom Hanks")
# Initialize an empty set to hold unique co-actor names
co_actors_names = set()
# Loop through each movie Tom Hanks acted in
for movie in tom_hanks.acted_in.all():
    # For each movie, find all actors who are not Tom Hanks
    for co_actor in movie.actors:
        if co_actor.name != "Tom Hanks":
            co_actors_names.add(co_actor.name)
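To see where the cost comes from, the sketch below replays the same nested loop against a tiny in-memory graph and counts the lookups. FakeGraph and its data are hypothetical stand-ins in which each method call models one database round trip; the exact number of queries Neomodel issues may differ between versions.

```python
class FakeGraph:
    """In-memory stand-in for the database; each method call models one query."""

    def __init__(self, cast_by_movie):
        self.cast_by_movie = cast_by_movie
        self.query_count = 0

    def movies_of(self, person):
        self.query_count += 1
        return [m for m, cast in self.cast_by_movie.items() if person in cast]

    def actors_of(self, movie):
        self.query_count += 1
        return list(self.cast_by_movie[movie])


def co_actors_nested(graph, person):
    """Mirror the OGM loop: one lookup for the movies, then one per movie."""
    names = set()
    for movie in graph.movies_of(person):
        for actor in graph.actors_of(movie):
            if actor != person:
                names.add(actor)
    return names


graph = FakeGraph(
    {
        "Cast Away": ["Tom Hanks", "Helen Hunt"],
        "Apollo 13": ["Tom Hanks", "Kevin Bacon", "Bill Paxton"],
        "Top Gun": ["Tom Cruise", "Val Kilmer"],
    }
)
names = co_actors_nested(graph, "Tom Hanks")
print(sorted(names), graph.query_count)
# ['Bill Paxton', 'Helen Hunt', 'Kevin Bacon'] 3
```

With M movies the loop performs 1 + M lookups, whereas the Cypher pattern shown earlier is a single query.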

This nested-loop operation can be optimised by fetching the related nodes in a single query with fetch_relations:

python

co_actors_names = set()
for _, _, _, co_actor, _ in (
    Person.nodes.filter(name="Tom Hanks").fetch_relations("acted_in__actors").all()
):
    co_actors_names.add(co_actor.name)

In the scenario described above, the nature of the data returned is not immediately apparent. The query yields an extensive array of information, including the main person searched, the movies in which they acted, their connection to these movies, other individuals related to the same movies and the nature of these relationships. In essence, the OGM generates a Cypher query akin to the original one but with a broader range of properties returned.

text

MATCH (person:Person)-[r1:`ACTED_IN`]->(movie_acted_in_1:Movie)<-[r2:`ACTED_IN`]-(person_actors_1:Person)
WHERE person.name = $person_name_1
RETURN person, movie_acted_in_1, r1, person_actors_1, r2
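One way to make the shape of each returned row explicit is to give the five positions names. This is a sketch that assumes the rows arrive in the same order as the RETURN clause above, which may vary between Neomodel versions. CoActorRow and co_actor_names are hypothetical helpers, and the SimpleNamespace objects stand in for real nodes so the example runs without a database.

```python
from types import SimpleNamespace
from typing import Any, NamedTuple


class CoActorRow(NamedTuple):
    """Names for the five values in each row, in RETURN order."""

    person: Any
    movie: Any
    acted_in: Any
    co_actor: Any
    co_acted_in: Any


def co_actor_names(rows):
    """Collect unique co-actor names from rows shaped like the RETURN clause."""
    return {CoActorRow(*row).co_actor.name for row in rows}


# Stand-in rows; in the article these would come from
# Person.nodes.filter(name="Tom Hanks").fetch_relations("acted_in__actors").all()
tom = SimpleNamespace(name="Tom Hanks")
movie = SimpleNamespace(title="Cast Away")
rows = [
    (tom, movie, None, SimpleNamespace(name="Helen Hunt"), None),
    (tom, movie, None, SimpleNamespace(name="Helen Hunt"), None),
]
print(co_actor_names(rows))  # {'Helen Hunt'}
```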

In scenarios like these, opting for custom queries might emerge as the more favourable solution.

Custom queries

To execute a custom query, one can directly write and run the complete Cypher query. The results are then processed, converting them into the corresponding StructuredNode objects. While this method is highly effective and offers considerable power, it's important to note that extensive use throughout the codebase can lead to clutter and complexity. For illustration, consider the following example.

python

from neomodel import db

from models import Person

query = """
MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN coActors
"""

# Execute the raw Cypher query
results, _ = db.cypher_query(query)

# Inflate results into Person objects
co_actors_names = set()
for record in results:
    # The first column of each row holds the raw coActors node
    co_actor = Person.inflate(record[0])
    co_actors_names.add(co_actor.name)
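Hard-coding the actor's name in the query string limits reuse. A possible variation passes the name as a Cypher parameter instead. In this sketch, co_actor_names and fake_run_query are hypothetical: the query runner is injected so the example runs without a database, and it is assumed to have the same call shape as neomodel's db.cypher_query, taking (query, params) and returning (rows, meta).

```python
CO_ACTORS_QUERY = """
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN DISTINCT coActors.name AS name
"""


def co_actor_names(run_query, actor_name):
    """Run the co-actor query for `actor_name` and return the sorted names.

    `run_query` is any callable that takes (query, params) and returns
    (rows, meta), where each row is a sequence of column values.
    """
    rows, _meta = run_query(CO_ACTORS_QUERY, {"name": actor_name})
    return sorted(row[0] for row in rows)


def fake_run_query(query, params):
    """Hypothetical stand-in so the sketch runs without a database."""
    assert "$name" in query and params == {"name": "Tom Hanks"}
    return [["Kevin Bacon"], ["Helen Hunt"]], ["name"]


print(co_actor_names(fake_run_query, "Tom Hanks"))
# ['Helen Hunt', 'Kevin Bacon']
```

Against a real database the call would be co_actor_names(db.cypher_query, "Tom Hanks"). Returning only the name column also avoids inflating full Person objects when just the names are needed.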

A more maintainable approach to custom queries is to integrate them within the model definition as a property or a method. This strategy not only keeps the code organised but also makes it possible to use the context of the current object to query additional data as needed. Observe the updated code for the Person model, where the co_actors_names property has been added to achieve the same results as before.

python

class Person(StructuredNode):
    # ...

    @property
    def co_actors_names(self):
        results, _ = self.cypher(
            "MATCH (a:Person) WHERE elementId(a)=$self MATCH (a)-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors"
        )
        return list(set([self.inflate(row[0]).name for row in results]))

Results

Following our recent series of optimisations, the results have been both transformative and enlightening. These refinements have led to a significant increase in the clarity and efficiency of our code, notably enhancing its readability and maintainability.

Particularly impactful was the restructuring of custom queries within the model definition, which not only streamlined the code but also imbued it with greater contextual relevance. Additionally, the revised approach to handling complex operations, especially in terms of data retrieval and processing, has markedly improved performance.

These changes underscore the importance of continuous improvement and the profound impact that well-considered optimisations can have on the overall effectiveness and usability of our software solutions — making them more robust, user-friendly and adaptable to evolving requirements.

Conclusion

By using Neomodel, an engineer can get up to speed very quickly using the classes and properties provided. However, for better performance on more complex tasks it is recommended to write the Cypher queries manually, and implementing a query builder might be the best solution for some cases.
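As a rough idea of what such a query builder could look like, here is a minimal sketch that generalises the co-actors pattern to any label and relationship type. The function name shared_neighbours_query and its parameters are hypothetical and not part of Neomodel or of the example repo.

```python
import re

# Only plain identifiers are accepted, because these parts are interpolated.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def shared_neighbours_query(label, rel_type, key):
    """Build a Cypher query for nodes that share a `rel_type` neighbour.

    Labels, relationship types and property keys cannot be passed as Cypher
    parameters, so they are validated and interpolated; the value to match
    is left as the $value parameter.
    """
    for part in (label, rel_type, key):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsafe identifier: {part!r}")
    return (
        f"MATCH (a:{label} {{{key}: $value}})-[:{rel_type}]->(m)"
        f"<-[:{rel_type}]-(other:{label}) "
        "RETURN DISTINCT other"
    )


print(shared_neighbours_query("Person", "ACTED_IN", "name"))
# MATCH (a:Person {name: $value})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(other:Person) RETURN DISTINCT other
```

The resulting string can then be run with the value supplied separately, for example as {"value": "Tom Hanks"}, so user input never ends up inside the query text.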

Interacting with graph databases can be challenging, but there are many tools that make the day-to-day work easier. Neo4j, being one of the most widely adopted graph DBs, is well supported by libraries for that purpose. The Neo4j Python driver and Neomodel are the most prominent ones.
