Real-world application of Neo4j in a Python and Django project
Optimising graph database integration increases the usability and effectiveness of software solutions
Integrating with graph databases can be hard, but optimising this process makes software solutions more robust, user-friendly and adaptable, which in turn increases the value they bring to users.
There are tools that can help with this integration. Engineers wondering how to create graph database-intensive APIs can rely on Python and Neo4j to prototype and implement them quickly. You can access the examples and see the full implementation of this article here.
This article provides insights into using Neo4j to optimise the process of integrating with graph databases. It does so by using a real-world application of Neo4j in a Nearform project.
The application: Movie project
We are going to create a simple movie network app. Its data consists of many Movie and Person nodes, with connections (edges) that relate a person to a movie as an actor, director, producer, reviewer or writer. This is a perfect use case for a graph database.
Project setup
We have prepared a repo with an example application here, where we have a Docker container for a Neo4j database and the Neo4j drivers already set up in the codebase. To create a similar application from the ground up, first open a terminal and create the folder for the project:
For this application, we are going to use Poetry, a popular package manager for Python. After installing it, run poetry init inside the project folder. This will start an interactive session where you fill in the project details.
Now, let's add the dependencies:
Neo4j setup
Neo4j is a graph database that stores and retrieves data as nodes and the relationships between them. To interact with it, developers normally use Cypher, an open and fully specified query language for property graph databases.
Here are some of the most common options for using a Neo4j database in a Python project:
neo4j-python-driver: The official Python driver for Neo4j
neomodel: An Object Graph Mapper (OGM) for Neo4j
django-neomodel: The Neomodel plugin for Django
The Neomodel OGM provides lots of quality-of-life features that help rapid development for simple tasks. The main focus of this article will be to show how to use it and what its trade-offs are. You can find more details in the Neo4j docs.
Simple queries
The first step to using Neomodel is to define the node and relationship entity classes. Neomodel provides base classes to derive them from. Here we will use StructuredNode and StructuredRel. The library also provides a set of property classes to define the types of the fields on those nodes.
Here is our list of entities:
Create the node
Let’s now use the OGM provided by Neomodel to populate the graph database. That can be easily done by using the save method from the entities above:
To add a relationship between nodes, we use the connect method on the entity instance:
Check the file src/neomodel/1-Create.py to see all the data inserted into the graph.
Read the nodes
To find data in the graph, we use the .nodes class property, which gives access to the methods for querying it. Here are some examples:
To find a single node:
To find all nodes from a relationship:
To slice some nodes:
Run more advanced queries:
Update the node
To update a single node we can simply get the instance, change the property and call save:
Delete the node
Deleting a single node is also a simple task:
Second-degree queries
As we have seen above, using the OGM for simple queries is straightforward. However, this landscape changes when we start to dive deeper into more complex node-relationship queries.
For example, to get the co-actors who have worked alongside Tom Hanks on any film, we would write this Cypher:
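The original query is not shown here. A query of this kind usually takes the form below, held as a Python string so that it can be passed to a driver or to Neomodel. The Person and Movie labels, the ACTED_IN relationship type and the constant names are assumptions based on the standard movies dataset.

```python
# Cypher for "people who acted in the same movie as a given person".
# $name is a query parameter, so the value is never spliced into the text.
CO_ACTORS_QUERY = """
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(co:Person)
RETURN DISTINCT co.name AS name
ORDER BY name
"""

# Parameters are sent alongside the query text.
CO_ACTORS_PARAMS = {"name": "Tom Hanks"}
```

The whole traversal, person to movies to other actors, is a single pattern match that the database executes in one round trip.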
Achieving the same result with the OGM alone, however, means traversing the relationships in application code: for each movie the person acted in, fetch every actor of that movie. That is an operation with O(n²) complexity, and it illustrates the trade-off between ease of use and performance.
This intricate operation can be optimised by efficiently fetching the associated nodes:
In the scenario described above, the nature of the data returned is not immediately apparent. The query yields an extensive array of information, including the main person searched, the movies in which they acted, their connection to these movies, other individuals related to the same movies and the nature of these relationships. In essence, the OGM generates a Cypher query akin to the original one but with a broader range of properties returned.
In scenarios like these, opting for custom queries might emerge as the more favourable solution.
Custom queries
To execute a custom query, one can directly write and run the complete Cypher query. The results are then processed, converting them into the corresponding StructuredNode objects. While this method is highly effective and offers considerable power, it's important to note that extensive use throughout the codebase can lead to clutter and complexity. For illustration, consider the following example.
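As a hedged sketch of this pattern, the helper below takes the query runner and the inflate step as arguments. With Neomodel these would be db.cypher_query, which returns a (rows, meta) tuple, and the model's inflate class method, for example Person.inflate. The helper name co_actors, the constant name and the labels in the query are assumptions for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Same call shape as neomodel's db.cypher_query: (query, params) -> (rows, meta).
QueryRunner = Callable[[str, dict], Tuple[Sequence[Sequence[Any]], Any]]

CO_ACTORS_NODES_QUERY = (
    "MATCH (p:Person {name: $name})-[:ACTED_IN]->(:Movie)"
    "<-[:ACTED_IN]-(co:Person) "
    "RETURN DISTINCT co"
)


def co_actors(run_query: QueryRunner, inflate: Callable[[Any], Any], name: str) -> List[Any]:
    """Run the custom query and convert each raw row into a model object."""
    rows, _meta = run_query(CO_ACTORS_NODES_QUERY, {"name": name})
    # Each row has a single column holding the raw node for one co-actor.
    return [inflate(row[0]) for row in rows]
```

With Neomodel the call would look like co_actors(db.cypher_query, Person.inflate, "Tom Hanks"). Passing the runner in keeps the helper easy to test, but scattering such helpers across a codebase is exactly the clutter described above.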
A more maintainable approach to employing custom queries involves integrating them within the model definition as a property or a method. This strategy not only ensures a more organised code structure but also allows for leveraging the context of the current object to query additional data as needed. Observe the modifications in the updated code for the Person model, where the co_actors_names property has been added to achieve the same results as previously demonstrated.
Results
Following our recent series of optimisations, the results have been both transformative and enlightening. These refinements have led to a significant increase in the clarity and efficiency of our code, notably enhancing its readability and maintainability.
Particularly impactful was the restructuring of custom queries within the model definition, which not only streamlined the code but also imbued it with greater contextual relevance. Additionally, the revised approach to handling complex operations, especially in terms of data retrieval and processing, has markedly improved performance.
These changes underscore the importance of continuous improvement and the impact that well-considered optimisations can have on the overall effectiveness and usability of our software solutions, making them more robust, user-friendly and adaptable to evolving requirements.
Conclusion
By using Neomodel, an engineer can get up to speed very quickly using the classes and properties provided. However, for better performance on more complex tasks it is recommended to write the Cypher queries manually, and implementing a query builder might be the best solution for some cases.
Interacting with graph databases can be challenging, but there are many tools that make these day-to-day tasks easier. Neo4j, being one of the most widely adopted graph databases, is well supported by libraries for that purpose. The Neo4j driver and Neomodel are the most prominent ones.