Graph Data Modelling with Neo4j - A short introduction

This post introduces the basic elements and concepts for creating a data model for a graph database that follows the property graph model.

Neo4j uses its own declarative query language, Cypher, whose pattern notation is also used throughout this post.

Elements Overview

Building Blocks

Nodes: (node)
(Node) Labels: :Label
Relationships: -[relationship]->
(Relationship) Types: :Type
Properties: {property:<propertyValue>}

Syntax

Simple

(node:Label {property:<propertyValue>})-[relationship:Type {property:<propertyValue>}]->(node:Label)
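To show how the parts of this pattern map onto data, here is a minimal Python sketch. The classes Node and Relationship and the helpers cypher_literal and render_props are hypothetical illustrations, not part of Neo4j or any Neo4j driver. They only hold a variable name, labels, one type and properties, and render them back into the Cypher-style pattern above.

```python
from dataclasses import dataclass, field

# Hypothetical toy classes mirroring the pattern notation above.
# They are NOT part of Neo4j or of any Neo4j driver API.


def cypher_literal(value) -> str:
    """Render a bool, number or string as a Cypher-style literal."""
    if isinstance(value, bool):          # check bool before int (bool is an int)
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render_props(props: dict) -> str:
    """Render {key: value, ...}; empty string when there are no properties."""
    if not props:
        return ""
    body = ", ".join(f"{k}: {cypher_literal(v)}" for k, v in props.items())
    return " {" + body + "}"


@dataclass
class Node:
    var: str                                    # variable name, e.g. john
    labels: tuple = ()                          # zero or more :Labels
    props: dict = field(default_factory=dict)   # {property: value}

    def pattern(self) -> str:
        labels = "".join(f":{label}" for label in self.labels)
        return f"({self.var}{labels}{render_props(self.props)})"


@dataclass
class Relationship:
    var: str
    type: str                                   # exactly one :Type
    start: Node
    end: Node
    props: dict = field(default_factory=dict)

    def pattern(self) -> str:
        # The arrow always points from the start node to the end node.
        rel = f"-[{self.var}:{self.type}{render_props(self.props)}]->"
        return self.start.pattern() + rel + self.end.pattern()


john = Node("john", ("Person",), {"name": "John"})
acme = Node("acme", ("Company",), {"name": "Acme"})
works_at = Relationship("r", "WORKS_AT", john, acme, {"since": 2015})
print(works_at.pattern())
# (john:Person {name: 'John'})-[r:WORKS_AT {since: 2015}]->(acme:Company {name: 'Acme'})
```

Note that labels is a tuple (any number, including none) while type is a single string, which already encodes the cardinalities discussed next.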

Advanced (with cardinalities)

(node:LabelN {propertyN:<propertyNValue>})-[relationship:1TypeOnly {propertyN:<propertyNValue>}]->(node:LabelN {propertyN:<propertyNValue>})

Cardinalities

For the current limits on n, please check the documentation of the Neo4j version you are using (since version 3.0, they should not be a problem in practice).

For nodes:

  • n (nodes) per graph
  • n :Labels per node
  • n {properties} per node
  • values: primitive types only or an array thereof
  • indexes: available

For relationships:

  • n -[relationships]-> per graph
  • 1 :Type per relationship only
  • n {properties} per relationship
  • values: primitive types only or an array thereof
  • indexes: not available

So the main differences are that -[relationships]-> may have only one :Type and that there are no schema indexes available on relationship properties (this holds for Neo4j 3.x; Neo4j 4.3 introduced relationship property indexes).
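To make the index difference concrete, here is a toy in-memory store in Python. The class ToyGraph and its methods are hypothetical and do not reflect how Neo4j is implemented or its API; it simply follows the Neo4j 3.x rules listed above: a relationship takes exactly one type string, lookups by label and node property can use an index, and relationship properties can only be found by scanning.

```python
from collections import defaultdict

# Hypothetical toy store (not Neo4j's implementation or API) following the
# Neo4j 3.x rules above: one :Type per relationship, schema indexes only on
# node properties (per label and key), relationships found by scanning.


class ToyGraph:
    def __init__(self):
        self.nodes = []      # node id = list position; {"labels": set, "props": dict}
        self.rels = []       # {"start": id, "type": str, "end": id, "props": dict}
        self.indexes = {}    # (label, key) -> {value: [node ids]}

    def add_node(self, labels, **props):
        node_id = len(self.nodes)
        self.nodes.append({"labels": set(labels), "props": props})
        # Keep existing node indexes up to date on insert.
        for (label, key), index in self.indexes.items():
            if label in labels and key in props:
                index[props[key]].append(node_id)
        return node_id

    def add_rel(self, start, rel_type: str, end, **props):
        # A single string, not a list: exactly one :Type per relationship.
        self.rels.append({"start": start, "type": rel_type, "end": end, "props": props})

    def create_node_index(self, label, key):
        index = defaultdict(list)
        for node_id, node in enumerate(self.nodes):
            if label in node["labels"] and key in node["props"]:
                index[node["props"][key]].append(node_id)
        self.indexes[(label, key)] = index

    def find_nodes(self, label, key, value):
        if (label, key) in self.indexes:  # indexed lookup
            return list(self.indexes[(label, key)].get(value, []))
        return [i for i, n in enumerate(self.nodes)  # fallback: scan all nodes
                if label in n["labels"] and n["props"].get(key) == value]

    def find_rels(self, rel_type, key, value):
        # No relationship property index in this model: always a full scan.
        return [r for r in self.rels
                if r["type"] == rel_type and r["props"].get(key) == value]


g = ToyGraph()
g.create_node_index("Person", "name")
john = g.add_node(["Person", "Staff"], name="John")
acme = g.add_node(["Company"], name="Acme")
g.add_rel(john, "WORKS_AT", acme, since=2015)
print(g.find_nodes("Person", "name", "John"))       # [0] via the index
print(len(g.find_rels("WORKS_AT", "since", 2015)))  # 1, found by scanning
```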

Element Details

(nodes)

In Cypher, nodes are denoted (node) and are used as follows:

  • To represent (domain) entities, although, depending on the domain, relationships may be used for that purpose as well. Apart from properties and relationships, nodes can also be labeled (grouped) with zero or more labels. source
  • Compared to an RDBMS, each row in an entity table corresponds to a node, and the columns of that table become node properties. source
  • Every node can have different properties. Nodes do not need to have the same keys. source

Summary: (nodes) are for domain entities and complex value types, grouped via (node:Labels) and interconnected via -[relationships]->. Every node can have different {properties} (schema-less key-value pairs), so not all nodes need to have the same keys.
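The row-to-node mapping and the schema-less keys can be sketched in a few lines of Python. The helper rows_to_nodes and the people rows are hypothetical, not a Neo4j import tool. Each row becomes a node labeled after its table, its columns become properties, and NULL columns are left out because a property value may not be null, which is exactly how two nodes with the same label end up with different keys.

```python
# Hypothetical helper (not a Neo4j API): each row of an entity table becomes a
# node labeled after the table, and its columns become node properties.
# NULL columns are skipped because property values may not be null.


def rows_to_nodes(label: str, rows: list) -> list:
    nodes = []
    for row in rows:
        props = {col: val for col, val in row.items() if val is not None}
        nodes.append({"labels": [label], "props": props})
    return nodes


people = [  # rows of a hypothetical SQL table "Person"
    {"id": 1, "name": "John", "email": "john@example.com"},
    {"id": 2, "name": "Jane", "email": None},   # NULL in the SQL table
]
for node in rows_to_nodes("Person", people):
    print(node)
# {'labels': ['Person'], 'props': {'id': 1, 'name': 'John', 'email': 'john@example.com'}}
# {'labels': ['Person'], 'props': {'id': 2, 'name': 'Jane'}}
```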

:Labels

Labels are denoted (node:Label), or (node:Label1:Label2:LabelN) if a node has several labels (e.g. (john:Staff:Administration)):

  • Compared to an RDBMS, an entity table is represented by a label on its nodes. source
  • A label is a named graph construct used to group nodes into sets. Many database queries can work with these sets instead of the whole graph (easier and more efficient queries). A node may be labeled with any number of labels, including none. source
  • Labels are used to represent roles (e.g. user, product, company, category) and to group the nodes that belong together. source

Summary: :Labels are an optional addition to (nodes) for grouping related ones into sets. These facilitate writing queries, may improve performance and provide more semantic structure to the graph as a whole.
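Grouping by labels can be illustrated with a simple label-to-nodes lookup in Python. The names label_index, add_node and nodes_with are hypothetical and only sketch the idea, not Neo4j internals: a query can start from one label set, or from the intersection of several (in the spirit of MATCH (n:Staff:Administration)), instead of visiting every node, and a node with several labels appears in each of their sets.

```python
from collections import defaultdict

# Toy sketch with hypothetical data: a label -> node lookup, so a query can
# start from one (or the intersection of several) label sets rather than
# from every node in the graph.

label_index = defaultdict(set)


def add_node(node_id, *labels):
    for label in labels:
        label_index[label].add(node_id)


def nodes_with(*labels):
    """Nodes carrying ALL given labels, similar in spirit to MATCH (n:A:B)."""
    sets = [label_index.get(label, set()) for label in labels]
    return set.intersection(*sets) if sets else set()


add_node("john", "Staff", "Administration")
add_node("jane", "Staff", "Engineering")
add_node("acme", "Company")

print(sorted(nodes_with("Staff")))                    # ['jane', 'john']
print(sorted(nodes_with("Staff", "Administration")))  # ['john']
```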

-[relationships]->

The key advantage of a graph database is, of course, its -[relationships]->. They are so-called first-class citizens of the graph data model. This means that they are stored persistently (materialized on insert) together with the (nodes) instead of being inferred by a query at runtime, as JOINs are in SQL.

This has far-reaching consequences. One of them is that a query traversing from a (node) scales only with the number of -[relationships]-> stored at that (node), not with the total size of the graph. By contrast, in SQL, looking up relationships scales with the number of entries in the JOIN table (linearly, if there is no index), and the cost multiplies when more than one JOIN table is involved.
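The scaling argument can be illustrated with a deliberately simplified cost model in Python. Everything here (build_adjacency, graph_neighbors, join_table_neighbors and the data) is hypothetical and is not how Neo4j or any RDBMS is implemented. Each node keeps its own relationship list, filled on insert, so finding its neighbors touches only those entries; an unindexed JOIN table has to be scanned in full. A real RDBMS would normally index the JOIN table, which reduces the lookup to roughly logarithmic cost, so this shows the unindexed worst case.

```python
from collections import defaultdict

# Simplified cost model with hypothetical data (not how Neo4j or any RDBMS is
# implemented): a graph node keeps its own relationship list, materialized on
# insert, while an unindexed JOIN table has to be scanned row by row.


def build_adjacency(join_table):
    adjacency = defaultdict(list)
    for start, end in join_table:
        adjacency[start].append(end)     # stored with the start node
    return adjacency


def graph_neighbors(adjacency, node):
    rels = adjacency.get(node, [])
    return list(rels), len(rels)         # work ~ this node's relationships


def join_table_neighbors(join_table, node):
    result = [end for start, end in join_table if start == node]
    return result, len(join_table)       # work ~ size of the whole JOIN table


join_table = [("john", "acme")] + [(f"person{i}", "acme") for i in range(10_000)]
adjacency = build_adjacency(join_table)
print(graph_neighbors(adjacency, "john"))        # (['acme'], 1)
print(join_table_neighbors(join_table, "john"))  # (['acme'], 10001)
```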

Relationships can be used as follows:

  • To identify the interactions between (nodes) source
  • Every relationship has a name (:Type) and a direction. Relationships add structure to the graph and provide semantic context for nodes. They must have a start node and an end node (no dangling relationships). source
  • Compared to an RDBMS, JOIN tables are transformed into relationships, and the columns on those tables become relationship properties. source
  • To provide direct access to connected nodes (materialized relationships remove the need for expensive JOIN queries). source
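The JOIN-table mapping from the list above can be sketched in Python as well. The helper join_rows_to_rels and the employment rows are hypothetical, not a Neo4j import API. Each row becomes one relationship with a single :Type, the two foreign-key columns supply the start and end node, and every other non-NULL column becomes a relationship property.

```python
# Hypothetical helper (not a Neo4j API): each row of a JOIN table becomes one
# relationship with a single :Type. The two foreign-key columns give the start
# and end node; every other non-NULL column becomes a relationship property.


def join_rows_to_rels(rel_type, rows, start_col, end_col):
    rels = []
    for row in rows:
        props = {col: val for col, val in row.items()
                 if col not in (start_col, end_col) and val is not None}
        rels.append({"start": row[start_col], "type": rel_type,
                     "end": row[end_col], "props": props})
    return rels


employment = [  # rows of a hypothetical person_company JOIN table
    {"person_id": 1, "company_id": 10, "since": 2015, "role": "Admin"},
    {"person_id": 2, "company_id": 10, "since": 2018, "role": None},
]
for rel in join_rows_to_rels("WORKS_AT", employment, "person_id", "company_id"):
    print(rel)
```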

Beyond performance and ease of writing queries, nodes and relationships as connected structures enable graph databases to closely model many business domains in an intuitive way (at least compared to an RDBMS).

Summary: -[relationships]-> are usually used for interactions between domain entities (sometimes, depending on the domain, also to represent entities per se). Whereas (nodes) may have several :Labels, -[relationships]-> may have only one :Type but multiple {properties}.

{properties}

Properties can be added to (nodes) and -[relationships]->. They are denoted {propertyKey:propertyValue} inside the enclosing element, e.g. (node {property:<propertyValue>}) or -[relationship {property:<propertyValue>}]->, and are used for the following:

  • To enrich the data model by defining attributes, weights or metadata as key-value properties. source
  • Every node or relationship can have different properties. So, not all entities (need to) have the same keys (schema-less). source
  • Compared to an RDBMS, columns in entity or JOIN tables become node or relationship properties. source
  • Allowed values are primitive types only or arrays of primitive types. They may not be null (in Cypher, setting a property to null removes it). source
  • Schema indexes are only available on node properties, not on relationship properties (as of Neo4j 3.x). source

Summary: {properties} are schema-less key-value pairs of primitive types used to advance and enrich (nodes) and -[relationships]-> with further data.
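The value rules can be summarized as a small Python checker. The functions is_valid_property_value and set_property are hypothetical illustrations; Neo4j enforces these rules itself. A value must be a primitive or a homogeneous array of primitives and is never null; assigning None removes the key, mimicking Cypher's SET n.key = null. Temporal and spatial values, which later Neo4j versions also accept, are deliberately left out of this sketch.

```python
# Hypothetical checker illustrating the value rules above (Neo4j enforces them
# itself): a property value is a primitive or a homogeneous array of
# primitives, and never null. Temporal and spatial types, which later Neo4j
# versions also accept, are left out of this sketch.

PRIMITIVES = (bool, int, float, str)


def is_valid_property_value(value) -> bool:
    if value is None:
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        element_types = {type(v) for v in value}
        # at most one element type, and that type must be primitive
        return len(element_types) <= 1 and element_types <= set(PRIMITIVES)
    return False  # dicts, nested lists, other objects


def set_property(props: dict, key: str, value) -> None:
    if value is None:
        props.pop(key, None)  # mimics Cypher: SET n.key = null removes the key
        return
    if not is_valid_property_value(value):
        raise TypeError(f"unsupported property value for {key!r}: {value!r}")
    props[key] = value


person = {"name": "John"}
set_property(person, "tags", ["admin", "staff"])
set_property(person, "tags", None)
print(person)  # {'name': 'John'}
```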

Further Information

This was of course only a short, condensed overview of elements, syntax and (very basic) concepts for a property graph model as used by Neo4j.

Apart from the intro given here, further and more detailed information can be found here:
https://neo4j.com/docs/developer-manual/current/introduction/graphdb-concepts/
https://neo4j.com/developer/guide-data-modeling/
https://neo4j.com/developer/graph-db-vs-rdbms/
http://www.slideshare.net/neo4j/graphconnect-2014-sf-from-zero-to-graph

Another very useful and entertainingly told guide is "Welcome to the Dark Side: Neo4j Worst Practices (& How to Avoid Them)" by Stefan Armbruster:
https://neo4j.com/blog/dark-side-neo4j-worst-practices/

An easy way to get started with Neo4j is its official Docker image together with the sample movie database, which can be loaded via the web interface with just a few clicks:
https://hub.docker.com/_/neo4j/