Neo4J

Neo4J is a popular Graph NoSQL Database. It provides the cypher query language to interact with the graph data.

Either use their cloud sandbox or a self-hosted instance.

  • ❌ Account required, can't delete your account πŸ™„
  • πŸͺ΅ A bit hard to use (easy to make mistakes/errors)
  • 🀝 open-source (GitHub, 11.6k ✨)
  • πŸš€ Cypher's syntax looks like SQL
  • 🌫️ Cloud (free version available) or self-hosted (local)

Example database.


Core logic

With Neo4J, each cypher query returns a graph.

Nodes

  • πŸ’° These are our records
  • πŸ—ƒοΈ They may have labels (ex: Person, Actor, ShowBizPerson...)
  • πŸ”₯ For instance, the node Werner Herzog (ShowbizPerson)

Edges

  • 🧡 They are the links between records
  • ↔️ They have a direction, and link two nodes
  • πŸ•ΈοΈ There is no limit to the number of edges between two nodes
  • πŸ”₯ For instance, -ACTED_IN-> between W. H. and What Dreams...

Cypher Graph Example


Nodes and edges

Nodes

In a query, we use () to represent a node.

✍️ If we need to use the captured node, we store it in a variable: (variable_name) such as (m).

πŸ‘‰ We can select nodes having a specific label using: (:Label) such as (:Movie) or (m:Movie).

βœ‚οΈ We can go further and filter nodes based on their properties (:{released:2008}), (:Movie{released:2008}), or (r:{released:2008}) represent nodes with released == 2008.

πŸ”₯ You can use variable_name.attribute to access an attribute later.

Edges

In a query, we use -- to represent an edge.

Similarly to nodes:

  • You can capture an edge similarly to a node: -(v)-.
  • You can select edges based on a Label: -(:Label)-, -(v:Label)-.
  • ...

But, you can also add a direction:

  • From a to b: a-->b, a-(v)->b, etc.
  • From b to a: a<--b, a<-(v)-b, etc.

Basic clauses

Clauses are case-insensitive instructions.

  • Comments are made with // or /* ... */
  • Concatenate strings with +
  • Use ; to chain requests.

The order is: MATCH > WHERE > RETURN > ORDER BY > SKIP > LIMIT.

MATCH (SQL FROM/WHERE)

Use MATCH to select nodes/edges. You can chain matches if needed.

MATCH (m:Movie) [...]                 // fetch all movies
MATCH (m:Movie{released: 2008}) [...] // match + filtering
MATCH (m), (p) [...]                  // cartesian product
MATCH g = (:Movie)-[]-() [...]        // store graph

WHERE (SQL WHERE)

Filter nodes/edges based on a predicate.

  • =, !=, <>, >, <, >=, <=, ...: basic operators
  • attribute IN [value, value]: in array
  • attribut =~ "regex": in regex
  • attribute STARTS WITH, ENDS WITH, CONTAINS: ...
  • xxx:label: true if xxx got this label, false else
  • exists(xxx.attribute): check if "attribute" exists
  • type(edge) == 'name': test if an edge got this name
  • an edge (filter nodes not having an edge)
  • ...

πŸ‘‰ Chain predicates using AND/OR/XOR. See also: NOT/IS.

RETURN (SQL SELECT)

Return specifies which nodes are in the final graph.

[...] RETURN m                    // one node
[...] RETURN DISTINCT m           // no duplicates results
[...] RETURN m.title, m.released  // attributes
[...] RETURN m, n                 // cartesian product 
[...] RETURN {title:m.title}      // as JSON

ORDER BY (SQL ORDER BY)

Sort results.

[...] ORDER BY m.title       // ASC
[...] ORDER BY m.title ASC   // ASC
[...] ORDER BY m.title DESC  // DESC

LIMIT + SKIP (SQL LIMIT)

Skip results or limit the number of results.

[...] SKIP 10    // skip the first 10 results
[...] LIMIT 3    // return up to 3 results

Advanced clauses

Functions

Functions are listed here.

  • keys(node): list a node's attributes
  • properties(node): list a node's attributes and their values
  • labels(node): returns a node's labels
  • nodes(graph): returns all nodes in a graph
  • relationships(graph): returns all edges of a graph
  • id(node): returns a node's ID

And some useful aggregates functions:

  • COUNT(something): number of results
  • MIN(something), MAX(something), SUM(something), AVG(something), ROUND(something)... which are working like in SQL.

For instance: MATCH (m:Movie) RETURN ROUND(AVG(m.released)) returns the rounded average release date of all movies.

WITH (SQL <none>)

With can be used to execute an operation, such as a calculation.

MATCH (m:Movie) WITH ROUND(AVG(m.released)) AS avg RETURN avg
// which is useful when chaining matches
MATCH (m:Movie) WITH ROUND(AVG(m.released)) as avg 
MATCH (m{released: avg})
RETURN m

OPTIONAL MATCH (SQL <none>)

An optional match can be used to select nodes/edges that may be null.

MATCH (a:Movie) OPTIONAL MATCH (a)<-[r:ACTED_IN]-()
RETURN a.title, r // may be null for some movies

Create, update, and delete data

Create

Create takes a graph and creates it.

CREATE (:Movie{title: "My movie", released: 2021})
// create from something existing
MATCH (m:Movie{title: "My movie"}) CREATE (m)<-[:ACTED_IN]-(:ShowbizPerson{name: "me"})

Update

As in SQL, select nodes first, then update them.

MATCH (p:ShowbizPerson{name: "me"})
SET p.name = "My name", p.born = 2021
// same as
MATCH (p:ShowbizPerson{name: "me"})
SET p.name = "My name" SET p.born = 2021

Delete

There are 3 clauses, for attributes, edges, and nodes.

// REMOVE an attribute
MATCH (p:ShowbizPerson{name: "My name"}) REMOVE p.born RETURN p

// DELETE an edge
MATCH (:ShowbizPerson{name: "My name"})-[r]-() DELETE r

// DETACH DELETE a node
// DETACH = delete incident edges, optional if they were already deleted
// DELETE = delete the node
MATCH (p:ShowbizPerson{name: "My name"}) DETACH DELETE (p)

Examples

Here are some example queries.

// released after 2000
MATCH (m) WHERE m:Movie AND exists(m.released) AND m.released > 2000 RETURN m
// released after 2000 (same)
MATCH (m:Movie) WHERE exists(m.released) AND m.released > 2000 RETURN m
// match every node that PRODUCED a movie, and the movie
MATCH (p)-[:PRODUCED]->(m:Movie) RETURN p, m
// same using "where"
MATCH (p)--(m) WHERE (p)-[:PRODUCED]->(m:Movie) RETURN p, m
// number of persons that wrote or produced a movie
MATCH (p:ShowbizPerson)-[r:WROTE|PRODUCED]->(:Movie) RETURN DISTINCT COUNT(p)
// Actors that played in a movie with Tom Cruise
MATCH (s:ShowbizPerson)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(:ShowbizPerson{name: "Tom Cruise"}) RETURN DISTINCT s.name
// Movies released between 1990-2000 (included)
MATCH (m:Movie) WHERE m.released>=1990 AND m.released<=2000 RETURN m.title
// Persons who directed the movie "The Matrix"
MATCH (:Movie{title: "The Matrix"})-[:DIRECTED]-(s:ShowbizPerson) RETURN s.name
// Movies Meg Ryan acted in
MATCH (:ShowbizPerson{name: "Meg Ryan"})-[:ACTED_IN]->(m:Movie) RETURN DISTINCT m.title
// Those that both wrote and produced the same movie
MATCH (w:ShowbizPerson)-[:WROTE]->(:Movie)<-[:PRODUCED]-(p:ShowbizPerson) WHERE w.name = p.name RETURN DISTINCT w.name
MATCH (w:ShowbizPerson)-[:WROTE]->(:Movie)<-[:PRODUCED]-(w) RETURN DISTINCT w.name
MATCH (w:ShowbizPerson)-[:WROTE]->(:Movie) MATCH (w)-[:PRODUCED]->(:Movie) RETURN DISTINCT w.name

Lists

If needed, you can use lists in your queries. See Python lists.

// return the list of the release dates for the movies linked to "Keanu Reeves"
MATCH (a:ShowbizPerson {name: 'Keanu Reeves'})
RETURN [(a)-->(b) WHERE b:Movie | b.released] AS years

Here are two simple queries explained:

  • [x in range(0,5) | x]: returns [0,1,2,3,4,5]
  • [x in range(0,5) WHERE x+2<5 | x^2]: returns [0,1,4]
    • only 0,1,2 are passing the criteria x+2<5
    • then we evaluate each value as x^2

Common functions πŸ“‘

  • range(min,max): returns a list of values [min, max]
  • head(list): return the first element
  • tail(list): remove the list without the head
  • size(list): size of a list
  • reverse(list): reverse a list

Indexes ✨

  • list[0]: first element
  • list[-1]: last element
  • list[1..3]: list of list[1]+list[2]
  • list[..3]: list of list[0]+list[1]+list[2]
  • list[0..]: list of list[0]+list[1]+...

πŸ‘» To-do πŸ‘»

Stuff that I found, but never read/used yet.