Investigating Neo4j Indexes

Published Dec 23, 2020 • Last reviewed Dec 23, 2020

An investigation into the trade-offs of a Neo4j database index.

Over the past year, I've been working on a product recommendation system built on Neo4j that holds product information for over 100,000 items. Databases with upwards of 5 million nodes and 30 million relationships run in production, so this was by no means pushing Neo4j to its limit, but it was a scale at which performance problems appeared unless some optimizations were made.

Database Indexes

One such optimization was to create an index for the database. A database index is an auxiliary data structure that makes searching the data more efficient. A good simplification is to consider a phone book. When you're looking for a particular entry, you won't read through every listing, as there might be hundreds of thousands of entries. Instead, phone books sort the entries by name, and tabs let you jump straight to the right section.

Process

I first created two droplets on DigitalOcean and configured Neo4j identically on both. I then created a property index in one database (Alpha) and left the other (Beta) without one.

CREATE INDEX value FOR (number:Number) ON (number.value)

I then created 1,000,000 nodes in both databases.

FOREACH(i in RANGE(1, 1000000) | CREATE (:Number {value: i}))

I didn't actually run the Cypher above in one go; I had to create the nodes in a few steps.
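The post doesn't say what those steps were. As one hypothetical way to split the work, the sketch below generates a handful of smaller Cypher statements that together cover the same range. The function name `batched_create_statements`, the batch size of 250,000 and the use of `UNWIND` instead of `FOREACH` are all assumptions made for illustration, not the method actually used.

```python
def batched_create_statements(total, batch_size):
    """Return Cypher statements that together create :Number nodes 1..total.

    Hypothetical helper: each statement covers one inclusive range of values,
    so it can be run as its own, smaller transaction.
    """
    statements = []
    for start in range(1, total + 1, batch_size):
        end = min(start + batch_size - 1, total)
        statements.append(
            f"UNWIND range({start}, {end}) AS i CREATE (:Number {{value: i}})"
        )
    return statements


statements = batched_create_statements(1_000_000, 250_000)
for statement in statements:
    print(statement)
```

Each printed statement could then be pasted into cypher-shell one at a time; a smaller batch size simply yields more statements.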

From there, I randomly generated 5 unique numbers between 1 and 1,000,000 and fetched the node with each value from both databases. Here are the results:
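Picking the test values can be sketched in a few lines of Python. The fixed seed and the variable names are assumptions added so the example is repeatable; the query text mirrors the lookup profiled later in this post.

```python
import random

# Fixed seed (an assumption) so that reruns pick the same test values.
rng = random.Random(2020)

# sample() draws without replacement, so the five values are unique.
test_values = rng.sample(range(1, 1_000_001), 5)

# One exact-match lookup per value, to be run against both Alpha and Beta.
queries = [f"MATCH (n:Number {{value: {v}}}) RETURN n" for v in test_values]
for query in queries:
    print(query)
```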

Results

Finding a single :Number node by value in a database with 1,000,000 nodes. Alpha is the database that has an index on the value property of the :Number node. Beta is the database that has no index configured.

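+--------+------------+-------------+--------------+
| Test # | Alpha (ms) |   Beta (ms) | % Difference |
+--------+------------+-------------+--------------+
|      1 |   423.2053 |   5257.1049 |        42.55 |
|      2 | 422.947901 |   5254.6187 |        42.55 |
|      3 | 411.369301 |   5255.7346 |        42.74 |
|      4 | 429.008599 |   5240.1161 |        42.43 |
|      5 |   424.7967 | 5240.403301 |        42.50 |
+--------+------------+-------------+--------------+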

I ran the same tests with 800,000 nodes and with 600,000 nodes.

800,000 Nodes:

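+--------+------------+-------------+--------------+
| Test # | Alpha (ms) |   Beta (ms) | % Difference |
+--------+------------+-------------+--------------+
|      1 | 470.078101 |   2091.8318 |        31.65 |
|      2 | 433.553101 |    915.1855 |        17.85 |
|      3 |   435.0441 |    2115.005 |        32.94 |
|      4 | 465.709401 | 2089.146301 |        31.77 |
|      5 |    458.296 | 2087.101899 |        32.00 |
+--------+------------+-------------+--------------+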

600,000 Nodes:

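+--------+------------+-------------+--------------+
| Test # | Alpha (ms) |   Beta (ms) | % Difference |
+--------+------------+-------------+--------------+
|      1 | 480.539601 |   2965.8342 |        36.06 |
|      2 | 476.063701 |   2855.8858 |        35.71 |
|      3 |    476.549 | 2923.386399 |        35.98 |
|      4 |   473.4447 |   2515.1195 |        34.16 |
|      5 |   476.3136 | 2962.481301 |        36.15 |
+--------+------------+-------------+--------------+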

Analysis

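In the database with 1 million nodes, the indexed lookups took about 420 ms against about 5,250 ms without the index, which is roughly 12 times faster (the % Difference column reports a little over 40%). I want to look at how an index improves the query so drastically, and at the trade-off that comes with the improved performance.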
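The post doesn't state how the % Difference column was calculated. The published values appear consistent with the formula below, which is an inference from the numbers, not a documented method. The sketch also computes the plain speedup ratio for comparison; both function names are made up for this example.

```python
def reported_percent_difference(alpha_ms, beta_ms):
    """Formula inferred from the tables: 100 * (beta - alpha) / (2 * (alpha + beta))."""
    return 100 * (beta_ms - alpha_ms) / (2 * (alpha_ms + beta_ms))


def speedup(alpha_ms, beta_ms):
    """How many times faster the indexed lookup (Alpha) was than the unindexed one (Beta)."""
    return beta_ms / alpha_ms


# Test 1 from the 1,000,000-node table.
alpha_ms, beta_ms = 423.2053, 5257.1049
print(round(reported_percent_difference(alpha_ms, beta_ms), 2))  # 42.55
print(round(speedup(alpha_ms, beta_ms), 1))                      # 12.4
```

The same formula reproduces rows from the other tables too, for example 17.85 for test 2 of the 800,000-node run.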

We can better understand where this speedup comes from by prefixing our Cypher query with PROFILE.

Alpha (with index):

neo4j@neo4j> PROFILE MATCH (n:Number {value: 900400}) RETURN n;
+---------------------------+
| n                         |
+---------------------------+
| (:Number {value: 900400}) |
+---------------------------+

+----------------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits | Rows | Memory (Bytes) |
+----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.2" | "COST"  | "INTERPRETED" | 55   | 4      | 1    | 0              |
+----------------------------------------------------------------------------------------------------------+


+-----------------------+------------------------------------------+----------------+------+---------+------------------------+
| Operator              | Details                                  | Estimated Rows | Rows | DB Hits | Page Cache Hits/Misses |
+-----------------------+------------------------------------------+----------------+------+---------+------------------------+
| +ProduceResults@neo4j | n                                        |              1 |    1 |       2 |                    0/0 |
| |                     +------------------------------------------+----------------+------+---------+------------------------+
| +NodeIndexSeek@neo4j  | n:Number(value) WHERE value = $autoint_0 |              1 |    1 |       2 |                    0/0 |
+-----------------------+------------------------------------------+----------------+------+---------+------------------------+

1 row available after 40 ms, consumed after another 15 ms

Beta (without index):

neo4j@neo4j> PROFILE MATCH (n:Number {value: 900400}) RETURN n;
+---------------------------+
| n                         |
+---------------------------+
| (:Number {value: 900400}) |
+---------------------------+

+-----------------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime       | Time | DbHits  | Rows | Memory (Bytes) |
+-----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.2" | "COST"  | "INTERPRETED" | 754  | 2000003 | 1    | 0              |
+-----------------------------------------------------------------------------------------------------------+


+------------------------+----------------------+----------------+---------+---------+------------------------+
| Operator               | Details              | Estimated Rows | Rows    | DB Hits | Page Cache Hits/Misses |
+------------------------+----------------------+----------------+---------+---------+------------------------+
| +ProduceResults@neo4j  | n                    |         100000 |       1 |       2 |                    0/0 |
| |                      +----------------------+----------------+---------+---------+------------------------+
| +Filter@neo4j          | n.value = $autoint_0 |         100000 |       1 | 1000000 |                    0/0 |
| |                      +----------------------+----------------+---------+---------+------------------------+
| +NodeByLabelScan@neo4j | n:Number             |        1000000 | 1000000 | 1000001 |                    0/0 |
+------------------------+----------------------+----------------+---------+---------+------------------------+

1 row available after 38 ms, consumed after another 716 ms

Without an index, the same exact-match query hits the database far more often: 2,000,003 DB hits against 4. It has no choice but to scan every :Number node and filter on value.

Here are resources if you're interested in learning more about interpreting execution plans and Cypher queries:

A database index produces such performance gains by reducing the lookup from a problem that scales linearly with the number of nodes to one that scales logarithmically. It does this by building and searching a sorted tree structure (a B-tree in Neo4j 4.x) instead of checking every node.
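The gap between linear and logarithmic scaling can be sketched with a plain Python list. This is not how Neo4j stores its index; a binary search over a sorted list stands in for the tree lookup, and both function names are invented for the example. Each function returns the position found and the number of comparisons it needed.

```python
def linear_lookup(values, target):
    """Check every entry in turn, like a scan without an index."""
    comparisons = 0
    for position, value in enumerate(values):
        comparisons += 1
        if value == target:
            return position, comparisons
    return -1, comparisons


def sorted_lookup(sorted_values, target):
    """Binary search over sorted data, halving the search range each step."""
    low, high = 0, len(sorted_values)
    comparisons = 0
    while low < high:
        middle = (low + high) // 2
        comparisons += 1
        if sorted_values[middle] < target:
            low = middle + 1
        else:
            high = middle
    found = low < len(sorted_values) and sorted_values[low] == target
    return (low if found else -1), comparisons


values = list(range(1, 1_000_001))
print(linear_lookup(values, 900_400))  # (900399, 900400)
print(sorted_lookup(values, 900_400))
```

For one million sorted values the binary search never needs more than 20 comparisons, while the linear scan needs as many comparisons as the target's position in the list.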

But how much space does this extra data structure take? What's the trade-off of adding a database index? Surely the extra data structure takes up more storage!

The Trade-off

The gains in speed come at the cost of storage. I went into each droplet and looked at how much disk space Neo4j's schema index files were using:

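+---------+-------------+----------+
|         |       Alpha |     Beta |
+---------+-------------+----------+
| Sum (K) |      196968 |      336 |
| Sum (M) | 192.3515625 | 0.328125 |
+---------+-------------+----------+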

An index for one million nodes takes ~190 MB more storage. I only measured this at one size, but I'd expect it to scale roughly linearly with the number of nodes indexed.
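As a quick check of that figure, the arithmetic can be worked through from the table. The sketch assumes the Sum (K) row is in units of 1,024 bytes, which matches the table's own conversion to Sum (M); the per-node number is a rough average derived here, not something measured directly.

```python
KIB_PER_MIB = 1024
BYTES_PER_KIB = 1024

alpha_kib = 196_968   # schema index files on Alpha (with index)
beta_kib = 336        # schema index files on Beta (without index)
indexed_nodes = 1_000_000

extra_kib = alpha_kib - beta_kib
extra_mib = extra_kib / KIB_PER_MIB
bytes_per_node = extra_kib * BYTES_PER_KIB / indexed_nodes

print(f"{extra_mib:.1f} MiB extra, about {bytes_per_node:.0f} bytes per indexed node")
# 192.0 MiB extra, about 201 bytes per indexed node
```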

Summary

In a database with one million nodes, an index cut single-node lookups on the indexed property from about 5,250 ms to about 420 ms (a reported difference of about 40%). The gains were smaller in the 800,000-node and 600,000-node tests. These speed improvements come at the cost of disk storage.

Last reviewed on February 24, 2026