I want to use Neptune for an application with Cypher as my query language. I have a pretty small dataset of ~8500 nodes and ~8500 edges. I am trying to run what seem to be fairly straightforward queries, but the latency is very high (~6-8 seconds for around 1000 rows). I have tried various instance types, enabling and disabling caches, and enabling and disabling the OSGP index, all to no avail. I'm really at a loss as to why the query performance is so poor.
Does anyone have any experience with poor query performance using Neptune? I feel I must be doing something incorrectly to have such high query latency.
Here is some more detailed information on my graph structure and my query.
I have a graph with 2 node types, A and B, and a single edge type, MAPS_TO, which is always directed from an A node to a B node. The MAPS_TO relation is many-to-many, but with the current dataset it is primarily one-to-one, i.e. the graph is mainly disconnected subgraphs of the form:
(A)-[MAPS_TO]-(B)
What I would like to do is, for each A node, collect the distinct B nodes it maps to that satisfy some conditions. I've experimented with my queries a bit, and the fastest one I've been able to arrive at is:
MATCH (a:A)
WHERE a.Owner = $owner AND a.IsPublic = true
WITH a
MATCH (a)-[r:MAPS_TO]->(b:B)
WHERE (b)<-[:MAPS_TO {CreationReason: "origin"}]-(:A {Owner: $owner})
OR (b)<-[:MAPS_TO {CreationReason: "origin"}]-(:A {IsPublic: true})
WITH a, r, b ORDER BY a.AId SKIP 0 LIMIT 1000
RETURN a {
.AId
} AS A, collect(distinct b {
B: {BId: b.BId, Name: b.Name, other properties on B nodes...},
R: {CreationReason: r.CreationReason, other relation properties}
})
The above query takes ~6 seconds on the t4g.medium instance type. I tried moving up to an r5d.2xlarge instance type, which cut the query time in half to 3-4 seconds. However, using such a large instance type seems quite excessive for such a small amount of data.
Really, I am just trying to figure out why my query performs so poorly. With the amount of data I have, it seems to me that no Neptune configuration should perform this badly.
2 Answers
Unfortunately, there are many reasons why performance could be suffering (instance size, data not being in the buffer cache, concurrent processes, query optimization, etc.), so it is hard to provide specific suggestions with the information available.
To better understand the issue, I’d suggest taking a look at how the query is being processed. These details can be found using the openCypher explain feature which will provide low-level details on what the query is doing and where the time is being spent. If possible, I suggest opening a support case with AWS support.
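As a rough sketch of how an explain request could be put together, the snippet below builds (but does not send) a POST to the cluster's openCypher HTTPS endpoint with the `explain` field set. The endpoint host, the `build_explain_request` helper, the simplified query, and the `some-owner` value are all placeholders for illustration, and the sketch assumes IAM authentication is disabled (otherwise the request would need to be SigV4-signed).

```python
import json
import urllib.parse
import urllib.request

# Hypothetical cluster endpoint; replace with your own.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"


def build_explain_request(query, mode="details", parameters=None):
    """Build (without sending) an openCypher explain request.

    mode is one of the explain modes: "static", "dynamic" or "details".
    """
    fields = {"query": query, "explain": mode}
    if parameters is not None:
        # Query parameters are passed as a JSON string in the "parameters" field.
        fields["parameters"] = json.dumps(parameters)
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        NEPTUNE_ENDPOINT + "/openCypher",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Simplified first stage of the query from the question, for illustration only.
request = build_explain_request(
    "MATCH (a:A) WHERE a.Owner = $owner AND a.IsPublic = true RETURN count(a)",
    parameters={"owner": "some-owner"},
)

# To actually run it against a reachable cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The `details` mode runs the query and reports per-operator timings, which is usually the quickest way to see which part of the plan accounts for the latency; `static` only shows the plan without executing it.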
The query seems unnecessarily complicated: you are running a query within a query, asking for the same things multiple times…
You could try the following and it should give you the same results:
Best regards
Frank