I am using AWS Neptune and using python with neo4j to query the database.
I am having difficult time to fetch the completely hierarchy by a node id.
I want the all children and sub children which are directly and in-directly connected to a particular node.
Node Properties:
(~id
, ‘entity_name’, ‘entity_type’, ~label
)
Edge Properties:
(~id
, ~label
, ‘journey_id’)
I have tried multiple queries, but I am getting timeout after 10 minutes since my aws neptune timeout is 10 minutes.
MATCH (source)<-[relationship*]-(target)
WHERE source.`~id` = '{entity_id}'
RETURN source, relationship, target
MATCH (source)<-[relationship*]-(target)
WHERE source.`~id` = '{entity_id}'
RETURN COLLECT(DISTINCT target)
MATCH (source)<-[relationship*]-(target)
WHERE source.`~id` = '{entity_id}'
RETURN COLLECT(DISTINCT [ID(startNode(relationship)), TYPE(relationship), ID(endNode(relationship))])
MATCH p=(source)<-[relationship*]-(target)
WHERE source.`~id` = '{entity_id}'
WITH p, source, target, nodes(p) AS n, relationships(p) AS r
UNWIND n AS node
WITH source, target, COLLECT(DISTINCT node) AS NODES, r
UNWIND r AS rel
WITH source, target, nodes, COLLECT(DISTINCT rel) as rels
RETURN source, target, nodes, rels
MATCH (source)<-[relationship*]-(target)
WHERE source.`~id` = '{entity_id}'
AND EXISTS( (source)<-[]-() )
RETURN source
MATCH p=(source {`~id`: '{entity_id}'})-[relationship*]-(target)
WITH DISTINCT p as path, source, target
return source, relationships(path) as rels, target
I have tired all the above queries but not efficient enough. It’s working fine if I limit the hierarchy to some level but in my scenario I want the completely hierarchy.
can anyone help with it?
2
Answers
Before I get into the following on Neptune Database, I should also mention that if these patterns are your primary use case (which is basically a Weakly Connected Components algorithm), then you may want to consider using Neptune Analytics instead of Neptune Database. Neptune Analytics has built in functionality for executing graph algorithms: https://docs.aws.amazon.com/neptune-analytics/latest/userguide/wcc.html
Neptune Engine Version
If you haven’t already, upgrade to engine version
1.3.2.1
as there were significant enhancements to openCypher query performance in this release.Neptune Database Edge Indexing
One likely issue here is the use of an incoming edge traversal and not providing edge labels. The following gets into Neptune’s indexing scheme [1] and what is happening when you query by incoming edges without labels.
Edge labels are stored in the P position in the indexes. There is the POGS index, but not knowing the edge type requires lookups in the POGS index for all P that is an edge label. If you can specify the edge labels that you want to traverse, this can make the query more efficient.
You can provide the possible edge labels, such as:
I would also state that if this is a common pattern, you may want to switch the direction of your edges. All things being equal, the performance of traversing an edge in either direction should be similar. However, Neptune stores outgoing edges adjacent to the originating vertex information in the SPOG index:
As such, the vertex ID, properties and outgoing edge are more likely to be held within the same set of database pages. This means the lookup of a vertex by ID would likely also fetch the vertex’s outgoing edge information and cache this in buffer pool cache. This makes outgoing edge traversals slightly more efficient (fewer trips to storage – more done in-memory).
When querying on an incoming edge (with a label), the POGS index would be used to find the edge and the vertex ID on the other side:
But when query on an outgoing edge, the SPOG index would be used. And since you’re binding on S (the outgoing vertex id), there’s less of a need to supply an edge label.
[1] https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-data-model.html
Aside from the above, if you can provide an openCypher explain plan output, we can look at potential other aspects of the query/data that may help make these queries more efficient: https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-opencypher-explain.html
Variable-length path patterns have exponential complexity:
O(N^D)
, whereN
is the number of relationships per node, andD
is the depth of the search.If you do not specify an upper bound on the depth of the search (as in your case), you can cause long-running queries that may eventually get out-of-memory errors.
You should consider specifying an upper bound on the depth. For example, to search up to 7 relationships deep: