Redis - Optimising or restructuring n-n relationship Neo4j queries causing very slow response times

ArnabSaha
January 11, 2020
273 views
1 vote
2 Answers

I’m looking for some assistance with one of my Neo4j graphs. My nodes and relationships looks something like this

This is an example of one end to end relationship within the graph.

The issue is related to node (Store) having multiple (Notifications) and they being related to multiple (Notification Action Types) & (Notification Types)

The query looks something like this:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), shortestPath((nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)), shortestPath((nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute) WHERE s.uuid={app_id} AND sta.key='name' OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand) OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty) RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

And the response times for one query is more than 8 secs. And my application ends up requiring this information fairly frequently. Which was causing overall poor response and crashes which is why in the interim I have introduced redis in between to cache some of this data required by the app.

e.g. of the response of one entry from the original query is

And, the json looks like below where the column names are the nodes in the graph

 [
  {
    "nt_id": "002a3ba0-2584-11ea-93de-118eb121a0f8",
    "st_ids": [
      "e5fb2cc0-2246-11ea-a327-c1a6ac2ca4a0"
    ],
    "store_names": [
      "AND"
    ],
    "notification_brands": [],
    "notification_labels": [],
    "notification": [
      {
        "sub_text": "Happy shopping!!",
        "action_url": "https://www.tatacliq.com/and/c-mbh11a00015",
        "image_url": "",
        "notification_match_type": "GENERAL",
        "validity_start": 1,
        "text": "Welcome to {{store_name}}",
        "inventory_request_params": "",
        "isActive": true,
        "validity_end": 1,
        "uuid": "002a3ba0-2584-11ea-93de-118eb121a0f8",
        "active_days": "{"SUNDAY":"1100-2100","MONDAY":"1100-2100","TUESDAY":"1100-2100","WEDNESDAY":"1100-2100","THURSDAY":"1100-2100","FRIDAY":"1100-2100","SATURDAY":"1100-2100"}"
      }
    ],
    "notification_type": [
      {
        "name": "Deals & Offers",
        "uuid": "2fdc2b20-4faf-11e9-bfff-47192e190163"
      }
    ],
    "notification_action_type": [
      {
        "name": "In Store",
        "uuid": "ce78fc50-4fae-11e9-b974-7995b4e2b93d"
      }
    ]
  }
]

neo4j version: neo4j:3.5.12-enterprise and running in Docker on AWS m5.xlarge machine with 12G heap and cache size configured
what kind of API / driver do you use: Rest API on ECS and Node JS on separate instances
screenshot of [PROFILE or EXPLAIN]

Also, attaching query.log which explains the execution timelines on live environment.

query.log
query.log.1

Any assistance on this is much appreciated!

Thanks,
Arnab

Answers

Chosen as BEST ANSWER

Thanks for the answer. Well, I've managed to resolve this now. The reason why I had introduced shortestPath was I had the query you suggested earlier. Which is

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

And if in my graph there was an incoming relationship i.e. [r:HAS_NOTIFICATION_TYPE] & [ra:HAS_NOTIFICATION_ACTION_TYPE] for each Notification node, and there were multiple notification to each Store node. When the graph grew i.e. When I had more than 1500+ store and ~200 notifications under each and each of those had the above incoming relationships. The query would just not work and hang and crash the server. Changing those to shortestPath did stop that from happening but the results time was still high.

Now, I'm saving the Notif_Type & Notif_Action_Type nodes' uuid in the Notification node's properties. And updated the query like so:

MATCH (n:business_entity)-[:HAS_APP]->(s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), 
        (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
        WHERE n.uuid={b_id} AND s.uuid={app_id} AND sta.key='name'
        OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
        OPTIONAL MATCH(ntt:notification_type) WHERE ntt.uuid = nt.notification_type
        OPTIONAL MATCH(ntta:notification_action_type) WHERE ntta.uuid = nt.notification_action_type
        RETURN nt.uuid as nt_id,  COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names,
        COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(nt)) as notification, 
        COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type

And, the response times are under 500ms now. Thanks for the detailed explanation! Cheers!

(Edit)

Hej Arnab!

I am a bit confused about what you are trying to achieve with the 2 shortestPath call you have in your MATCH.
You might want to use the shortestPath function to return the shortest path between 2 nodes you have previously matched. It would then look like that for instance:

MATCH (a), (b), p = shortestPath((a)-[*]-(b))
RETURN p

You have to use variable path length (*) to let the function search through all possible paths from a to b.

Now, if we take the shortestPath function call out of your querie and based on the way you scripted your query, you should get the same result but hopefully faster:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification), (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type), (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type), (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE s.uuid={app_id} AND sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

And you can even try using subsequent matches to make your query even faster instead of a single match:

MATCH (s:app)-[:HAS_GEOZONE]->(g:geozone)-[:HAS_ENGAGEMENTZONE]->(e:engagement_zone)-[:HAS_STORE]->(st:store)-[:HAS_NOTIFICATION]->(nt:notification)
WHERE s.uuid={app_id}
MATCH (nt)-[r:HAS_NOTIFICATION_TYPE]->(ntt:notification_type)
MATCH (nt)-[ra:HAS_NOTIFICATION_ACTION_TYPE]->(ntta:notification_action_type)
MATCH (st)-[:HAS_ATTRIBUTE]->(sta:store_attribute)
WHERE sta.key='name'
OPTIONAL MATCH (nt)-[:HAS_BRAND]->(br:brand)
OPTIONAL MATCH (nt)-[:HAS_LABEL]->(l:loyalty)
RETURN nt.uuid as nt_id, COLLECT(DISTINCT st.uuid) as st_ids, COLLECT(DISTINCT sta.value) as store_names, COLLECT(DISTINCT properties(br)) as notification_brands, COLLECT(DISTINCT properties(l)) as notification_labels, COLLECT(DISTINCT properties(nt)) as notification, COLLECT(DISTINCT properties(ntt)) as notification_type, COLLECT(DISTINCT properties(ntta)) as notification_action_type ORDER BY nt_id

But beware that depending on your graph model those 2 queries might deliver different results. If you’re interested to know the difference between those 2 different ways to match your graph, you can read Adam’s answer in the following post:
https://community.neo4j.com/t/significance-of-using-in-neo4j-multiple-relationships-cypher-query/10800

I hope this helps!
BR,
Fabien

Please signup or login to give your own answer.

Click here to cancel reply.

Redis – Optimising or restructuring n-n relationship Neo4j queries causing very slow response times

Answers