
I have a graph in which each vertex represents a Wikipedia article. Each edge points to the article that the first link in the current article's text leads to. The article that every other article eventually leads to is "Philosophy". The graph has 27 vertices and 26 edges.

If I want to see how far one vertex is from another, I can query it in two different ways: one uses the size() function and the other uses the length() function. I noticed that when I use size() instead of length(), the query runs noticeably faster (about 4.7 ms versus 7.3 ms in the runs below). Why does that happen?

demo=# \timing on
Timing is on.

demo=# SELECT * FROM cypher('Wikipedia', $$
MATCH p = (a)-[e:RELATED_TO*]->(b)
WHERE a.name = 'Tulpa' AND b.name = 'Philosophy'
RETURN size(e)
$$) AS (edge_count agtype);
 edge_count 
------------
 18
(1 row)

Time: 4.724 ms

demo=# SELECT * FROM cypher('Wikipedia', $$
MATCH p = (a)-[e:RELATED_TO*]->(b)
WHERE a.name = 'Tulpa' AND b.name = 'Philosophy'
RETURN length(p)
$$) AS (edge_count agtype);
 edge_count 
------------
 18
(1 row)

Time: 7.280 ms

2 Answers


  1. "p" contains the whole path, including every vertex along it, so it holds more information.
    In contrast, "e" only holds the relationships (edges). So I'm assuming the difference comes down to how much data each variable carries, not to the size() and length() functions themselves.

  2. Correct me if I am wrong.

    According to the Apache AGE docs, the size() function returns the length of a list (array), while length() returns the length of a path. After reading the source code for age_size and age_length in the AGE repository, and looking at the array each query variable returns, it appears that length() retrieves the path array, checks that it really is a path, and calculates the length by subtracting one from the number of elements (edges + vertices) and dividing by 2. For the 18-edge path in the question, that is 37 elements, and (37 - 1) / 2 = 18.

    I believe length(), which has to build a path value of the AGE type AGTV_PATH, is more computationally expensive than size(), which simply checks the type of its argument and counts the elements of a list or the characters of a string (the function accepts cstring, text, and agtype string or list inputs).
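
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not AGE's actual C implementation: the flat alternating vertex/edge list and the names build_path, path_length, and list_size are assumptions made up for this example.

```python
# Hypothetical model: a path is a flat list that alternates
# vertex, edge, vertex, ..., vertex. This mirrors the description
# above; it is not how AGE stores paths internally.

def build_path(hops):
    """Build an alternating vertex/edge list with `hops` edges."""
    path = ["v0"]
    for i in range(1, hops + 1):
        path += [f"e{i}", f"v{i}"]
    return path

def path_length(path):
    """length(p)-style count: (elements - 1) / 2 edges."""
    if len(path) % 2 == 0:
        raise ValueError("a path must have an odd number of elements")
    return (len(path) - 1) // 2

def list_size(items):
    """size(e)-style count: just the number of list elements."""
    return len(items)

path = build_path(18)                             # like p in the question
edges = [x for x in path if x.startswith("e")]    # like e in the question

print(len(path), path_length(path), list_size(edges))  # prints: 37 18 18
```

Both counts agree on 18, but the path variant has to work with roughly twice as many elements (37 versus 18), which fits the idea that the cost difference comes from the data being handled rather than from the counting itself.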
