I have a graph in which each vertex represents a Wikipedia article. Each edge points to the article that the first link in the current article's text leads to. The article that every other article eventually leads to is "Philosophy". The graph has 27 vertices and 26 edges.
If I want to see how far one vertex is from another, I can query it in two different ways: one uses the size() function and the other uses the length() function. One thing I noticed is that the query runs almost twice as fast with size() as with length(). Why does that happen?
demo=# \timing on
Timing is on.
demo=# SELECT * FROM cypher('Wikipedia', $$
MATCH p = (a)-[e:RELATED_TO*]->(b)
WHERE a.name = 'Tulpa' AND b.name = 'Philosophy'
RETURN size(e)
$$) AS (edge_count agtype);
edge_count
------------
18
(1 row)
Time: 4.724 ms
demo=# SELECT * FROM cypher('Wikipedia', $$
MATCH p = (a)-[e:RELATED_TO*]->(b)
WHERE a.name = 'Tulpa' AND b.name = 'Philosophy'
RETURN length(p)
$$) AS (edge_count agtype);
edge_count
------------
18
(1 row)
Time: 7.280 ms
2 Answers
"p" holds the whole path, so it contains information about every vertex along the way as well as every edge.
In contrast, "e" only holds the relationships. So I'm assuming the difference comes down to the size of the value bound to "p" versus the one bound to "e", and not specifically to the size() and length() functions themselves.
Correct me if I am wrong.
According to the Apache AGE docs, the size() function returns the length of a list (array), while length() returns the length of a path. After reading the source code for age_size and age_length in the AGE repository, as well as each array returned by the query, it appears that the length() function retrieves the path array, checks that it is a path, and calculates the length by subtracting one from the number of elements (edges + vertices) and dividing by 2.
I believe the length() function, which creates a path using the AGE type AGTV_PATH, is more computationally expensive than size(), which simply checks the type of its value and counts the length of an array or a string (the function supports inputs such as cstrings, text, and the agtype string or list).
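To make the arithmetic above concrete, here is a small toy model in Python. It is only a sketch: plain Python lists stand in for agtype values, and the helper names (build_path, path_length, list_size) are made up for illustration and are not part of AGE. It shows why both queries return 18 for the Tulpa-to-Philosophy path even though the path value has roughly twice as many elements to build and inspect.

```python
# Toy model only: Python lists stand in for agtype values.
# build_path, path_length and list_size are hypothetical helper names.

def build_path(edge_count):
    """Return an alternating [vertex, edge, vertex, ..., vertex] list."""
    path = ["v0"]
    for i in range(edge_count):
        path.append(f"e{i}")      # edge i
        path.append(f"v{i + 1}")  # vertex reached by edge i
    return path

def path_length(path):
    """Mimic length(p): (number of elements - 1) / 2."""
    if len(path) % 2 == 0:
        raise ValueError("a path must have an odd number of elements")
    return (len(path) - 1) // 2

def list_size(edges):
    """Mimic size(e): just count the list entries."""
    return len(edges)

path = build_path(18)   # 19 vertices + 18 edges = 37 elements
edges = path[1::2]      # every second element is an edge
print(len(path), path_length(path), list_size(edges))  # 37 18 18
```

The model says nothing about actual timings. It only illustrates that the path passed to length() carries vertices as well as edges, which is consistent with the explanations given above.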