I’m trying to find a way to calculate the average of the first commented day vs the first approved day of this data set. I’ve narrowed it down to one PR that I’d like to work on but I’d like to have the query ultimately scale to the entire data set.
pullRequest | stateCategory | submittedAt | refreshedAt |
---|---|---|---|
GitHub-microsoft-deepspeed-3324 | Commented | April 20, 2023, 9:59 PM | April 22, 2023, 5:28 PM |
GitHub-microsoft-deepspeed-3324 | Commented | April 20, 2023, 10:30 PM | April 22, 2023, 5:28 PM |
GitHub-microsoft-deepspeed-3324 | Commented | April 20, 2023, 10:32 PM | April 22, 2023, 5:28 PM |
GitHub-microsoft-deepspeed-3324 | Commented | April 20, 2023, 10:53 PM | April 22, 2023, 5:28 PM |
GitHub-microsoft-deepspeed-3324 | Approved | April 20, 2023, 11:37 PM | April 22, 2023, 5:28 PM |
Here is my Query:
SELECT "vcs_PullRequestReview"."pullRequest"
, "vcs_PullRequestReview"."stateCategory"
, "vcs_PullRequestReview"."submittedAt"
, "vcs_PullRequest"."refreshedAt"
FROM "vcs_PullRequestReview"
JOIN "vcs_PullRequest"
ON "vcs_PullRequestReview"."pullRequest" = "vcs_PullRequest"."id"
WHERE "vcs_PullRequest"."id" IN ('GitHub|microsoft|deepspeed|3324')
GROUP BY "vcs_PullRequestReview"."pullRequest"
, "vcs_PullRequestReview"."stateCategory"
, "vcs_PullRequestReview"."submittedAt"
, "vcs_PullRequest"."refreshedAt"
ORDER BY "vcs_PullRequestReview"."pullRequest" DESC
, "vcs_PullRequestReview"."submittedAt" ASC
I know it’s probably a window function or a series of case statement and subqueries but I’m not sure how to put it together. Thank you!
2
Answers
I think probably the easiest thing you can do is to utilize a CTE to prepare a mapping ("pullRequest", "stateCategory") -> minimum submittedAt and then use this to find what you want:
It’s not clear from the question if days refers to calendar days or 24 hour intervals. The following query returns the mean interval between first comment and first approval:
The mean can be expressed as fractional days (24 hour itervals) by changing the final
SELECT
expression to:To get the mean as fractional calendar days based on integral calendar days per pull request, cast
pr.submitted_at
in thepr_spans
subquery to dates; e.g.,pr.submitted_at::date
.