skip to Main Content

I’m trying to find a way to calculate the average of the first commented day vs the first approved day of this data set. I’ve narrowed it down to one PR that I’d like to work on but I’d like to have the query ultimately scale to the entire data set.

pullRequest stateCategory submittedAt refreshedAt
GitHub-microsoft-deepspeed-3324 Commented April 20, 2023, 9:59 PM April 22, 2023, 5:28 PM
GitHub-microsoft-deepspeed-3324 Commented April 20, 2023, 10:30 PM April 22, 2023, 5:28 PM
GitHub-microsoft-deepspeed-3324 Commented April 20, 2023, 10:32 PM April 22, 2023, 5:28 PM
GitHub-microsoft-deepspeed-3324 Commented April 20, 2023, 10:53 PM April 22, 2023, 5:28 PM
GitHub-microsoft-deepspeed-3324 Approved April 20, 2023, 11:37 PM April 22, 2023, 5:28 PM

Here is my Query:

SELECT "vcs_PullRequestReview"."pullRequest"
     , "vcs_PullRequestReview"."stateCategory"
     , "vcs_PullRequestReview"."submittedAt"
     , "vcs_PullRequest"."refreshedAt" 
FROM "vcs_PullRequestReview"
JOIN "vcs_PullRequest" 
  ON "vcs_PullRequestReview"."pullRequest" = "vcs_PullRequest"."id"
WHERE "vcs_PullRequest"."id" IN ('GitHub|microsoft|deepspeed|3324')
GROUP BY "vcs_PullRequestReview"."pullRequest"
       , "vcs_PullRequestReview"."stateCategory"
       , "vcs_PullRequestReview"."submittedAt"
       , "vcs_PullRequest"."refreshedAt" 
ORDER BY "vcs_PullRequestReview"."pullRequest" DESC
       , "vcs_PullRequestReview"."submittedAt" ASC

I know it’s probably a window function or a series of case statement and subqueries but I’m not sure how to put it together. Thank you!

2

Answers


  1. I think probably the easiest thing you can do is to utilize a CTE to prepare a mapping ("pullRequest", "stateCategory") -> minimum submittedAt and then use this to find what you want:

    with minSubmittedAtByPR as (
      select pullRequest, stateCategory, min(submittedAt) as minSubmittedAt
      from vcs_PullRequestReview
      group by pullRequest, stateCategory
    )
    select pr.pullRequest, minCommented.minSubmittedAt as minCommentedAt, minApproved.minSubmittedAt as minApprovedAt 
    from vcs_PullRequest pr
    left join minSubmittedAtByPR minCommented
    on pr.id = minCommented.pullRequest and minCommented.stateCategory = 'Commented'
    left join minSubmittedAtByPR minApproved
    on pr.id = minApproved.pullRequest and minApproved.stateCategory = 'Approved';
    
    Login or Signup to reply.
  2. It’s not clear from the question if days refers to calendar days or 24 hour intervals. The following query returns the mean interval between first comment and first approval:

    WITH pr(pull_request, state_category, submitted_at, refreshed_at) AS
      (VALUES ('GitHub-microsoft-deepspeed-3324', 'Commented','2023-04-20 21:59'::timestamp, '2023-04-22 17:28'::timestamp),
              ('GitHub-microsoft-deepspeed-3324', 'Commented','2023-04-20 22:30'::timestamp, '2023-04-22 17:28'::timestamp),
              ('GitHub-microsoft-deepspeed-3324', 'Commented','2023-04-20 22:32'::timestamp, '2023-04-22 17:28'::timestamp),
              ('GitHub-microsoft-deepspeed-3324', 'Commented','2023-04-20 22:53'::timestamp, '2023-04-22 17:28'::timestamp),
              ('GitHub-microsoft-deepspeed-3324', 'Approved','2023-04-20 23:37'::timestamp, '2023-04-22 17:28'::timestamp)),
    pr_spans AS       
      (SELECT pr.pull_request,
              MIN(pr.submitted_at) FILTER (WHERE pr.state_category = 'Commented') AS first_commented_at,
              MIN(pr.submitted_at) FILTER (WHERE pr.state_category = 'Approved') AS first_approved_at
         FROM pr
        GROUP BY pr.pull_request)
    SELECT AVG(pr_spans.first_approved_at - pr_spans.first_commented_at) AS mean_first_comment_to_first_approval
      FROM pr_spans;
    

    The mean can be expressed as fractional days (24 hour itervals) by changing the final SELECT expression to:

    SELECT EXTRACT(EPOCH FROM AVG(pr_spans.first_approved_at - pr_spans.first_commented_at)) / 86400 AS mean_first_comment_to_first_approval
    

    To get the mean as fractional calendar days based on integral calendar days per pull request, cast pr.submitted_at in the pr_spans subquery to dates; e.g., pr.submitted_at::date.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search