skip to Main Content

I’m experiencing a significant discrepancy in the total user count between BigQuery and GA4. I’m using Shopify and Elevar for server-side tracking. Over one week, there is a 15% discrepancy, and over one month, it can be up to 30%.

Here’s the query I’m using in BigQuery:

select 
  count(distinct(user_pseudo_id)) as users 
from 
  `events_*` 
where 
  _table_suffix between '20240715' and '20240721'

The BigQuery result is 21,627 users for the specified week but GA4 is 24,319(see attached screenshot).

I suspect that GA4 might be calculating total users differently compared to my query. This level of discrepancy is unusual as GA4 and BigQuery data are usually quite similar in my experience. I would appreciate any insights or suggestions on what might be causing this discrepancy and how to resolve it.

Additional Information:
Tracking Setup: Shopify and Elevar for server-side tracking
Time Frame: 15% discrepancy over one week, up to 30% over one month
Tools Used: BigQuery, GA4
Any help or advice would be greatly appreciated.

GA4 Total Users
BigQuery Total Users

2

Answers


  1. GA4 and BigQuery might not always show the same numbers due to various reasons:

    • Data Processing Differences
    • Data Sampling
    • Data Collection Latency
    • Time Zone Setup Differences
    • GA4 Uses modeling
    • SQL Query Logic

    Getting the same number is very unlikely but you can get close by trying to use HyperLogLog++ algorithm (HLL++): https://cloud.google.com/bigquery/docs/reference/standard-sql/hll_functions

    Login or Signup to reply.
  2. GA4 and Bigquery will never align for the reasons in Micheles post above,

    I would point out that not all sessions have a user_pseudo_id which may explain a lot of the difference you are seeing and give you some confidence in your tracking.

    SELECT 
        COUNT(
            DISTINCT CONCAT(IFNULL(user_pseudo_id, 'MISSING'), 
                IFNULL((SELECT CAST(value.int_value AS STRING) FROM UNNEST(event_params) 
                 WHERE key = "ga_session_id"), 'MISSING'))
        ) AS num_of_sessions_missing_ID,
            COUNT(
            DISTINCT CONCAT(user_pseudo_id, 
                (SELECT value.int_value FROM UNNEST(event_params) 
                 WHERE key = "ga_session_id")
            )
        ) AS num_of_sessions,
        -- Extract and clean the page_location as a separate column
        REGEXP_REPLACE(
            (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'), 
            r'?.*', ''
        ) AS page_location_cleaned,
        (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') as page_location,
        COUNT(DISTINCT user_pseudo_id) AS Users,
        COUNT(user_pseudo_id) AS Users_all,
        device.web_info.hostname AS Hostname,
        -- Count how many times ga_session_id is blank or missing
        COUNTIF(
            (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "ga_session_id") IS NULL
        ) AS blank_ga_session_id_count,
        -- Count of rows where user_pseudo_id is considered missing (assuming null represents missing)
        COUNTIF(user_pseudo_id IS NULL OR user_pseudo_id = '') AS missing_user_pseudo_id_count,
        -- Count of page views (assuming event_name = 'page_view' represents a page view)
        COUNTIF(event_name = 'page_view') AS page_views
    FROM 
        `bigquery.events_*`
    WHERE 
        _table_suffix BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
        AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
    GROUP BY 
        page_location_cleaned,
        page_location,
        Hostname
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search