skip to Main Content

So forgive my ignorance, but I can’t seem to work this out.

I want to create a “table” in BigQuery, from an API call.
I am thinking https://developer.companieshouse.gov.uk/api/docs/search/companies/companysearch.html#here

I want to easily query the Companies House API, without writing oodles of code?
And then cross reference that with other datasets – like Facebook API, LinkedIn API.
eg. I want to input a company ID/ name on Companies house and get a fuzzy list of the people and their likely Social connections (Facebook, LinkedIn and Twitter)

Maybe BigQuery is the wrong tool for this? Should I just code it??

Or

It is, and adding a dataset with an API is just not obvious to me how to figure it out – in which case – please enlighten me.

2

Answers


  1. You will not be able to directly use BigQuery and perform the task at hand. BigQuery is a web service that allows you to analyze massive datasets working in conjunction with Google Storage (or any other storage system).

    The correct way of going about the situation would be to perform a curl request to collect all the data you require from Companies House and store the data as a spreadsheet (csv). Afterwards you may store the csv within Google Cloud Storage and load the data into BigQuery.

    If you simply wish to link clients from Companies House and social media applications such as Facebook or LinkedIn, then you may not even need to use BigQuery. You may construct a structured table using Google Cloud SQL. The fields would consist of the necessary client information and you may later do comparisons with the FaceBook or LinkedIn API responses.

    Login or Signup to reply.
  2. So if you are looking to load data from various sources and do big-query operations through API – Yes there is a way and adding to the previous answer, big-query is meant to do only analytical queries (on big data) otherwise simply, it’s gonna cost you a lot and slower than a regular search api if you intend to do thousands of search queries on big datasets joining various tables etc.,

    let’s try to query using api from bigquery from public datasets

    to authenticate – you will need to generate the authentication token using your application default credentials

    gcloud auth print-access-token
    

    Now using the token generated by gcloud command – you can use it for rest api calls.

    POST https://www.googleapis.com/bigquery/v2/projects/<project-name>/queries
    Authorization: Bearer <Token>
    Body: {
      "query": "SELECT tag, SUM(c) c FROM (SELECT CONCAT('stackoverflow.com/questions/', CAST(b.id AS STRING)), title, c, answer_count, favorite_count, view_count, score, SPLIT(tags, '|') tags FROM `bigquery-public-data.stackoverflow.posts_questions` a JOIN (SELECT CAST(REGEXP_EXTRACT(text,r'stackoverflow.com/questions/([0-9]+)/') AS INT64) id, COUNT(*) c FROM `fh-bigquery.hackernews.comments` WHERE text LIKE '%stackoverflow.com/questions/%' AND EXTRACT(YEAR FROM time_ts)>=@year GROUP BY 1 ORDER BY 2 DESC) b ON a.id=b.id), UNNEST(tags) tag GROUP BY 1 ORDER BY 2 DESC LIMIT @limit",
      "queryParameters": [
        {
          "parameterType": {
            "type": "INT64"
    
          },
          "parameterValue": {
            "value": "2014"
    
          },
          "name": "year"
        },
        {
          "parameterType": {
            "type": "INT64"
    
          },
          "parameterValue": {
            "value": "5"
    
          },
          "name": "limit"
        }
      ],
      "useLegacySql": false,
      "parameterMode": "NAMED"
    }
    

    Response:

    {
        "kind": "bigquery#queryResponse",
        "schema": {
            "fields": [
                {
                    "name": "tag",
                    "type": "STRING",
                    "mode": "NULLABLE"
                },
                {
                    "name": "c",
                    "type": "INTEGER",
                    "mode": "NULLABLE"
                }
            ]
        },
        "jobReference": {
            "projectId": "<project-id>",
            "jobId": "<job-id>",
            "location": "<location>"
        },
        "totalRows": "5",
        "rows": [
            {
                "f": [
                    {
                        "v": "javascript"
                    },
                    {
                        "v": "102"
                    }
                ]
            },
            {
                "f": [
                    {
                        "v": "c++"
                    },
                    {
                        "v": "90"
                    }
                ]
            },
            {
                "f": [
                    {
                        "v": "java"
                    },
                    {
                        "v": "57"
                    }
                ]
            },
            {
                "f": [
                    {
                        "v": "c"
                    },
                    {
                        "v": "52"
                    }
                ]
            },
            {
                "f": [
                    {
                        "v": "python"
                    },
                    {
                        "v": "49"
                    }
                ]
            }
        ],
        "totalBytesProcessed": "3848945354",
        "jobComplete": true,
        "cacheHit": false
    }
    

    Query – The most popular tags on Stack Overflow questions linked from Hacker News since 2014:

    #standardSQL
    SELECT tag, SUM(c) c
    FROM (
      SELECT CONCAT('stackoverflow.com/questions/', CAST(b.id AS STRING)),
    title, c, answer_count, favorite_count, view_count, score, SPLIT(tags, '|') tags
      FROM `bigquery-public-data.stackoverflow.posts_questions` a
      JOIN (
        SELECT CAST(REGEXP_EXTRACT(text,
    r'stackoverflow.com/questions/([0-9]+)/') AS INT64) id, COUNT(*) c
        FROM `fh-bigquery.hackernews.comments`
        WHERE text LIKE '%stackoverflow.com/questions/%'
        AND EXTRACT(YEAR FROM time_ts)>=2014
        GROUP BY 1
        ORDER BY 2 DESC
      ) b
      ON a.id=b.id),
    UNNEST(tags) tag
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 5
    

    Result :

    enter image description here

    So, we do some of our analytical queries using api to build periodical reports. But, I let you explore the other options & big-query API to create & load data using API.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search