
I want to load data from MySQL to BigQuery using Cloud Dataflow. Can anyone share an article or work experience about loading data from MySQL to BigQuery using Cloud Dataflow with Python?

Thank you

2 Answers


  1. You can use apache_beam.io.jdbc to read from your MySQL database, and the BigQuery I/O to write to BigQuery.

    Beam knowledge is expected, so I recommend looking at the Apache Beam Programming Guide first.

    If you are looking for something pre-built, we have the Google-provided JDBC to BigQuery template, which is open source (here), but it is written in Java.
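
    For a Python starting point, here is a minimal sketch of such a pipeline. The project, bucket, connection details, table, and schema below are all placeholders; also note that ReadFromJdbc is a cross-language transform, so it needs a Java runtime and the MySQL JDBC driver available to its expansion service.

    import apache_beam as beam
    from apache_beam.io.jdbc import ReadFromJdbc
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project',                # placeholder
        region='us-central1',                # placeholder
        temp_location='gs://my-bucket/tmp',  # placeholder
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | 'ReadFromMySQL' >> ReadFromJdbc(
                table_name='users',                      # placeholder table
                driver_class_name='com.mysql.cj.jdbc.Driver',
                jdbc_url='jdbc:mysql://HOST:3306/mydb',  # placeholder
                username='user',                         # placeholder
                password='secret',                       # placeholder
            )
            # ReadFromJdbc emits NamedTuple-like rows; WriteToBigQuery
            # expects dicts keyed by column name.
            | 'ToDict' >> beam.Map(lambda row: row._asdict())
            | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(
                'my-project:mydataset.users',            # placeholder
                schema='id:INTEGER,name:STRING',         # placeholder schema
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )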

  2. If you only want to copy data from MySQL to BigQuery, you can first export your MySQL data to Cloud Storage, then load that file into a BigQuery table.

    I don't think you need Dataflow in this case because you don't have complex transformations or business logic; it's only a copy.

    Export the MySQL data to Cloud Storage with a SQL query and the gcloud CLI (the --quote, --escape, --fields-terminated-by, and --lines-terminated-by values are hex ASCII codes: 22 is the double quote, 5C the backslash, 2C the comma, and 0A the newline):

    gcloud sql export csv INSTANCE_NAME gs://BUCKET_NAME/FILE_NAME \
    --database=DATABASE_NAME \
    --offload \
    --query=SELECT_QUERY \
    --quote="22" \
    --escape="5C" \
    --fields-terminated-by="2C" \
    --lines-terminated-by="0A"
    

    Load the CSV file into a BigQuery table with the bq CLI:

    bq load \
      --source_format=CSV \
      mydataset.mytable \
      gs://mybucket/mydata.csv \
      ./myschema.json
    

    ./myschema.json is the BigQuery table schema.
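
    For reference, a minimal myschema.json for a hypothetical two-column table could look like this (the column names and types are placeholders; use your own):

    [
      {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
      {"name": "name", "type": "STRING", "mode": "NULLABLE"}
    ]

    And if you would rather script the load step in Python (as asked in the question) instead of the bq CLI, the google-cloud-bigquery client can run the same load job. A minimal sketch, with the bucket, table, and schema as placeholders mirroring myschema.json:

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        schema=[
            bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
            bigquery.SchemaField("name", "STRING", mode="NULLABLE"),
        ],
    )

    load_job = client.load_table_from_uri(
        "gs://mybucket/mydata.csv",  # same file as above
        "mydataset.mytable",         # destination table
        job_config=job_config,
    )
    load_job.result()  # block until the load job completes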
