Error while using WriteToBigquery in python for Dataflow pipeline. Unicode object has no attribute 'items' - SEO

UtsavChatterjee
November 25, 2018
250 views
0 votes
2 Answers

My sample data is in json format and looks like:

{
  "metadata": {
    "action": "insert",
    "type": "export",
    "version": 1,
    "timestamp": "2018-11-23T09:17:59.048-08:00"
  },
  "data": {
    "attr1": 61,
    "day": "2018-11-22",
    "pin": "2C49956",
    "CDP": 0,
    "DP": 0,
    "VD": 0,
    "seo": 0,
    "dir": 0,
    "other": 0,
    "at": 0
  }
}

The data is in a flat file, and the objective is to run a Dataflow pipeline in batch mode to insert it into a BigQuery table. In one of the transformations, where I take the timestamp from the metadata and add it as a key-value pair in the data section, Dataflow fails with the error 'unicode' object has no attribute 'items'.

The code looks as below:

import collections
import json
import argparse
import logging
from datetime import datetime
import apache_beam as beam
from apache_beam.options.pipeline_options import (
    PipelineOptions, GoogleCloudOptions, StandardOptions, SetupOptions,
    WorkerOptions)
from apache_beam.io.gcp import bigquery


# Creating options object
def create_options(argv):
    # pipeline options
    options = PipelineOptions()
    google_cloud_options = options.view_as(GoogleCloudOptions)
    google_cloud_options.project = 'something'
    google_cloud_options.job_name = datetime.now().strftime('somename')
    google_cloud_options.staging_location = 'some_loc'
    google_cloud_options.temp_location = 'another_loc'
    options.view_as(StandardOptions).runner = 'DirectRunner'
    options.view_as(SetupOptions).save_main_session = True
    options.view_as(WorkerOptions).machine_type = 'n1-standard-1'
    return options

class PrepareData(beam.DoFn):
    """
    ParDo function to create a dictionary of data for downstream consumption
    """

    def process(self, element):
        data = json.loads(element)
        modified_data = {"action": data["metadata"]["action"], "timestamp": data["metadata"]["timestamp"], "data": data}
        return [modified_data]


class FilterInserts(beam.DoFn):
    """
    Filter data for inserts
    """

    def process(self, element):
        if element["action"] == "insert":
            element['data']['data']['timestamp'] = element['timestamp']
            # for dict in element["data"]["data"]:
            #     dict["timestamp"] = element["timestamp"]
            return element["data"]["data"]


def run_pipe(options, argv):
    """
    Creating pipelines
    """
    p = beam.Pipeline(options=options)

    main_pipe = p | 'PREPARE_DATA' >> beam.io.ReadFromText('/home/Downloads/sample_1') | beam.ParDo(PrepareData())

    """ Separating pipes for various actions """
    insert_pipe = main_pipe | beam.ParDo(FilterInserts())

    """
    Inserts--> sinking to BQ
    """
    insert_pipe | 'INSERT' >> beam.io.WriteToBigQuery(
        project='some-data-warehouse',
        dataset='sample_data',
        table='sample',
        write_disposition='WRITE_APPEND',
        create_disposition='CREATE_IF_NEEDED')

    p.run()


def main():
    """
    Main function to drive the run
    :return: errors if any
    """
    parser = argparse.ArgumentParser()
    args = parser.parse_args()
    try:
        # create options
        opt = create_options(argv=args)
        # run pipeline
        run_pipe(opt, argv=args)
    except Exception as e:
        logging.error('Pipeline failed with error : %s', e)
        raise Exception('Pipeline failed with error : %s', e)


if __name__ == "__main__":
    main()

I am running this on the DirectRunner to test locally, but I get the same error even if I change the runner to DataflowRunner.
The error message is:

Exception: ('Pipeline failed with error : %s', AttributeError(u"'unicode' object has no attribute 'items' [while running 'INSERT/WriteToBigQuery']",))

Can anyone help me figure out what's going wrong and how I can fix this?

Answers

Use the following table schema (you can modify it to suit your needs):

schema = 'VD:INTEGER,pin:STRING,timestamp:STRING,other:INTEGER,CDP:INTEGER,dir:INTEGER,attr1:INTEGER,seo:INTEGER,day:STRING,DP:INTEGER,at:INTEGER'
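If typing the schema by hand is error-prone, a string in the same `name:TYPE` format can be derived from one sample record. This is only a minimal sketch: the helper name `schema_from_record` is hypothetical (it is not part of Beam), it assumes a flat record, and it only covers the value types that appear in the sample data.

```python
def schema_from_record(record):
    """Build a 'name:TYPE,...' schema string from one flat sample record.

    Hypothetical helper for illustration; not a Beam or BigQuery API.
    """
    text_types = (type(u""), type(""))  # unicode/str on Python 2, str on Python 3
    fields = []
    for name in sorted(record):
        value = record[name]
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            bq_type = "BOOLEAN"
        elif isinstance(value, int):
            bq_type = "INTEGER"
        elif isinstance(value, float):
            bq_type = "FLOAT"
        elif isinstance(value, text_types):
            bq_type = "STRING"
        else:
            raise TypeError("unsupported value for column %r: %r" % (name, value))
        fields.append("%s:%s" % (name, bq_type))
    return ",".join(fields)


# A cut-down version of the "data" section from the question.
sample = {"attr1": 61, "day": "2018-11-22", "pin": "2C49956", "seo": 0}
derived_schema = schema_from_record(sample)
print(derived_schema)
```

The columns come out in alphabetical order, which differs from the hand-written schema above; as far as I know, what matters is that the column names match the dictionary keys.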

Try the following on your FilterInserts class:

class FilterInserts(beam.DoFn):
    """
    Filter data for inserts
    """

    def process(self, element):
        if element["action"] == "insert":
            element['data']['data']['timestamp'] = element['timestamp']

            return [{
                'VD': element['data']['data']['VD'],
                'pin': element['data']['data']['pin'],
                'timestamp': element['data']['data']['timestamp'],
                'other': element['data']['data']['other'],
                'CDP': element['data']['data']['CDP'],
                'dir': element['data']['data']['dir'],
                'attr1': element['data']['data']['attr1'],
                'seo': element['data']['data']['seo'],
                'day': element['data']['data']['day'],
                'DP': element['data']['data']['DP'],
                'at': element['data']['data']['at'],
            }]

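The issue happens because WriteToBigQuery expects each element to be a dictionary mapping column names to values. Your process method returned the dictionary itself rather than a list containing it. Beam iterates over whatever process returns, so each key of the dictionary (a unicode string) was emitted as a separate element, and a unicode string has no 'items' attribute.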
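The difference between returning a dictionary and returning a list that contains it can be seen without Beam at all. The sketch below only mimics the iteration step; the function `emit` is a hypothetical stand-in for the runner, not a Beam API.

```python
def emit(process_result):
    """Collect elements the way a runner would: by iterating the return value."""
    return [item for item in process_result]


row = {"pin": "2C49956", "day": "2018-11-22"}

wrong = emit(row)    # iterating a dict yields its keys, one string per element
right = emit([row])  # iterating a one-element list yields the dict itself

for label, elements in (("return row", wrong), ("return [row]", right)):
    for element in elements:
        # Only the dict has .items(); the strings do not.
        print("%s -> %s, has items(): %s"
              % (label, type(element).__name__, hasattr(element, "items")))
```

With `return row`, every element reaching the sink is a string; with `return [row]`, the single element is the dictionary, which is the shape the answer's code produces.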

Hope it helps.

- Pkr
- October 8, 2019 at 11:19 pm
- 0 votes
0
Create a table whose column names match the keys of the dictionaries your pipeline emits.

For example:
```
output2 = output|beam.io.WriteToBigQuery('gcpcloud1-254210:dataflow.dataflow')
```
Each element of my output PCollection is a dictionary such as {'age': 30, 'city': 'New York', 'name': 'John'} that I want to insert, so my table has the schema {age:INTEGER,name:STRING,city:STRING}.
```
input = (p | beam.io.ReadFromText('gs://bucketname/*.json'))
output = (input
          | beam.Map(lambda e: json.loads(e))
          | beam.io.WriteToBigQuery('projectid:dataset.table'))
```
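To check that a dictionary's keys line up with the table's columns before running the pipeline, something like the following could be used. It is a sketch under assumptions: `check_row_against_schema` is a hypothetical helper (not part of Beam), the schema is written as a flat `name:TYPE` string, and only column names are compared, not types.

```python
def check_row_against_schema(row, schema):
    """Return (missing, unexpected) column names for one row.

    Hypothetical helper for illustration; compares names only, not types.
    """
    columns = set(field.split(":")[0].strip() for field in schema.split(","))
    missing = sorted(columns - set(row))      # in the table but not in the row
    unexpected = sorted(set(row) - columns)   # in the row but not in the table
    return missing, unexpected


table_schema = "age:INTEGER,name:STRING,city:STRING"
good_row = {"age": 30, "city": "New York", "name": "John"}
bad_row = {"age": 30, "town": "New York"}

print(check_row_against_schema(good_row, table_schema))
print(check_row_against_schema(bad_row, table_schema))
```

Two empty lists mean the keys and the column names agree.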