
When I load data from MongoDB into a DataFrame using Spark, there is one field that shows up with both ObjectType and ArrayType because its value is empty in documents where the data is missing. When I call df.show() I get this error:

Cannot cast ARRAY into a StructType at documents that are arrays.

Is there any way to solve this problem?

2 Answers


  1. Chosen as BEST ANSWER

    These are two documents from my DB that show the problem:

    {
      "user4x": []
    }
    {
      "user4x": {
        "username": "user1",
        "userid": "629a35"
      }
    }
    

    When I use PySpark to load the DB:

    from pyspark.sql import SparkSession

    # Build the Spark session with the MongoDB Spark connector
    spark = SparkSession \
        .builder \
        .appName("mongo") \
        .config("spark.mongodb.input.uri", dbURLinfo) \
        .config("spark.mongodb.output.uri", dbURLinfo) \
        .config("spark.executor.heartbeatInterval", "180s") \
        .config("spark.network.timeout", "300s") \
        .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:3.0.2") \
        .config("spark.sql.debug.maxToStringFields", 1000) \
        .getOrCreate()

    # Read the collection; the schema is inferred by sampling documents
    df = spark.read.format("mongo") \
        .option("database", "log1") \
        .option("collection", "202304") \
        .load()
    df.show()
    

    I get this error:

    com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast ARRAY into a StructType(StructField(userid,StringType,true),StructField(username,StringType,true),StructField(zohocrm_id,StringType,true))
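    One workaround that might help here, since the stored data cannot be changed: push an aggregation pipeline down to MongoDB at read time so the conflicting field is already normalized when Spark infers the schema. This is only a sketch; the pipeline read option is part of the connector's read configuration, but the $cond/$type logic and the assumption that user4x should become null in the empty-array case are mine, based on the sample documents above.

    import json

    # Server-side normalization: if user4x is an array (the "missing" case),
    # replace it with null so every document exposes either a struct or null.
    pipeline = [
        {"$addFields": {
            "user4x": {
                "$cond": [
                    {"$eq": [{"$type": "$user4x"}, "array"]},
                    None,        # serialized as null by json.dumps
                    "$user4x"
                ]
            }
        }}
    ]

    df = spark.read.format("mongo") \
        .option("database", "log1") \
        .option("collection", "202304") \
        .option("pipeline", json.dumps(pipeline)) \
        .load()
    df.show()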
    

  2. This is my_db. The same field has two different types, the collection holds too much data, and I do not have permission to pre-process it.
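    A narrower alternative, if dropping the incomplete documents is acceptable (again just a sketch, reusing the same connector options as above): filter at read time with a $match stage so only documents where user4x is actually an object reach Spark, and the inferred StructType never collides with the empty arrays.

    # Only read documents whose user4x field is an object; documents where it
    # is an empty array (or missing entirely) are skipped server-side.
    match_pipeline = '[{"$match": {"user4x": {"$type": "object"}}}]'

    df = spark.read.format("mongo") \
        .option("database", "log1") \
        .option("collection", "202304") \
        .option("pipeline", match_pipeline) \
        .load()
    df.show()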
