
I have followed the link here to install; the build is successful, but I cannot find the connector.

from pyspark.sql import SparkSession

my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/intca2.tweetsIntca2") \
    .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/intca2.tweetsIntca2") \
    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.2.2') \
    .getOrCreate()

df = my_spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

Py4JJavaError: An error occurred while calling o592.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.mongodb.spark.sql.DefaultSource

The connector was downloaded and built from here:
https://github.com/mongodb/mongo-spark#please-see-the-downloading-instructions-for-information-on-getting-and-using-the-mongodb-spark-connector

I am using Ubuntu 20.04.

2 Answers


  1. Change the read format to

    df = spark.read.format("mongodb").load()
    

    Then you have to tell PySpark where to find the Mongo connector jars, e.g.

    /usr/local/bin/spark-submit --jars $HOME/java/lib/mongo-spark-connector-10.0.0.jar,$HOME/java/lib/mongodb-driver-sync-4.3.2.jar,$HOME/java/lib/mongodb-driver-core-4.3.2.jar,$HOME/java/lib/bson-4.3.2.jar mongo_spark1.py
    
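    For context, a minimal mongo_spark1.py that the spark-submit command above could launch might look like the sketch below. This is an assumption, not code from the question; the URI, database, and collection names are placeholders.

    from pyspark.sql import SparkSession

    # Placeholder connection string; adjust host, database, and collection.
    uri = "mongodb://127.0.0.1/mydb.mycoll"

    spark = (SparkSession
             .builder
             .appName("mongoRead")
             .config("spark.mongodb.read.connection.uri", uri)
             .config("spark.mongodb.write.connection.uri", uri)
             .getOrCreate())

    # With connector 10.x the data source is registered under the short name "mongodb".
    df = spark.read.format("mongodb").load()
    df.printSchema()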
  2. I’m running PySpark in local mode.

    • MongoDB version 4
    • Spark version 3.2.1

    I downloaded all the needed jars into one folder (path_to_jars) and added it to the Spark config:

    bson-4.7.0.jar
    mongodb-driver-legacy-4.7.0.jar
    mongo-spark-connector-10.0.3.jar
    mongodb-driver-core-4.7.0.jar
    mongodb-driver-sync-4.7.0.jar
    
    from pyspark.sql import SparkSession

    # Placeholder URI: replace host, port, Database, and collection with your own.
    url = 'mongodb://host:port/Database.collection'
    spark = (SparkSession
             .builder
             .master('local[*]')
             # Make every jar in path_to_jars visible to the driver.
             .config('spark.driver.extraClassPath', 'path_to_jars/*')
             .config("spark.mongodb.read.connection.uri", url)
             .config("spark.mongodb.write.connection.uri", url)
             .getOrCreate()
             )
    df = spark.read.format("mongodb").load()
    
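    As a follow-up sketch (not part of the original answer), writing a DataFrame back with connector 10.x uses the same short format name; the target collection is taken from the write URI configured above:

    # Append the DataFrame to the collection named in spark.mongodb.write.connection.uri.
    df.write.format("mongodb").mode("append").save()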