Amazon web services – AWS Glue Pyspark Job is not ending

I am trying to read data from RDS Postgres with PySpark 3.3 and AWS Glue 5.0 using the command below.

    df = (
        self.config.spark_details.spark.read.format("jdbc")
        .option(
            "url",
            f"jdbc:postgresql://{self.postgres_host}:{self.postgres_port}/{self.postgres_database}",
        )
        .option("driver", "org.postgresql.Driver")
        .option("user", self.postgres_username)
        .option("password", self.postgres_password)
        .option("query", query)
        .load()
    )…

Amazon web services – PySpark error: "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found" in EMR 7.0.0

I am using EMR 7.0.0 on AWS, which has Python 3.9, Spark 3.5.0, and Hadoop 3.3.6. I got the error:

    File "/usr/local/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 740, in csv
    File "/usr/local/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    File "/usr/local/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 179, in deco
    File "/usr/local/lib/python3.9/site-packages/pyspark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py",…

Docker – Can't connect/write stream from spark container to table in cassandra container

I am composing these services in separate Docker containers, all on the same confluent network:

    broker:
      image: confluentinc/cp-server:7.4.0
      hostname: broker
      container_name: broker
      depends_on:
        zookeeper:
          condition: service_healthy
      ports:
        - "9092:9092"
        - "9101:9101"
      environment:
        KAFKA_BROKER_ID: 1
        KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
        KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
        KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092…

Phpmyadmin – Pyspark stream kafka debezium topic Error format, ETL

I have successfully created a MariaDB database connection using Debezium and Kafka. When I tried to stream the topic using PySpark, this is the output that I get:

    -------------------------------------------
    Batch: 0
    -------------------------------------------
    +------+--------------------------------------------------------------------------------------------------------------------------+
    |key   |value |
    +------+--------------------------------------------------------------------------------------------------------------------------+
    ||MaxDoe1.4.2.Finalnmysqlmariadbbtruebasecampemployees mysql-bin.000032�r�ȯݭd |…
