Redis – How to read a DBF file in PySpark
I have a requirement to read and process a .DBF file in PySpark, but I couldn't find any library for reading it the way we read CSV, JSON, Parquet, or other files. Please help me read this file.…
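Spark has no built-in .dbf data source, so the usual workaround is to read the file with a plain-Python DBF library (for example dbfread) into a list of rows and hand that to spark.createDataFrame. The DBF layout is simple enough that even hand-parsing is feasible; the sketch below is pure Python and uses a synthetic header rather than a real file, purely to show how the fixed 32-byte header is laid out.

```python
import io
import struct

def read_dbf_header(f):
    """Parse the fixed 32-byte DBF header: version byte, last-update
    date (year offset from 1900), record count, header length, and
    per-record length."""
    header = f.read(32)
    version = header[0]
    last_update = (header[1] + 1900, header[2], header[3])
    num_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
    return {
        "version": version,
        "last_update": last_update,
        "num_records": num_records,
        "header_len": header_len,
        "record_len": record_len,
    }

# Build a tiny synthetic header to demonstrate (not a real data file):
fake = bytearray(32)
fake[0] = 0x03                       # dBASE III, no memo file
fake[1:4] = bytes([121, 6, 15])      # last updated 2021-06-15
fake[4:8] = struct.pack("<I", 3)     # 3 records
fake[8:10] = struct.pack("<H", 97)   # header length in bytes
fake[10:12] = struct.pack("<H", 25)  # record length in bytes

info = read_dbf_header(io.BytesIO(bytes(fake)))
print(info["num_records"])  # 3
```

In practice one would let dbfread iterate the records and then call spark.createDataFrame on the resulting list of dicts, rather than parsing bytes by hand.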
I've created a Spark cluster with one master and two slaves, each in its own Docker container. I launch it with the command start-all.sh. I can reach the UI from my local machine at localhost:8080 and it shows me that…
Input should be as below:

company sales
amazon 100
flipkart 900
ebay 890
amazon 100
flipkart 100
ebay 10
amazon 100
flipkart 90
ebay 10

And expected output should be as below:

amazon flipkart ebay
300 1090 910

Tried using…
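The totals the expected output implies can be checked in plain Python first; in PySpark the same shape would come from df.groupBy().pivot("company").sum("sales"), i.e. no grouping key and one column per company.

```python
from collections import defaultdict

# Rows from the question: (company, sales) pairs.
rows = [
    ("amazon", 100), ("flipkart", 900), ("ebay", 890),
    ("amazon", 100), ("flipkart", 100), ("ebay", 10),
    ("amazon", 100), ("flipkart", 90),  ("ebay", 10),
]

# Sum sales per company -- the aggregation the pivot performs.
totals = defaultdict(int)
for company, sales in rows:
    totals[company] += sales

print(dict(totals))  # {'amazon': 300, 'flipkart': 1090, 'ebay': 910}
```

This matches the expected output row (amazon 300, flipkart 1090, ebay 910), confirming the aggregation is a straight sum per company.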
I'm trying to initialize a PySpark cluster with a Jupyter Notebook on my local machine running Linux Mint. I am following this tutorial. When I try to create a SparkSession, I get an error that spark-submit does not exist. Strangely,…
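The "spark-submit does not exist" error usually means the Spark binaries are not on the PATH of the process that launched Jupyter. A minimal environment sketch, assuming Spark was unpacked to /opt/spark (that path is an assumption; substitute your actual install directory):

```shell
# Assumed install location -- replace /opt/spark with your own path.
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"   # makes spark-submit discoverable
export PYSPARK_PYTHON=python3         # interpreter the workers should use
```

Alternatively, the findspark package can locate an existing install at runtime (import findspark; findspark.init()) before SparkSession.builder is called.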
I am new to Zeppelin and PySpark. I have tried in vain to get Zeppelin to run with PySpark. My setup: 4 x Raspberry Pi 4 (8GB), Ubuntu Server 64-bit 20.04, Hadoop 3.2.2 with Yarn, Spark 3.1.1 (Hadoop-integrated build), Zeppelin 0.9. Pi01…
I'm trying to learn PySpark better, and I'm streaming tweets and trying to capture the hashtags from the tweet's text (I know the Twitter API's JSON already provides the hashtags; I'm doing this as an exercise). So with a PySpark dataframe…
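Capturing hashtags is a regular-expression job, so the pattern can be developed and tested in plain Python before touching the dataframe:

```python
import re

# Minimal sketch: pull hashtags out of tweet text with a regex.
# \w matches letters, digits, and underscore, which is close to
# Twitter's own hashtag rules but not identical (an assumption here).
HASHTAG = re.compile(r"#(\w+)")

def extract_hashtags(text):
    """Return the hashtag bodies (without the '#') found in text."""
    return HASHTAG.findall(text)

print(extract_hashtags("Learning #PySpark and #Kafka today"))
# ['PySpark', 'Kafka']
```

On Spark 3.1+, the same pattern can be applied column-wise via the SQL function regexp_extract_all, e.g. F.expr(r"regexp_extract_all(text, '#(\\w+)', 1)"), where text stands in for whatever the tweet-text column is called.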
I have installed the jar "com.redislabs:spark-redis_2.12:2.5.0" in Databricks and am trying to create a Spark session with the respective authentication. Below is the code where I create the session with credentials: redis = SparkSession.builder.appName("redis_connection").config("spark.redis.host", "hostname").config("spark.redis.port", "port").config("spark.redis.auth", "pass").getOrCreate() But when I try…
I'm trying to read data from Cassandra and write it to a specific Redis database index, let's say Redis DB 5. I need to write all the data into Redis DB index 5 in hashmap format. val spark = SparkSession.builder()…
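For the specific-index part, spark-redis exposes the target logical database as a plain config key (spark.redis.db, per the spark-redis configuration docs). A sketch of the settings; the host, port, and password values are placeholders:

```python
# spark-redis settings for writing into Redis logical DB 5.
# Host, port, and password below are placeholders.
redis_conf = {
    "spark.redis.host": "hostname",
    "spark.redis.port": "6379",
    "spark.redis.auth": "pass",
    "spark.redis.db":   "5",   # target Redis database index
}

# With PySpark and the spark-redis jar on the classpath, these would
# be applied like so:
#   builder = SparkSession.builder.appName("cassandra_to_redis")
#   for key, value in redis_conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
print(redis_conf["spark.redis.db"])
```

Writing a dataframe then goes through df.write.format("org.apache.spark.sql.redis") with a "table" option; per the spark-redis docs each row is stored as a Redis hash, which matches the hashmap format the question asks for.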
I'm new to Kafka streaming. I set up a Twitter listener using Python, and it is producing to the Kafka server at localhost:9092. I could consume the stream produced by the listener using a Kafka client tool (Conduktor) and also using the…
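To consume the same topic from PySpark, Structured Streaming's Kafka source needs little more than the bootstrap servers and a subscription. A sketch of the options; the topic name "tweets" is an assumption, so substitute whatever topic the listener produces to:

```python
# Options for Spark Structured Streaming's Kafka source.
# "tweets" is a placeholder topic name.
kafka_opts = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "tweets",
    "startingOffsets": "earliest",  # read the topic from the beginning
}

# With PySpark (plus the spark-sql-kafka package) available:
#   df = spark.readStream.format("kafka").options(**kafka_opts).load()
#   # the Kafka payload arrives as binary, so cast before parsing:
#   text = df.selectExpr("CAST(value AS STRING) AS json")
print(kafka_opts["kafka.bootstrap.servers"])
```

The cast matters: a frequent first stumble is trying to parse the raw value column, which Spark delivers as bytes, not as a string.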
Environment: Python 3.6.8, OS: CentOS 7, Spark 2.4.5, Hadoop 2.7.7. Hardware: 3 computers (8 VCores available for each computer on the Hadoop cluster). I constructed a simple Python application, and my code is: import numpy as np from pyspark.sql import SparkSession…