
Dataframes saved to S3/S3A from Spark are unencrypted despite settings "fs.s3a.encryption.algorithm" and "fs.s3a.encryption.key" – Ubuntu

Description: Within PySpark, a DataFrame can be saved to S3/S3A (not AWS, but an S3-compliant storage), but its data is stored unencrypted even though fs.s3a.encryption.algorithm (SSE-C) and fs.s3a.encryption.key are set. Reproducibility: Generate the key as follows: encKey=$(openssl rand…
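For readers unfamiliar with the two settings named above, here is a minimal sketch of how they are commonly expressed as Spark options. It is not the asker's code: the helper name `ssec_spark_options` is hypothetical, and the key is generated with Python's `secrets` module as a stand-in for the truncated `openssl` command. It assumes SSE-C takes a base64-encoded 256-bit key and that Hadoop properties are passed to Spark with the `spark.hadoop.` prefix.

```python
import base64
import secrets


def ssec_spark_options(key_bytes: bytes) -> dict:
    """Build Spark options for S3A SSE-C (hypothetical helper, for illustration)."""
    # Assumption: SSE-C uses a 256-bit AES key, i.e. exactly 32 raw bytes.
    if len(key_bytes) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    # The key is handed to the S3A connector in base64 form.
    encoded = base64.b64encode(key_bytes).decode("ascii")
    return {
        "spark.hadoop.fs.s3a.encryption.algorithm": "SSE-C",
        "spark.hadoop.fs.s3a.encryption.key": encoded,
    }


# Stand-in key generation; the original post uses an openssl command instead.
options = ssec_spark_options(secrets.token_bytes(32))
for name in options:
    print(name)  # print only the option names, never the key itself
```

Each pair could then be applied with `SparkSession.builder.config(name, value)` before the session is created. Whether the storage backend honors the setting is exactly what the question is about.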


How to connect to HDFS from the Docker container?

My goal is to read a file from HDFS in Airflow and do further manipulations. After researching, I found that the URL I need to use is as follows: df = pd.read_parquet('http://localhost:9870/webhdfs/v1/hadoop_files/sample_2022_01.parquet?op=OPEN'), where localhost, 172.20.80.1, and computer-name.mshome.net can be used interchangeably, and 9870 is the namenode port,…
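The URL in the excerpt follows the WebHDFS pattern of host, namenode port, the `/webhdfs/v1` prefix, the HDFS path, and an `op` query parameter. A small sketch of assembling it, assuming those parts are all that is needed; the helper name `webhdfs_url` is hypothetical and not from the question:

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, hdfs_path: str, port: int = 9870, op: str = "OPEN") -> str:
    """Assemble a WebHDFS URL (hypothetical helper, for illustration)."""
    # Percent-encode the HDFS path but keep its slashes intact.
    path = quote(hdfs_path.lstrip("/"))
    return f"http://{host}:{port}/webhdfs/v1/{path}?{urlencode({'op': op})}"


url = webhdfs_url("localhost", "/hadoop_files/sample_2022_01.parquet")
print(url)
```

Keeping the host as a parameter makes it easy to try the alternatives the asker lists without editing the rest of the URL.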
