
I’m trying to read data from Cassandra and write it to a specific Redis database index, say Redis DB 5.

I need to write all of the data into Redis DB index 5 as hashes.

    val spark = SparkSession.builder()
      .appName("redis-df")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .config("spark.redis.db", 5)
      .config("spark.cassandra.connection.host", "localhost")
      .getOrCreate()

    import spark.implicits._
    val someDF = Seq(
      (8, "bat"),
      (64, "mouse"),
      (-27, "horse")
    ).toDF("number", "word")

    someDF.write
      .format("org.apache.spark.sql.redis")
      .option("keys.pattern", "*")
      //.option("table", "person") // Is it mandatory?
      .save()

Can I save data into Redis without a table name? I just want to save all of the data into Redis DB index 5 without specifying a table name. Is that possible?
I have gone through the documentation of the spark-redis connector and don’t see any example of this.
Doc link: https://github.com/RedisLabs/spark-redis/blob/master/doc/dataframe.md#writing

I’m currently using this version of the spark-redis connector:

    <dependency>
        <groupId>com.redislabs</groupId>
        <artifactId>spark-redis_2.11</artifactId>
        <version>2.5.0</version>
    </dependency>

Has anyone faced this issue? Is there a workaround?

This is the error I get if I do not set the table name in the write options:

FAILED

  java.lang.IllegalArgumentException: Option 'table' is not set.
  at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
  at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.redis.RedisSourceRelation.tableName(RedisSourceRelation.scala:208)
  at org.apache.spark.sql.redis.RedisSourceRelation.saveSchema(RedisSourceRelation.scala:245)
  at org.apache.spark.sql.redis.RedisSourceRelation.insert(RedisSourceRelation.scala:121)
  at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:30)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)

2 Answers


  1. The table option is mandatory. The idea is that you specify a table name so that the dataframe can be read back from Redis by providing that same table name.
    In your case, another option is to convert the dataframe to a key/value RDD and use sc.toRedisKV(rdd).
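
    A minimal sketch of that RDD route, assuming the spark-redis implicits from com.redislabs.provider.redis._ are on the classpath and a Redis server is reachable on localhost:6379. The object name and the hash name "words" are made up for illustration. The DB index is taken from spark.redis.db in the Spark configuration.

```scala
import com.redislabs.provider.redis._
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object WriteKvToRedisDb5 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("redis-kv")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .config("spark.redis.db", "5") // target DB index from the question
      .getOrCreate()
    import spark.implicits._

    val someDF = Seq((8, "bat"), (64, "mouse"), (-27, "horse")).toDF("number", "word")

    // Turn each row into a (key, value) pair of strings.
    val kvRdd = someDF.rdd.map { row =>
      (row.getAs[Int]("number").toString, row.getAs[String]("word"))
    }

    // Plain string keys, no table name involved.
    spark.sparkContext.toRedisKV(kvRdd)

    // Alternatively, store the pairs as fields of one hash.
    // "words" is an assumed hash name, not something from the question.
    spark.sparkContext.toRedisHASH(kvRdd, "words")

    spark.stop()
  }
}
```

    Note that toRedisHASH puts every pair into a single hash, which may or may not be the layout you want; one hash per row is what the DataFrame writer with a table name produces.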

  2. I have to disagree. I’m dealing with exactly the same issue. Here’s what I have found:

    1. You must reference either a table or a keys pattern, e.g.:

      # Parentheses let the method chain span several lines in Python.
      df = (spark.read.format("org.apache.spark.sql.redis")
            .option("keys.pattern", "rec-*")
            .option("infer.schema", True)
            .load())

    In my case, I’m using hashes, and the hash keys all begin with "rec-" followed by an int.
    The spark-redis code treats "rec-" as a table. As mentioned, the catch comes when you want to read the data back into Spark: it wants a table name, but it seems to use a colon as the delimiter. Since I want to both read and write, I simply changed my table names to "rec:" and was good to go.

    I think your confusion stems from the fact that in your example, you only have one record defined in Spark. What if you have two? Redis needs to create two different keys like "person:1" or "person:2". It uses the term table to describe "person". Is it a key or a table? The docs don’t seem to be consistent.
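
    To make the table/key relationship concrete, here is a rough sketch in Scala (the language of the question) that writes the example dataframe under a table name and reads it back. It assumes spark-redis is on the classpath and Redis runs on localhost:6379. The object name is hypothetical, and "person" is just the table name from the commented-out line in the question.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical object name, used only for this sketch.
object TableKeyNaming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("redis-table-keys")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "6379")
      .getOrCreate()
    import spark.implicits._

    val someDF = Seq((8, "bat"), (64, "mouse"), (-27, "horse")).toDF("number", "word")

    // Each row becomes one Redis hash. With key.column set, the key should be
    // built as "<table>:<value of key column>", e.g. "person:8".
    someDF.write
      .format("org.apache.spark.sql.redis")
      .option("table", "person")
      .option("key.column", "number")
      .mode(SaveMode.Append) // Append so that re-running does not fail on an existing table
      .save()

    // Reading back uses the same table name.
    val loaded = spark.read
      .format("org.apache.spark.sql.redis")
      .option("table", "person")
      .option("key.column", "number")
      .load()
    loaded.show()

    spark.stop()
  }
}
```

    Without key.column the connector generates the part after the colon itself, so the table name effectively acts as a key prefix either way.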

    My issue at the moment is saving to a different Redis DB by changing the DB context with .config("spark.redis.db", 5). This doesn’t seem to take effect when I write with df.write.format. Any ideas?
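
    One thing that may be worth trying is setting the DB index on the write itself. The sketch below assumes the connector version in use accepts per-DataFrame connection options named host, port and dbNum, as I understand the spark-redis DataFrame docs to describe. Treat those option names as an assumption and check them against your version. The object name is hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical object name, used only for this sketch.
object WriteToDb5 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("redis-df-db5")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val someDF = Seq((8, "bat"), (64, "mouse"), (-27, "horse")).toDF("number", "word")

    someDF.write
      .format("org.apache.spark.sql.redis")
      .option("table", "person")      // still required by the DataFrame writer
      .option("key.column", "number")
      // Assumed per-write connection options; host and port are set explicitly
      // rather than relying on values from the Spark configuration.
      .option("host", "localhost")
      .option("port", "6379")
      .option("dbNum", "5")
      .mode(SaveMode.Append)
      .save()

    spark.stop()
  }
}
```

    Afterwards, running SELECT 5 followed by KEYS person:* in redis-cli would show whether the hashes landed in DB 5.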
