    ...
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
        <description>location of default database for the warehouse</description>
    </property>
    ...

This snippet is part of /user/spark3/conf/hive-site.xml.

At first the value was

hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7

Then I changed the value to

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7

Below are the code and its result:

println(spark.conf.get("spark.sql.warehouse.dir"))  //--Default : spark-warehouse

spark
    .sql("""
        SELECT 
            website, 
            avg(age) avg_age, 
            max(id) max_id
        FROM 
            people a 
            JOIN 
            projects b 
            ON a.name = b.manager 
        WHERE 
            a.age > 11 
        GROUP BY 
            b.website
        """)
    .write
    .mode("overwrite")  //--Overwrite mode....
    .saveAsTable("JoinedPeople")  //--saveAsTable(<warehouse_table_name>)....
    
sql("SELECT * FROM JoinedPeople").show(1000)

hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7
+--------------------+-------+------+
|             website|avg_age|max_id|
+--------------------+-------+------+
|http://hive.apach...|   30.0|     2|
|http://kafka.apac...|   19.0|     3|
|http://storm.apac...|   30.0|     9|
+--------------------+-------+------+

The value of ‘spark.sql.warehouse.dir’ changed from kikang to skybluelee, as I wanted.

But the location of the table "JoinedPeople" didn’t change. It is still ‘hdfs://spark-master-01:9000/kikang/skybluelee_warehouse_mysql_5.7’ – the first value in hive-site.xml.

I want to change the location of default database.

How can I change the default location?

I also changed ‘spark-defaults.conf’ and of course restarted Ubuntu afterwards, but it had no effect.
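One way to see which location the metastore has actually recorded for the default database is to query it from the Spark shell (a quick check, assuming the session was started with Hive support enabled):

    // Show the properties of the default database, including its recorded Location
    spark.sql("DESCRIBE DATABASE EXTENDED default").show(truncate = false)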

2 Answers


  1. Chosen as BEST ANSWER

    I found what I missed!

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://spark-worker-01:3306/metastore_db?createDatabaseIfNotExist=true</value>
      <description>metadata is stored in a MySQL server</description>
    </property>
    

    This is part of the hive-site.xml configuration for the MySQL 5.7 metastore. The metastore_db was created in MySQL the first time I ran this, with the original warehouse location. So even though I later changed the location in hive-site.xml, the location already recorded in the metastore didn't change.
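    Given that, the stale location recorded in the metastore has to be updated explicitly. One way to do this is a sketch like the following (assuming Spark 3.0+ with Hive support enabled; note this only affects tables created afterwards, the locations of existing tables stay as they are):

        // Point the default database at the new warehouse path in the metastore
        spark.sql("ALTER DATABASE default SET LOCATION 'hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7'")

        // Verify the recorded location
        spark.sql("DESCRIBE DATABASE EXTENDED default").show(truncate = false)

    Alternatively, dropping metastore_db in MySQL and letting it be recreated picks up the new hive-site.xml value, at the cost of losing the existing table metadata.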


  2. Could you check which Spark version you are using in this scenario?
    According to Hive Tables in the official Spark documentation:

    Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.

    1. Does changing the property in hive-site.xml work for you (assuming the Spark version is above 2.0.0)?

       ...
       <property>
           <name>spark.sql.warehouse.dir</name>
           <value>hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7</value>
           <description>location of default database for the warehouse</description>
       </property>
       ...
      
    2. Does setting the property before the initialization of a Spark session work for you?

      import org.apache.spark.sql.SparkSession
      
      val warehouseLocation = "hdfs://spark-master-01:9000/skybluelee/skybluelee_warehouse_mysql_5.7"
      
      // Create a SparkSession with the desired warehouse location
      val spark = SparkSession
        .builder()
        .appName("Spark Hive Example")
        .config("spark.sql.warehouse.dir", warehouseLocation)
        .enableHiveSupport()
        .getOrCreate()
      
      // Import the necessary Spark functions and implicit
      import spark.implicits._
      import spark.sql
      sql("""
       SELECT 
           website, 
           avg(age) avg_age, 
           max(id) max_id
       FROM 
           people a 
           JOIN 
           projects b 
           ON a.name = b.manager 
       WHERE 
           a.age > 11 
       GROUP BY 
           b.website
       """)
         .write
         .mode("overwrite")
         .saveAsTable("JoinedPeople")
      
      // Retrieve the location of the "JoinedPeople" table from the Hive metastore
      val tableLocation = spark.sql("DESCRIBE EXTENDED JoinedPeople").filter($"col_name" === "Location").select("data_type").collect()(0)(0)
      println(s"Table location: $tableLocation")
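      An alternative to filtering the DESCRIBE EXTENDED output is the catalog API (assuming Spark 2.1+), which exposes the database's recorded location directly:

          // Read the default database's location straight from the catalog
          println(spark.catalog.getDatabase("default").locationUri)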
      
      
      
      