
When databases are not visible in Spark, PySpark, or Zeppelin on HDP3

Ranger is definitely configured correctly on HDP3,

but Hive tables are not visible in Zeppelin or Spark.

They were definitely visible back in 2.6,

so it seems HDP 3 has become really unfriendly.

  cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

The answer is very simple: on the client node, copy hive-site.xml into the spark2 conf directory...

This used to be configured automatically,

but it seems that with HDP3 it has all become manual.
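To verify, you can open spark-shell on the same client node after copying the file. This is just a minimal check; the built-in spark session should now list the Hive databases instead of only Spark's own default:

  // Run inside spark-shell on the client node, after the copy above.
  // With hive-site.xml in /etc/spark2/conf, the session talks to the Hive metastore.
  spark.sql("show databases").show()
  spark.sql("show tables in default").show()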



https://community.hortonworks.com/questions/221974/zeppelin-not-showing-hive-databasetables-in-hdp30.html


Zeppelin: Not able to connect to Hive databases (through spark2) on HDP 3.0

I have installed Hortonworks HDP 3.0 and configured Zeppelin as well.

When I run spark or sql, Zeppelin only shows me the default database (this is Spark's default database, located at '/apps/spark/warehouse', not Hive's default database). This is probably because the hive.metastore.warehouse.dir property is not being read from hive-site.xml, and Zeppelin is instead picking up the location from the Spark config (spark.sql.warehouse.dir).
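For reference, this is how I checked which warehouse the session is actually pointing at (just a quick diagnostic; both properties are standard Spark settings):

  // Print the warehouse dir the session resolved, then list what the catalog sees.
  println(spark.conf.get("spark.sql.warehouse.dir"))
  spark.catalog.listDatabases.show(false)

On my setup this prints '/apps/spark/warehouse' and lists only the Spark-local default database.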

I had a similar issue with Spark as well; it was due to the hive-site.xml file in the Spark conf dir, and I was able to resolve it by copying hive-site.xml from the Hive conf dir to the Spark conf dir.

I did the same for Zeppelin as well: I copied hive-site.xml into the Zeppelin dir (the one that holds zeppelin-site.xml) and also into the zeppelin-external-dependency-conf dir.

But this did not resolve the issue.

*** Edit#1 - adding some additional information ***

I have created the Spark session with Hive support enabled through enableHiveSupport(), and I even tried setting the spark.sql.warehouse.dir config property, but this did not help.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder
    .appName("Test Zeppelin")
    .config("spark.sql.warehouse.dir", "/apps/hive/db")
    .enableHiveSupport()
    .getOrCreate()
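(As far as I understand, enableHiveSupport() only connects to the Hive metastore when hive-site.xml is visible on Spark's classpath; without it, Spark falls back to an embedded catalog of its own, which would explain why I only see the Spark-local default database.)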

Through some online help I learned that Zeppelin uses only Spark's hive-site.xml file, but I can view all Hive databases through Spark; it is only in Zeppelin (through spark2) that I am not able to access the Hive databases.

Additionally, Zeppelin is not letting me choose the programming language; by default it creates a session with Scala. I would prefer a Zeppelin session with PySpark.
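What I am after is being able to start a paragraph like this (assuming the interpreter group is named spark2, as in a stock HDP 3 Zeppelin install; the exact prefix can differ per setup):

  %spark2.pyspark
  spark.sql("show databases").show()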

Any help on this will be highly appreciated.


Answer by Shantanu Sharma

After copying hive-site.xml from the Hive conf dir to the Spark conf dir, I had restarted the Spark services, which reverted those changes. I copied hive-site.xml again and it's working now.
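One thing to keep in mind from the answer above: restarting Spark from Ambari regenerates /etc/spark2/conf, which wipes the copied file. So after every restart you have to copy hive-site.xml again, or, presumably, push the same properties through Ambari itself so they survive restarts.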