
When Hive databases are not visible in Spark, PySpark, or Zeppelin on HDP3

by Red Analyst 2018. 9. 19.

Ranger is definitely configured correctly on HDP3, but Hive tables do not show up in Zeppelin or Spark.

They were clearly visible on 2.6, so HDP 3 seems to have become much less friendly.

    cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

The answer is very simple: on the client node, copy hive-site.xml into the spark2 conf directory.

This used to be set up automatically, but with HDP3 it seems to have become a manual step.
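
The copy step can also be sketched in Python, keeping a backup of whatever Spark had before in case the change needs to be undone. This is only an illustration: the helper name `install_hive_site` and the `.bak` suffix are made up for this sketch, and the default paths are simply the ones from the cp command in this post.

```python
import shutil
from pathlib import Path


def install_hive_site(hive_conf="/etc/hive/conf", spark_conf="/etc/spark2/conf"):
    """Copy Hive's hive-site.xml into Spark's conf dir, keeping a backup.

    Hypothetical helper: the default paths come from the cp command in this
    post; the ".bak" backup name is an assumption made for this sketch.
    """
    src = Path(hive_conf) / "hive-site.xml"
    dst = Path(spark_conf) / "hive-site.xml"
    if not src.is_file():
        raise FileNotFoundError(f"no hive-site.xml in {hive_conf}")
    if dst.exists():
        # Keep what Spark had before so the change can be undone.
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    shutil.copy2(src, dst)
    return dst
```

On the client node this would be called as `install_hive_site()` by a user allowed to write to the Spark conf directory.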



https://community.hortonworks.com/questions/221974/zeppelin-not-showing-hive-databasetables-in-hdp30.html


Zeppelin : Not able to connect Hive Databases (through spark2) HDP3.0

I have installed Hortonworks HDP 3.0 and configured Zeppelin as well.

When I run Spark or SQL in Zeppelin, it only shows me the default database (this is Spark's default database, located at '/apps/spark/warehouse', not Hive's default database). This is probably because the hive.metastore.warehouse.dir property from hive-site.xml is not being applied, so Zeppelin picks up the location from the Spark config (spark.sql.warehouse.dir) instead.
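
One way to check which warehouse location a given hive-site.xml actually carries is to read the file directly. The following is a minimal sketch, assuming the usual Hadoop configuration layout of `<property>` elements with `<name>` and `<value>` children. The helper name `read_hadoop_conf` is hypothetical, and the paths in the trailing comment are assumed from the usual HDP client layout.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(path):
    """Return {name: value} for every <property> in a Hadoop-style XML file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is reported as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Example use (paths are assumptions, adjust to the cluster):
#   for conf in ("/etc/hive/conf/hive-site.xml", "/etc/spark2/conf/hive-site.xml"):
#       print(conf, read_hadoop_conf(conf).get("hive.metastore.warehouse.dir"))
```

If the two files report different values, or the Spark copy has no such property, that would be consistent with the symptom described here.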

I had a similar issue with Spark as well, and it was due to the hive-site.xml file in the spark-conf dir. I was able to resolve it by copying hive-site.xml from the hive-conf dir to the spark-conf dir.

I did the same for Zeppelin: I copied hive-site.xml into the Zeppelin conf dir (where zeppelin-site.xml is located) and also into the zeppelin-external-dependency-conf dir.

But this did not resolve the issue.

*** Edit#1 - adding some additional information ***

I created the Spark session with Hive support enabled through enableHiveSupport(), and even tried setting the spark.sql.warehouse.dir config property, but this did not help.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("Test Zeppelin").config("spark.sql.warehouse.dir", "/apps/hive/db").enableHiveSupport().getOrCreate()

Through some online help, I learned that Zeppelin uses only Spark's hive-site.xml file. However, I can see all Hive databases through Spark; it is only in Zeppelin (through spark2) that I cannot access the Hive databases.

Additionally, Zeppelin does not let me choose the programming language; by default it creates a session with Scala. I would prefer a Zeppelin session with PySpark.

Any help on this will be highly appreciated.
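
To confirm the symptom from a PySpark session, the visible databases can be listed from the session object. This is a rough sketch, not an official recipe: the helper names are hypothetical, and it assumes a `SparkSession` is already available as `spark`, as it normally is inside a Zeppelin Spark paragraph. On HDP the PySpark paragraph is typically started with `%spark2.pyspark`, though the interpreter name depends on how Zeppelin is configured.

```python
def visible_databases(spark):
    """Return the database names the given SparkSession can see."""
    # SHOW DATABASES yields one row per database; the name is the first column.
    return sorted(row[0] for row in spark.sql("SHOW DATABASES").collect())


def sees_only_default(spark):
    """True when the session sees nothing but 'default' (the symptom here)."""
    return visible_databases(spark) == ["default"]
```

Calling `print(visible_databases(spark))` before and after changing hive-site.xml (and restarting the interpreter) shows whether the Hive databases have become visible.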


Answer by Shantanu Sharma

After copying hive-site.xml from the hive-conf dir to the spark-conf dir, I restarted the Spark services, which reverted those changes. I copied hive-site.xml again and it is working now.
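
Because a service restart can put the old file back without any warning, it may be worth checking whether the two copies still match before debugging anything else. Here is a small sketch using only the Python standard library. The name `spark_conf_in_sync` is hypothetical, and the default paths are assumed from the usual HDP layout (/etc/hive/conf and /etc/spark2/conf).

```python
import filecmp
from pathlib import Path


def spark_conf_in_sync(hive_conf="/etc/hive/conf", spark_conf="/etc/spark2/conf"):
    """True if Spark's hive-site.xml is byte-for-byte the same as Hive's."""
    src = Path(hive_conf) / "hive-site.xml"
    dst = Path(spark_conf) / "hive-site.xml"
    if not (src.is_file() and dst.is_file()):
        return False
    # shallow=False compares file contents, not just size and mtime.
    return filecmp.cmp(str(src), str(dst), shallow=False)
```

If this returns False after a restart, the copy was most likely overwritten and needs to be made again.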

