Ranger is clearly configured correctly on HDP3,

but Hive tables are not visible in Zeppelin or Spark.

They were definitely visible back on 2.6,

so HDP 3 seems to have become quite unfriendly.

The answer is very simple: on the client node, copy hive-site.xml into the spark2 conf directory.

  cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

This all used to be configured automatically, but with HDP3 it apparently has to be done by hand now.
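A quick way to check whether the copy took effect (a sketch, assuming spark-shell is launched on the same client node and `spark` is the session it provides):

```scala
// After copying hive-site.xml, the Hive metastore databases should be
// listed here, instead of only Spark's own "default" database.
spark.sql("SHOW DATABASES").show()
```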



https://community.hortonworks.com/questions/221974/zeppelin-not-showing-hive-databasetables-in-hdp30.html


Zeppelin : Not able to connect Hive Databases (through spark2) HDP3.0

I have installed Hortonworks hdp3.0 and configured Zeppelin as well.

When I run spark or sql, Zeppelin only shows me the default database (this is Spark's default database, whose location is '/apps/spark/warehouse', not Hive's default database). This is probably because the hive.metastore.warehouse.dir property is not being read from hive-site.xml, and Zeppelin is picking it up from the Spark config (spark.sql.warehouse.dir) instead.
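The warehouse path the session actually resolved can be inspected directly; a minimal diagnostic sketch, assuming the `spark` session object available in the notebook:

```scala
// If this prints /apps/spark/warehouse, the session never saw hive-site.xml
// and is falling back to Spark's own default warehouse location.
println(spark.conf.get("spark.sql.warehouse.dir"))
```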

I had a similar issue with Spark as well, and it was due to the hive-site.xml file in the spark-conf dir. I was able to resolve it by copying hive-site.xml from the hive-conf dir to the spark-conf dir.

I did the same for Zeppelin as well: copied hive-site.xml into the Zeppelin conf dir (where zeppelin-site.xml lives) and also into the zeppelin-external-dependency-conf dir.

But this did not resolve the issue.

*** Edit#1 - adding some additional information ***

I have created the Spark session with Hive support enabled through enableHiveSupport(), and even tried setting the spark.sql.warehouse.dir config property, but this did not help.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder
    .appName("Test Zeppelin")
    .config("spark.sql.warehouse.dir", "/apps/hive/db")
    .enableHiveSupport()
    .getOrCreate()

Through some online help, I learned that Zeppelin uses only Spark's hive-site.xml file. I can view all Hive databases through Spark; it is only in Zeppelin (through spark2) that I am not able to access the Hive databases.

Additionally, Zeppelin is not letting me choose the programming language; by default it creates a session with Scala. I would prefer a Zeppelin session with pyspark.

Any help on this will be highly appreciated.


Answer by Shantanu Sharma

After copying hive-site.xml from the hive-conf dir to the spark-conf dir, I restarted the Spark services, which reverted those changes. I copied hive-site.xml again, and it's working now.


  namioto (http://namioto.github.io) 2019.05.02 11:54

    Hello. But if you do this, won't the configuration get overwritten whenever Spark is restarted from Ambari??

    • Yuika eizt (https://redeyesofangel.tistory.com) 2019.05.06 17:57

      To be precise, if you do that, the Spark Thrift Server (or was it the History Server?) dies while restarting.

      There were a few other concerns as well. At the time, besides this, there was another problem: Spark could not read ACID Hive tables (it looks like a new driver has come out since then).

      So for now I have gone back to the original configuration, and instead switched to keeping the data in external tables and having Spark process the ORC or Parquet files directly.
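The workaround described in this comment can be sketched roughly as follows. The path and table name are made-up examples, and it assumes the external table's files sit under a known HDFS location:

```scala
// Sketch: bypass the Hive ACID reader by reading the external table's
// ORC (or Parquet) files directly with Spark. Hypothetical path below.
val df = spark.read.orc("/warehouse/tablespace/external/hive/mydb.db/mytable")
df.createOrReplaceTempView("mytable")
spark.sql("SELECT COUNT(*) FROM mytable").show()
```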