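After a few days of using HDP3, my impression is that it has the same rough, unfinished feel I remember from HDP 2.1.
First,
Hive LLAP could be enabled right away in HDP 2.6, but in HDP3 it has become a real hassle
(as I recall, it even used to create the Yarn queue on its own).
On top of that, Tez View is gone and Hive View is gone too (the Hortonworks community says to use Superset instead).
Oozie View has turned strange as well. Apart from the default view, I think there used to be something that HDP displayed nicely
(though I never actually used it).
Flume is gone too (NiFi is supposed to replace it, but does NiFi have folder spooling? I'm not familiar with it yet).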
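If you are already using a workflow engine (such as Azkaban, Luigi, or Airflow), none of this is necessary.
Sometimes, though, that setup isn't ready yet, and there are jobs that are awkward to handle with plain cron.
The easiest solution is to run them from Zeppelin, but in HDP 3 Zeppelin's cron feature is disabled.
(The Hortonworks community says something about cron accessing the system and tells you to use Oozie instead.)
Here is how to solve it:
In Ambari, go to the Zeppelin configuration and, under custom zeppelin-site, add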
zeppelin.
https://zeppelin.apache.org/docs/0.8.0/usage/other_features/cron_scheduler.html
Running a Notebook on a Given Schedule Automatically
Apache Zeppelin provides a cron scheduler for each notebook. You can run a notebook on a given schedule automatically by setting up a cron scheduler on the notebook.
Setting up a cron scheduler on a notebook
Click the clock icon on the tool bar and open a cron scheduler dialog box.
There are the following items which you can input or set:
Preset
You can set a cron schedule easily by clicking each option such as 1m and 5m. The login user is set as a cron executing user automatically. You can also clear the cron schedule settings by clicking None.
Cron expression
You can set the cron schedule by filling in this form. Please see Cron Trigger Tutorial for the available cron syntax.
Cron executing user (It is removed from 0.8 where it enforces the cron execution user to be the note owner for security purpose)
You can set the cron executing user by filling in this form and press the enter key.
After execution stop the interpreter
When this checkbox is set to "on", the interpreters which are bound to the notebook are stopped automatically after the cron execution. This feature is useful if you want to release the interpreter resources after the cron execution.
Note: A cron execution is skipped if one of the paragraphs is in a state of RUNNING or PENDING, no matter whether it is executed automatically (i.e. by the cron scheduler) or manually by a user opening this notebook.
Enable cron
Set property zeppelin.notebook.cron.enable to true in $ZEPPELIN_HOME/conf/zeppelin-site.xml to enable Cron feature.
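As a rough illustration of what the enabled setting looks like, here is a minimal Python sketch that parses a zeppelin-site.xml style document and checks the flag. The sample XML string and the helper names read_property and cron_enabled are made up for this example; only the property name zeppelin.notebook.cron.enable comes from the Zeppelin documentation quoted above. On HDP the value is managed through Ambari, so this is only a way to verify the resulting file, not a replacement for the Ambari setting.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content in the Hadoop-style <configuration> layout.
# Only the property name is taken from the Zeppelin docs.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>zeppelin.notebook.cron.enable</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_property(site_xml, name):
    """Return the value of the named property, or None if it is absent."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def cron_enabled(site_xml):
    """True only when zeppelin.notebook.cron.enable is present and 'true'."""
    value = read_property(site_xml, "zeppelin.notebook.cron.enable")
    return value is not None and value.lower() == "true"


if __name__ == "__main__":
    print(cron_enabled(SAMPLE_SITE_XML))
```

To check a real installation, the file contents could be read from $ZEPPELIN_HOME/conf/zeppelin-site.xml and passed to cron_enabled in place of the sample string.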
Run cron selectively on folders
In $ZEPPELIN_HOME/conf/zeppelin-site.xml make sure the property zeppelin.notebook.cron.enable is set to true, and then set property zeppelin.notebook.cron.folders to the desired folder as comma-separated values, e.g. *yst*, Sys?em, System. This property accepts wildcard and joker.
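To get a feel for how the example patterns behave, the sketch below approximates the matching with Python's fnmatch module, where * matches any run of characters and ? matches exactly one. This is an assumption for illustration: Zeppelin does its own matching in Java, and details such as bracket characters or whether the full folder path is compared may differ. The function name folder_allowed is hypothetical.

```python
from fnmatch import fnmatchcase


def folder_allowed(folder, cron_folders):
    """Approximate check: does `folder` match any comma-separated pattern?

    Assumption: shell-style wildcards stand in for Zeppelin's own matching,
    which may treat some characters differently.
    """
    patterns = [p.strip() for p in cron_folders.split(",") if p.strip()]
    return any(fnmatchcase(folder, pattern) for pattern in patterns)


if __name__ == "__main__":
    setting = "*yst*, Sys?em, System"  # example value from the Zeppelin docs
    for name in ["System", "Analyst", "Sandbox"]:
        print(name, folder_allowed(name, setting))
```

With the documentation's example value, a folder named System matches all three patterns, while a folder whose name does not contain "yst" is left out of cron scheduling under this approximation.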