renovate

Question

// This query:
sqlContext.sql("select * from retail_invoice").show

// gives this output (column headers only, no rows):
+---------+---------+-----------+--------+-----------+---------+----------+-------+
|invoiceno|stockcode|description|quantity|invoicedate|unitprice|customerid|country|
+---------+---------+-----------+--------+-----------+---------+----------+-------+
+---------+---------+-----------+--------+-----------+---------+----------+-------+
 
// The Hive DDL for the table in HiveView 2.0:
CREATE TABLE `retail_invoice`(
  `invoiceno` string, 
  `stockcode` string, 
  `description` string, 
  `quantity` int, 
  `invoicedate` string, 
  `unitprice` double, 
  `customerid` string, 
  `country` string)
CLUSTERED BY ( 
  stockcode) 
INTO 2 BUCKETS
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoopsilon2.zdwinsqlad.local:8020/apps/hive/warehouse/retail_invoice'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"country\":\"true\",\"quantity\":\"true\",\"customerid\":\"true\",\"description\":\"true\",\"invoiceno\":\"true\",\"unitprice\":\"true\",\"invoicedate\":\"true\",\"stockcode\":\"true\"}}', 
  'numFiles'='2', 
  'numRows'='541909', 
  'orc.bloom.filter.columns'='StockCode, InvoiceDate, Country', 
  'rawDataSize'='333815944', 
  'totalSize'='5642889', 
  'transactional'='true', 
  'transient_lastDdlTime'='1517516006')

I can query the data in Hive just fine. The data is inserted from NiFi using the PutHiveStreaming processor.

We have tried recreating the table, but the same problem occurs. I haven't found any odd-looking configuration.

Any ideas on what could be going on here?

Answer 1

@Matt Krueger

Your table is ACID, i.e. transactional ('transactional'='true' in its TBLPROPERTIES). Spark doesn't support reading Hive ACID tables. Take a look at SPARK-15348 and SPARK-16996.
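A quick way to confirm this diagnosis for a given table is to check its `transactional` property. The following is a minimal Scala sketch, assuming the table properties have already been collected into a `Map`; the object `AcidCheck` and method `isAcid` are hypothetical helpers for illustration, not part of Spark or Hive.

```scala
// Hypothetical helper (not a Spark or Hive API): decides whether a Hive table
// is ACID from its table properties, i.e. the TBLPROPERTIES key/value pairs.
object AcidCheck {
  // ACID tables carry 'transactional'='true'; the value is compared
  // case-insensitively, ignoring surrounding whitespace.
  def isAcid(props: Map[String, String]): Boolean =
    props.get("transactional").exists(_.trim.equalsIgnoreCase("true"))

  def main(args: Array[String]): Unit = {
    // A subset of the properties from the retail_invoice DDL in the question.
    val retailInvoice = Map(
      "numFiles"      -> "2",
      "numRows"       -> "541909",
      "transactional" -> "true"
    )
    // Prints: retail_invoice is ACID: true
    println(s"retail_invoice is ACID: ${isAcid(retailInvoice)}")
  }
}
```

In spark-shell on Spark 2.x, the map could be built with something like `sqlContext.sql("SHOW TBLPROPERTIES retail_invoice").collect().map(r => r.getString(0) -> r.getString(1)).toMap`, assuming the command returns `key` and `value` columns in that order; the exact output shape may differ between Spark versions.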

When querying a Hive table from Spark on HDP3 returns empty results

Spark not reading data from a Hive managed table, while Hive can query the data in the same table just fine.
