hiveql - Hive Data selecting latest value based on timestamp -

- June 15, 2010

i have table having following columns.

c1,c2,process timestamp,insertdatetimestamp p1,v1,2014-01-30 12:15:23,2013-10-01 05:34:23  p1,v2,2014-01-31 05:11:34,2013-12-01 06:12:31  p1,v3,2014-01-31 07:16:05,2012-09-01 07:45:20  p2,v4,2014-02-01 09:22:52,2013-12-01 06:12:31  p2,v5,2014-02-01 09:22:52,2012-09-01 07:45:20

now, want fetch unique row each primary key based on latest process timestamp.

if process timestamp same row having latest insertdatetimestamp should chosen.

so, result should be.

p1,v3,2014-01-31 07:16:05,2012-09-01 07:45:20  p2,v4,2014-02-01 09:22:52,2013-12-01 06:12:31

how achieve via hiveql ?

i using hive 0.10. can not use subquery in or exists.

thanks.

select c1, s.c2, s.processtimestamp, s.insertdatetimestamp (   select c1, max(named_struct('unixtime', unix_timestamp(processtimestamp, 'yyyy-mm-dd hh:mm:ss'), 'c2', c2, 'processtimestamp', processtimestamp, 'insertdatetimestamp', insertdatetimestamp)) s   my_table group c1 ) t;

doing max of struct compares first field, second field, etc. if struct together, parsed timestamp value first, struct representing row. un-struct selecting out individual fields.

Search This Blog

KBPS

hiveql - Hive Data selecting latest value based on timestamp -

Comments

Post a Comment

Popular posts from this blog

node.js - StackOverflow API not returning JSON -

python - Subclassed QStyledItemDelegate ignores Stylesheet -

java - HttpClient 3.1 Connection pooling vs HttpClient 4.3.2 -