hiveql - Hive Data selecting latest value based on timestamp -
i have table having following columns.
c1,c2,process timestamp,insertdatetimestamp p1,v1,2014-01-30 12:15:23,2013-10-01 05:34:23 p1,v2,2014-01-31 05:11:34,2013-12-01 06:12:31 p1,v3,2014-01-31 07:16:05,2012-09-01 07:45:20 p2,v4,2014-02-01 09:22:52,2013-12-01 06:12:31 p2,v5,2014-02-01 09:22:52,2012-09-01 07:45:20
now, want fetch unique row each primary key based on latest process timestamp
.
if process timestamp
same row having latest insertdatetimestamp
should chosen.
so, result should be.
p1,v3,2014-01-31 07:16:05,2012-09-01 07:45:20 p2,v4,2014-02-01 09:22:52,2013-12-01 06:12:31
how achieve via hiveql ?
i using hive 0.10. can not use subquery in or exists.
thanks.
select c1, s.c2, s.processtimestamp, s.insertdatetimestamp ( select c1, max(named_struct('unixtime', unix_timestamp(processtimestamp, 'yyyy-mm-dd hh:mm:ss'), 'c2', c2, 'processtimestamp', processtimestamp, 'insertdatetimestamp', insertdatetimestamp)) s my_table group c1 ) t;
doing max of struct compares first field, second field, etc. if struct together, parsed timestamp value first, struct representing row. un-struct selecting out individual fields.
Comments
Post a Comment