开发者

how to write subquery and use "In" Clause in Hive

How can I use In clause in Hive I want to write something like this in Hive select x from y where y.z in (select distinct z from y) order by x; But I am not finding any way of doing it.. I tried In clause in Hive 0.7 it was throwing error, Also I tried Find_in_Set .. using find_in_set(y.z,subquery).. but the job is getting failed.

I want to do this on Hive. Please help me if anybody knows how to do t开发者_开发技巧his in Hive..

Thanks & Regards, Atul


You can use semi join(https://cwiki.apache.org/Hive/languagemanual-joins.html):

LEFT SEMI JOIN implements the correlated IN/EXISTS subquery semantics in an efficient way. Since Hive currently does not support IN/EXISTS subqueries, you can rewrite your queries using LEFT SEMI JOIN. The restrictions of using LEFT SEMI JOIN is that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc.

SELECT a.key, a.value
  FROM a
  WHERE a.key in
   (SELECT b.key
    FROM B);

can be rewritten to:

   SELECT a.key, a.val
   FROM a LEFT SEMI JOIN b on (a.key = b.key)


Hive 0.13 now do support IN/EXISTS in the WHERE-clause .. The issue https://issues.apache.org/jira/browse/HIVE-784 has been resolved after 4 years :)


According to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select:

"Hive does not support IN, EXISTS or subqueries in the WHERE clause."

You might want to look at: https://issues.apache.org/jira/browse/HIVE-801

https://issues.apache.org/jira/browse/HIVE-1799


I'm using hive version 0.7.1 and SELECT * FROM MYTABLE WHERE MYCOLUMN IN ('thisThing','thatThing');

I tested this on a column type STRING so I am not sure if this works universally on all data types since I noticed like Wawrzyniec mentioned above that the Hive Language Manual says that it is not supported and to instead use LEFT SEMI JOIN but it worked fine in my test.


Hive supports perfectly the IN ... it does not support the subquery in the WHERE clause

there is an open feature ticket from Facebook engineers since 4 years... https://issues.apache.org/jira/browse/HIVE-784?focusedCommentId=13579059


assume table t1(id,name) and table t2(id,name)

listing only those ids from t1 that exists in t2(basically IN clause)

hive>select a.id from t1 a left semi join t2 b on (a.id=b.id);

listing only those ids from t1 that exists only in t1 but not in t2(basically NOT IN clause)

hive>select a.id from t1 a left outer join t2 b on(a.id=b.id) where b.id is null;


Hive does support IN/EXISTS statements since Hive 0.13 with few limitations. Please refer to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries for more details.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜