Based on version 0.14
- In a left join, the left table is filtered by a predicate on the join column
Buggy statement:
- explain
- select t.cookie,t.datetime,c.cookie
- from log t
- left join cookie c
- on t.cookie = c.cookie
- where t.dt = '2015_01_01_10'
- and t.cookie = 'xxxxx'
Cause:
The execution plan is as follows:
- Stage: Stage-1
- Map Reduce
- Map Operator Tree:
- TableScan
- alias: t
- Statistics: Num rows: 782692 Data size: 97053857 Basic stats: COMPLETE Column stats: NONE
- Filter Operator
- predicate: (cookie = 'xxxxx') (type: boolean)
- Statistics: Num rows: 391346 Data size: 48526928 Basic stats: COMPLETE Column stats: NONE
- Reduce Output Operator
- key expressions: 'xxxxx' (type: string)
- sort order: +
- Statistics: Num rows: 391346 Data size: 48526928 Basic stats: COMPLETE Column stats: NONE
- value expressions: datetime (type: string)
- TableScan
- alias: c
- Statistics: Num rows: 26938739 Data size: 1728178442 Basic stats: COMPLETE Column stats: NONE
- Filter Operator
- predicate: (cookie = 'xxxxx') (type: boolean)
- Statistics: Num rows: 13469369 Data size: 864089188 Basic stats: COMPLETE Column stats: NONE
- Reduce Output Operator
- key expressions: cookie (type: string)
- sort order: +
- Map-reduce partition columns: cookie (type: string)
- Statistics: Num rows: 13469369 Data size: 864089188 Basic stats: COMPLETE Column stats: NONE
- Reduce Operator Tree:
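In the plan above, the Reduce Output Operator for table t on the map side has lost its "Map-reduce partition columns" entry (its key expression is the constant 'xxxxx', while table c still partitions on cookie). As a result, rows of t are not guaranteed to be sent to the same reducer as the rows of c with the same cookie, and the join misses matches.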
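Tests:
1. Set the number of reducers to 1:
The join produces the correct result.
2. Rewrite the SQL, changing t.cookie = 'xxxxx' to an IN predicate:
The execution plan shows that the partition columns for table t on the map side are present again, and the SQL also runs correctly.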
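The effect described above can be illustrated with a small simulation. This is only a sketch, not the engine's code: the function names (`reducer_for`, `simulate_left_join`) are hypothetical, and it assumes that a row emitted without partition columns is sent to an arbitrary reducer, whereas a row with partition columns is sent to the reducer chosen by hashing its cookie.

```python
import random
import zlib


def reducer_for(key, num_reducers):
    """Pick a reducer by hashing the join key (deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def simulate_left_join(t_rows, c_rows, num_reducers, t_has_partition_cols, seed=0):
    """Simulate a reduce-side left join of t and c on cookie.

    Assumption: when t has no partition columns, each t row goes to an
    arbitrary reducer instead of the one selected by its cookie.
    """
    rng = random.Random(seed)
    reducers = [{"t": [], "c": []} for _ in range(num_reducers)]

    for row in t_rows:
        if t_has_partition_cols:
            idx = reducer_for(row["cookie"], num_reducers)
        else:
            idx = rng.randrange(num_reducers)
        reducers[idx]["t"].append(row)

    # Table c always keeps its partition columns, as in the plan.
    for row in c_rows:
        reducers[reducer_for(row["cookie"], num_reducers)]["c"].append(row)

    # Each reducer joins only the rows it received.
    result = []
    for bucket in reducers:
        for t in bucket["t"]:
            matches = [c for c in bucket["c"] if c["cookie"] == t["cookie"]]
            if matches:
                result.extend((t["cookie"], t["datetime"], c["cookie"]) for c in matches)
            else:
                # Left join: unmatched t rows are kept with a NULL c side.
                result.append((t["cookie"], t["datetime"], None))
    return result


def count_unmatched(rows):
    """Count output rows whose c.cookie side is NULL."""
    return sum(1 for r in rows if r[2] is None)


# Made-up sample data: 100 log rows and one cookie row for the same cookie.
t_rows = [{"cookie": "xxxxx", "datetime": str(i)} for i in range(100)]
c_rows = [{"cookie": "xxxxx"}]

broken = simulate_left_join(t_rows, c_rows, 8, t_has_partition_cols=False)
single = simulate_left_join(t_rows, c_rows, 1, t_has_partition_cols=False)
fixed = simulate_left_join(t_rows, c_rows, 8, t_has_partition_cols=True)

print(count_unmatched(broken), count_unmatched(single), count_unmatched(fixed))
```

Under these assumptions, the first count is nonzero (matches are lost when t rows land on other reducers), while the single-reducer run and the run with partition columns restored both report zero unmatched rows, mirroring the two tests.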