Tez: can an MRR plan be optimized to MR?

Published: 2020-06-16 19:33:51 Author: r7raul
Source: web

https://issues.apache.org/jira/browse/HIVE-2340

The following query produces an MRR (map-reduce-reduce) plan:

select userid, count(*) from u_data group by userid order by userid;

 

When the result of userid, count(*) is small enough for a single reducer to process, could this query plan be optimized down to MR?
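
One way to confirm the plan shape is to ask Hive for it with EXPLAIN. This is a minimal sketch, assuming the u_data table from the question and a session running on Tez; the exact layout of the EXPLAIN output varies by Hive version.

```sql
-- Assumption: the session uses Tez as its execution engine.
set hive.execution.engine=tez;

-- Print the plan instead of running the query.
explain
select userid, count(*)
from u_data
group by userid
order by userid;
```

In an MRR plan the Tez stage should list one map vertex followed by two reducer vertices (one for the group by, one for the order by). A merged MR plan would show a single reducer vertex.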


To prevent bad reducer merging, the reducer merging only kicks in when the optimizer thinks it gets a perf boost.

 

Merging MRR down to MR is not a big win when it comes to Tez, thanks to container reuse. Going wide on a large-cardinality key is safer when map-side aggregation is missing.

 

If hive.map.aggr=true and the userid set fits within memory, then smushing the reducers would be nicer.
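
The current value of hive.map.aggr can be printed and changed from within the Hive session. A minimal sketch; whether the set of userid values actually fits in memory depends on the data and is not something these settings verify.

```sql
-- Print the current value (Hive echoes key=value when no value is given).
set hive.map.aggr;

-- Enable map-side (hash) aggregation for this session.
set hive.map.aggr=true;
```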

 

To reset the wide-narrow checks, run:

 

set hive.optimize.reducededuplication.min.reducer=1;

 

 

But be aware that it will fail (I've seen full disks) as you scale up to the 10+ TB cases.
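
Given that risk, one cautious pattern is to lower the threshold only around a query whose result is known to be small, then restore it. This is a sketch, assuming the u_data query from the question and Hive's documented default of 4 for this setting; it also assumes hive.optimize.reducededuplication itself is left enabled.

```sql
-- Allow reduce-sink merging even when the child RS is pinned to one reducer (order by).
set hive.optimize.reducededuplication.min.reducer=1;

select userid, count(*) from u_data group by userid order by userid;

-- Restore the documented default so larger queries keep the safety check.
set hive.optimize.reducededuplication.min.reducer=4;
```

Whether the plan actually collapses to MR still depends on the optimizer; running EXPLAIN on the query shows what it chose.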

 

Cheers,

Gopal

hive.optimize.reducededuplication.min.reducer

Reduce deduplication merges two RSs (reduce sink operators) by moving the key/parts/reducer-num of the child RS to the parent RS. That means that if the reducer-num of the child RS is fixed (order by or forced bucketing) and small, the merge can produce a very slow, single MR job. The optimization is disabled if the number of reducers is less than the specified value.

