How to Understand MapReduce WordCount

Published: 2021-12-30 14:07:11 Author: iii
Source: 亿速云 (Yisu Cloud)

This article explains how MapReduce WordCount works. Many people are unsure about it in day-to-day work, so the notes below collect a simple, practical explanation of what happens on the map side and on the reduce side.

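WordCount counts how many times each word occurs. Reading the code, I could always follow each line, yet the actual logic stayed unclear to me: how is the data handled on the map side, and how is it handled on the reduce side? Now I understand it.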

The principle is this: the map side reads the input one line at a time and emits every word in the line with a count of 1, as follows:

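Map input as {key,value} pairs (with TextInputFormat the key is the line's byte offset in the file; it is shown here as a simple line index):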

{0,hello word by word}

{1,hello hadoop by hadoop}

These are the keys and values that the map side receives. After map processing, the following data is produced:

{hello,1} {word,1} {by,1} {word,1}

{hello,1} {hadoop,1} {by,1} {hadoop,1}
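The map step above can be tried without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class `MapPhaseSketch` and its `map` method are hypothetical names introduced for illustration, and a list of entries stands in for what `context.write` would emit.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map phase; it does not use the Hadoop API.
public class MapPhaseSketch {

    // Mimics one map() call: the key is ignored (as in WordCount) and
    // each whitespace-separated token is emitted as the pair (word, 1).
    public static List<Map.Entry<String, Integer>> map(long key, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hello=1, word=1, by=1, word=1]
        System.out.println(map(0, "hello word by word"));
        // prints [hello=1, hadoop=1, by=1, hadoop=1]
        System.out.println(map(1, "hello hadoop by hadoop"));
    }
}
```

Note that the mapper does no counting of its own: a word that appears twice in a line is simply emitted twice, each time with the value 1.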

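Up to this point everything is easy to follow, but the reduce side is where I got stuck: how do the occurrences of the same word get brought together? After working through how Hadoop operates, the answer is that before the data reaches reduce, the framework shuffles and sorts the map output, grouping all values that share the same key. The grouped data has the following structure: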

[{hello},{1,1}] [{word},{1,1}] [{by},{1,1}] [{hadoop},{1,1}]
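The grouping shown above can be imitated in a few lines of plain Java. This is only a sketch of the idea, assuming a single in-memory `TreeMap` in place of Hadoop's distributed shuffle and sort; `ShuffleSketch` and `shuffle` are hypothetical names, not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the shuffle/sort step.
public class ShuffleSketch {

    // Collects all values that share a key into one list, with keys kept in sorted order.
    public static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Rebuild the map output of both input lines as (word, 1) pairs.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : "hello word by word hello hadoop by hadoop".split(" ")) {
            mapOutput.add(new SimpleEntry<>(w, 1));
        }
        // prints {by=[1, 1], hadoop=[1, 1], hello=[1, 1], word=[1, 1]}
        System.out.println(shuffle(mapOutput));
    }
}
```

The keys come out in sorted order here because of the `TreeMap`; Hadoop likewise hands keys to each reducer in sorted order, which is why the sketch prints `by` first rather than `hello`.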

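This grouped data is then passed to the reduce side, where reduce loops over the values for each key, adds them up, and writes the total to the job output. The full code is below, without much further commentary.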

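import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool{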
public static class Map extends Mapper<LongWritable,Text,Text,IntWritable>{
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  public void map(LongWritable key,Text value, Context context)
  throws IOException,InterruptedException{
   String line = value.toString();
   // Split the line on whitespace and emit (word, 1) for every token.
   StringTokenizer tokenizer = new StringTokenizer(line);
   while(tokenizer.hasMoreTokens()){
    word.set(tokenizer.nextToken());
    context.write(word,one);
   }
  }
}

public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable>{
  public void reduce(Text key,Iterable<IntWritable> values,Context context)
  throws IOException,InterruptedException{
   int sum = 0 ;
   for(IntWritable val: values) {
    sum += val.get();
   }
   context.write(key,new IntWritable(sum));
  }
}

public int run(String[] args) throws Exception{
  // Job.getInstance replaces the deprecated Job constructor.
  Job job = Job.getInstance(getConf());
  job.setJarByClass(WordCount.class);
  job.setJobName("wordcount");

  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);

  job.setMapperClass(Map.class);
  job.setReducerClass(Reduce.class);

  job.setInputFormatClass(TextInputFormat.class);
  job.setOutputFormatClass(TextOutputFormat.class);

  // args[0] is the input path, args[1] is the output directory.
  FileInputFormat.setInputPaths(job,new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));

  boolean success = job.waitForCompletion(true);
  return success ? 0 : 1;
}

public static void main(String[] args) throws Exception{
  int ret = ToolRunner.run(new WordCount(),args);
  System.exit(ret);
}
}
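To look at the reduce step from the code above in isolation, here is a minimal plain-Java sketch. It assumes the grouped data is already in memory and uses plain `Integer` values instead of `IntWritable`; `ReduceSketch` and its `reduce` method are hypothetical names for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the reduce phase; it does not use the Hadoop API.
public class ReduceSketch {

    // Mimics one reduce() call: sum every value that was grouped under the key.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The grouped data from the example: every word occurred twice.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("hello", Arrays.asList(1, 1));
        grouped.put("word", Arrays.asList(1, 1));
        grouped.put("by", Arrays.asList(1, 1));
        grouped.put("hadoop", Arrays.asList(1, 1));

        // One line per key, key and count separated by a tab.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t" + reduce(entry.getKey(), entry.getValue()));
        }
    }
}
```

For the two sample lines this prints `by`, `hadoop`, `hello` and `word`, each with a count of 2, in the same tab-separated key/value layout that `TextOutputFormat` uses by default.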

That concludes this look at how MapReduce WordCount works. Theory sticks best when it is paired with practice, so try running the job yourself.
