# How to Run WordCount in Eclipse on Hadoop-002
## Preface
In the big-data world, WordCount is the "Hello World" of the Hadoop ecosystem and the standard entry point for learning the MapReduce programming model. This article walks through configuring, writing, and running a WordCount program in the Hadoop-002 environment using the Eclipse IDE, covering the full workflow from environment preparation to result verification.
---
## 1. Environment Preparation
### 1.1 Hardware and Software Requirements
- **Operating system**: Windows/Linux/macOS (this article uses Windows 10)
- **Hadoop version**: Hadoop 2.7.x (the Hadoop-002 environment)
- **Java version**: JDK 1.8+
- **Eclipse IDE**: Eclipse for Java Developers (2020-06 or newer recommended)
### 1.2 Installing Required Components
1. **Verify the Hadoop installation**:
   ```bash
   # Check the Hadoop version
   hadoop version
   # Expected output
   Hadoop 2.7.7
   ```
2. **Install the Eclipse plugin**: open Help > Eclipse Marketplace and install the Hadoop MapReduce Tools plugin.

## 2. Creating the Eclipse Project
1. Choose File > New > Java Project and name the project `HadoopWordCount`; keep "Use default JRE" selected (the default JRE must be JDK 1.8+).
2. Open Build Path > Configure Build Path and add the Hadoop JARs:
   - `%HADOOP_HOME%/share/hadoop/common/*.jar`
   - `%HADOOP_HOME%/share/hadoop/mapreduce/*.jar`
   - `%HADOOP_HOME%/share/hadoop/common/lib/*.jar`
3. Copy `core-site.xml` and `hdfs-site.xml` from Hadoop's `conf` folder into the project directory so the client can locate the cluster.
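If the two XML files are not picked up automatically from the classpath, they can be registered explicitly with `Configuration.addResource`. This is a minimal sketch, not part of the original tutorial; the `conf/` path and the `LoadClusterConf` class name are assumptions to adjust to your project layout:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class LoadClusterConf {
    public static Configuration load() {
        // new Configuration() already reads core-site.xml/hdfs-site.xml from
        // the classpath; addResource() is the explicit fallback.
        Configuration conf = new Configuration();
        conf.addResource(new Path("conf/core-site.xml"));   // assumed location
        conf.addResource(new Path("conf/hdfs-site.xml"));   // assumed location
        return conf;
    }
}
```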
## 3. Writing the WordCount Code
The overall structure of the class (each part is shown in full afterwards):

```java
package org.apache.hadoop.examples;
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {

    // Mapper implementation
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {...}

    // Reducer implementation
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {...}

    // Driver (main method)
    public static void main(String[] args) throws Exception {...}
}
```
The Mapper emits a `<word, 1>` pair for every token in each input line:

```java
public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the text line on whitespace
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one); // emit <word, 1>
        }
    }
}
```
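To see exactly what the map step emits, here is a standalone sketch (not part of the job; the `TokenizerDemo` class name is hypothetical) that mimics the tokenization of one input line:

```java
import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        // StringTokenizer splits on whitespace only, so punctuation stays
        // attached to words ("Hello," and "Hello" would count separately).
        StringTokenizer itr = new StringTokenizer("Hello World Hello");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken() + "\t1");
        }
        // Output:
        // Hello   1
        // World   1
        // Hello   1
    }
}
```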
The Reducer sums the counts collected for each word:

```java
public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Accumulate the occurrence count for this word
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result); // emit <word, total count>
    }
}
```
The driver wires the job together. Note that `IntSumReducer` doubles as the combiner, which is valid here because summation is associative and commutative:

```java
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
```
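One practical note: Hadoop aborts the job if the output directory already exists. Below is a minimal sketch of a cleanup helper for reruns; the `OutputCleaner`/`clearOutput` names are assumptions, not part of the original example:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCleaner {
    // Delete a stale output directory so the job can be rerun.
    public static void clearOutput(Configuration conf, String dir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(dir);
        if (fs.exists(out)) {
            fs.delete(out, true); // true = recursive
        }
    }
}
```

Calling something like `OutputCleaner.clearOutput(conf, args[1])` before `FileOutputFormat.setOutputPath` avoids the `FileAlreadyExistsException` on repeated runs.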
## 4. Running the Program
1. Upload test data to HDFS:
   ```bash
   hdfs dfs -mkdir /input
   hdfs dfs -put sample.txt /input
   ```
2. Open Run > Run Configurations and set:
   - Program arguments: `hdfs://localhost:9000/input hdfs://localhost:9000/output`
   - VM arguments (on Windows): `-Dhadoop.home.dir=C:/hadoop-2.7.7`
## 5. Common Errors

| Error | Solution |
| --- | --- |
| `ClassNotFoundException` | Verify that all required Hadoop JARs are on the build path |
| HDFS permission denied | Run `hdfs dfs -chmod 777 /input` |
| Port conflict | Check the `fs.defaultFS` setting in `core-site.xml` (see the sketch below) |
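For the port-conflict row, a quick way to confirm which NameNode address the client actually resolved is to print the setting. This is a hypothetical helper, not from the original article:

```java
import org.apache.hadoop.conf.Configuration;

public class DefaultFsCheck {
    public static void main(String[] args) {
        // Prints the NameNode URI resolved from core-site.xml, e.g.
        // hdfs://localhost:9000; it must match the port HDFS listens on.
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS", "(not set)"));
    }
}
```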
## 6. Verifying the Result

```bash
hdfs dfs -cat /output/part-r-00000
```

Sample output (tab-separated; reducer output is sorted by key):

```
Hadoop  1
Hello   3
World   2
```
## 7. Performance Tuning (Optional)
Two common knobs: apply `conf.set` before creating the `Job` (the job copies the configuration), and `setNumReduceTasks` any time before submission:

```java
conf.set("mapreduce.task.io.sort.mb", "256"); // enlarge the map-side sort buffer (MB)
job.setNumReduceTasks(4);                     // run four reduce tasks in parallel
```
## 8. Unit Testing with MRUnit
Use the MRUnit framework to test the MapReduce logic in isolation:

```java
package org.apache.hadoop.examples;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountTest {
    @Test
    public void testMapper() throws IOException {
        // One input line should yield one <word, 1> pair per token.
        MapDriver.newMapDriver(new WordCount.TokenizerMapper())
            .withInput(new Text("key"), new Text("hello world"))
            .withOutput(new Text("hello"), new IntWritable(1))
            .withOutput(new Text("world"), new IntWritable(1))
            .runTest();
    }
}
```
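A matching reducer test follows the same pattern; this is a sketch assuming the same MRUnit and JUnit dependencies, with `ReducerTest` as a hypothetical class name:

```java
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class ReducerTest {
    @Test
    public void testReducer() throws IOException {
        // Feed one key with two partial counts and expect their sum.
        ReduceDriver.newReduceDriver(new WordCount.IntSumReducer())
            .withInput(new Text("hello"), Arrays.asList(new IntWritable(1), new IntWritable(2)))
            .withOutput(new Text("hello"), new IntWritable(3))
            .runTest();
    }
}
```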
## Summary
You have now walked through the complete workflow for developing a Hadoop WordCount program in Eclipse. Suggested next steps:
1. Process more complex text formats (e.g., JSON/XML)
2. Implement a custom Partitioner
3. Explore the YARN resource-scheduling mechanism

Note: the API may differ across Hadoop versions; consult the official documentation for your version.

Appendix:
- Hadoop 2.7.7 API documentation
- Sample-code GitHub repository