How to Read Command-Line Arguments in a Hadoop MapReduce Job

Published: 2021-07-26 18:04:38  Author: chen
Source: 亿速云

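This article shows how a Hadoop MapReduce job can take its input and output paths from the command line. The driver class implements the Tool interface and is launched through ToolRunner, which hands the command-line arguments to run(String[] args). The complete driver follows.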

package cmd;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import mapreduce.MyMapper;
import mapreduce.MyReducer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Word count job that takes its input and output paths from the command line.
 * @author Xr
 */
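public class WordCountApp extends Configured implements Tool {
    public static String INPUT_PATH = "";
    public static String OUTPUT_PATH = "";

    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner strips generic options (e.g. -D key=value) before calling run(),
        // so args holds only the application arguments: <input path> <output path>.
        if (args.length < 2) {
            System.err.println("Usage: WordCountApp <input path> <output path>");
            return 2;
        }
        INPUT_PATH = args[0];
        OUTPUT_PATH = args[1];
        // Use the Configuration that ToolRunner populated rather than a fresh one;
        // a new Configuration() would ignore generic options from the command line.
        Configuration conf = getConf();

        // Delete the output directory if it already exists
        existsFile(conf);
        // Note: this constructor is deprecated since Hadoop 2, where
        // Job.getInstance(conf, name) is the replacement.
        Job job = new Job(conf, WordCountApp.class.getName());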
        // Tell Hadoop which jar to ship by locating the one that contains this class
        job.setJarByClass(WordCountApp.class);
        // 1.1 Where to read the input from
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Parse each line of the input text into a key/value pair
        job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Set the custom map function
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 1.3 Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);

        // 1.4 TODO sorting and grouping
        // 1.5 TODO combiner

        // 2.1 is done by the framework and needs no manual intervention.
        // 2.2 Set the custom reduce function
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Write the result to HDFS
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        // Output format class
        job.setOutputFormatClass(TextOutputFormat.class);

        // Submit the job, wait for it to finish, and report failure through the return code
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        // Propagate the value returned by run() as the process exit code
        System.exit(ToolRunner.run(new WordCountApp(), args));
    }
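    /** Deletes the output directory if it already exists, so the job can be re-run. */
    private static void existsFile(Configuration conf) throws IOException,
            URISyntaxException {
        // The file system is resolved from the input URI; this assumes the input
        // and output paths live on the same file system.
        FileSystem fs = FileSystem.get(new URI(INPUT_PATH), conf);
        Path output = new Path(OUTPUT_PATH);
        if (fs.exists(output)) {
            fs.delete(output, true); // true = delete recursively
        }
    }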
}
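Run: hadoop jar WordCount.jar hdfs://hadoop:9000/hello hdfs://hadoop:9000/h2
(This form assumes the jar's manifest names the main class; otherwise the class name, here cmd.WordCountApp, goes right after the jar file.)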
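
When a Tool is launched through ToolRunner, generic options such as -D key=value are parsed into the Configuration first, and only the remaining arguments reach run(String[] args). The following is a minimal, Hadoop-free sketch of that split, written only to illustrate the idea. It is not Hadoop's implementation: the class ArgSplitSketch, its Parsed holder, and the property name my.example.key are all hypothetical, and only the -D option is handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified illustration (not Hadoop code) of separating -D options from positional arguments. */
public class ArgSplitSketch {

    /** Holds the parsed key=value overrides and the arguments left for the application. */
    public static final class Parsed {
        public final Map<String, String> properties = new LinkedHashMap<>();
        public final List<String> remaining = new ArrayList<>();
    }

    public static Parsed split(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        // Consume leading -D options; stop at the first argument that is not one.
        while (i < args.length && args[i].startsWith("-D")) {
            if (args[i].equals("-D")) {
                // Two-token form: -D key=value
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-D requires key=value");
                }
                addProperty(parsed, args[i + 1]);
                i += 2;
            } else {
                // One-token form: -Dkey=value
                addProperty(parsed, args[i].substring(2));
                i += 1;
            }
        }
        // Everything left over plays the role of the args passed to run().
        for (; i < args.length; i++) {
            parsed.remaining.add(args[i]);
        }
        return parsed;
    }

    private static void addProperty(Parsed parsed, String keyValue) {
        int eq = keyValue.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Expected key=value but got: " + keyValue);
        }
        parsed.properties.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
    }

    public static void main(String[] args) {
        // my.example.key is a made-up property used only for this demonstration.
        Parsed parsed = split(new String[] {
            "-D", "my.example.key=value",
            "hdfs://hadoop:9000/hello", "hdfs://hadoop:9000/h2"
        });
        System.out.println("properties = " + parsed.properties);
        System.out.println("remaining  = " + parsed.remaining);
    }
}
```

In the sketch the options must come before the positional arguments, so the two paths always end up at indexes 0 and 1 of the remaining list, which matches how the driver reads args[0] and args[1]. In the real driver, values set this way are visible only through the Configuration returned by getConf(), not through a freshly constructed one.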


                                                                     Name : Xr
                                                                     Date : 2014-03-02 21:47
