How to Use Pig to Analyze Hadoop Logs


This article walks through how to use Pig to analyze Hadoop logs: counting the number of hits per IP address in an Apache access log with a short Pig Latin script. The steps are simple and easy to follow; the environment setup, the script, the execution log, and the results are all shown below.

Goal

Count the number of hits for each IP address, producing output such as:

123.24.56.57    13
24.53.23.123    7
34.56.78.120    20

and so on.

Log file to analyze

220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"

220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"

112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_widthauto.css?AZH HTTP/1.1" 200 1024 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"

112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_forum_forumdisplay.css?AZH HTTP/1.1" 200 11486 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"

Environment configuration

# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
# User specific environment and startup programs
export ANT_HOME=/home/wukong/usr/apache-ant-1.9.4
export HADOOP_HOME=/home/wukong/usr/hadoop-1.2.1
export PIG_HOME=/home/wukong/usr/pig-0.13.0
export PIG_CLASSPATH=$HADOOP_HOME/conf

PATH=$PATH:$HOME/bin:$ANT_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin:$PIG_HOME/bin:$PIG_CLASSPATH
export PATH
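
After adding these lines, it is worth confirming that both tools resolve on the PATH and report the versions used in this walkthrough (a minimal sanity check, not part of the original steps):

source ~/.bash_profile
hadoop version    # expect Hadoop 1.2.1
pig -version      # expect Apache Pig version 0.13.0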

Pig script

A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
B = GROUP A BY ip;
C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip; 
STORE C INTO '/user/wukong/w08/access_log.out.txt';
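
The statements are entered interactively in the Grunt shell below, but the same four lines can also be saved to a file and submitted in batch mode. A sketch, assuming the script is saved as ipcount.pig (a file name chosen here for illustration):

# Submit the script to the cluster in MapReduce mode
pig -x mapreduce ipcount.pig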

Execution

[wukong@bd11 ~]$ pig -x mapreduce
Warning: $HADOOP_HOME is deprecated.
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-28 01:10:51,242 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:29:34
2014-08-28 01:10:51,242 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/wukong/pig_1409159451241.log
2014-08-28 01:10:51,319 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/wukong/.pigbootup not found
2014-08-28 01:10:51,698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://bd11:9000
2014-08-28 01:10:52,343 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: bd11:9001
grunt> ls
hdfs://bd11:9000/user/wukong/test       <dir>
hdfs://bd11:9000/user/wukong/w05        <dir>
hdfs://bd11:9000/user/wukong/w06        <dir>
hdfs://bd11:9000/user/wukong/w07        <dir>
grunt> mkdir w08
grunt> copyFromLocal ./access_log.txt ./w08/
grunt> ls
hdfs://bd11:9000/user/wukong/test       <dir>
hdfs://bd11:9000/user/wukong/w05        <dir>
hdfs://bd11:9000/user/wukong/w06        <dir>
hdfs://bd11:9000/user/wukong/w07        <dir>
hdfs://bd11:9000/user/wukong/w08        <dir>
grunt> cd w08
grunt> ls
hdfs://bd11:9000/user/wukong/w08/access_log.txt<r 1>    7118627
grunt> A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
grunt> B = GROUP A BY ip;
grunt> C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip; 
grunt> STORE C INTO '/user/wukong/w08/out';

Execution log

2014-08-28 01:13:56,741 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2014-08-28 01:13:56,875 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-08-28 01:13:57,091 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-08-28 01:13:57,121 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-08-28 01:13:57,178 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-08-28 01:13:57,179 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-08-28 01:13:57,432 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-08-28 01:13:57,471 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-08-28 01:13:57,479 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-08-28 01:13:57,480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=7118627
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4751117514743080762.jar
2014-08-28 01:14:01,054 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4751117514743080762.jar created
2014-08-28 01:14:01,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-08-28 01:14:01,095 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-08-28 01:14:01,095 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-08-28 01:14:01,129 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-08-28 01:14:01,304 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-08-28 01:14:01,805 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-08-28 01:14:02,067 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-08-28 01:14:02,067 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-08-28 01:14:02,109 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2014-08-28 01:14:02,109 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2014-08-28 01:14:02,114 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408280106_0001
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],C[3,4],B[2,4] C: C[3,4],B[2,4] R: C[3,4]
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://bd11:50030/jobdetails.jsp?jobid=job_201408280106_0001
2014-08-28 01:14:18,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-08-28 01:14:18,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:30,058 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:39,202 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-08-28 01:14:39,210 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.1   0.13.0  wukong  2014-08-28 01:13:57     2014-08-28 01:14:39     GROUP_BY
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime       Alias    Feature Outputs
job_201408280106_0001   1       1       6       6       6       6       11     11       11      11      A,B,C   GROUP_BY,COMBINER       /user/wukong/w08/access_log.out.txt,
Input(s):
Successfully read 28134 records (7118993 bytes) from: "/user/wukong/w08/access_log.txt"
Output(s):
Successfully stored 476 records (8051 bytes) in: "/user/wukong/w08/out"
Counters:
Total records written : 476
Total bytes written : 8051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201408280106_0001

2014-08-28 01:14:39,227 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
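
The counters above report 476 output records, i.e. one line per distinct IP, written by the single reducer. A quick cross-check of that count against the stored result (a sketch using standard HDFS shell commands; the part-* glob assumes the default reducer output naming seen in the next section):

hadoop fs -cat /user/wukong/w08/out/part-* | wc -l    # expect 476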

Viewing the results

[wukong@bd11 ~]$ hadoop fs -cat ./w08/out/part-r-00000
Warning: $HADOOP_HOME is deprecated.
127.0.0.1       2
1.59.65.67      2
112.4.2.19      9
112.4.2.51      80
60.2.99.33      42
... (remaining lines omitted) ...
221.194.180.166 4576
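
As a final sanity check, the same per-IP counts can be reproduced from the local copy of the log with plain shell tools and compared against the Pig output (a sketch, assuming access_log.txt is still in the home directory, as in the copyFromLocal step above):

# Count hits per IP locally; prints count followed by IP, highest first
awk '{print $1}' access_log.txt | sort | uniq -c | sort -rn | head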

Thanks for reading. That wraps up "How to Use Pig to Analyze Hadoop Logs". After working through the article you should have a clearer picture of the whole workflow, from loading the log into HDFS to reading the per-IP hit counts; the best way to consolidate it is to run the steps against your own log files.

