How to Use Pig to Analyze Hadoop Logs


This article walks through how to use Pig to analyze Hadoop logs: counting the number of hits per IP address in an Apache access log with a short Pig Latin script. The steps are simple and easy to follow; the environment setup, the script, the execution log, and the results are all shown below.

Goal

Count the number of hits for each IP address, producing output such as:

123.24.56.57    13
24.53.23.123    7
34.56.78.120    20

and so on.

Log file to analyze

220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"

220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"

112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_widthauto.css?AZH HTTP/1.1" 200 1024 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"

112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_forum_forumdisplay.css?AZH HTTP/1.1" 200 11486 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"

Environment configuration

# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
# User specific environment and startup programs
export ANT_HOME=/home/wukong/usr/apache-ant-1.9.4
export HADOOP_HOME=/home/wukong/usr/hadoop-1.2.1
export PIG_HOME=/home/wukong/usr/pig-0.13.0
export PIG_CLASSPATH=$HADOOP_HOME/conf

PATH=$PATH:$HOME/bin:$ANT_HOME/bin:$HADOOP_HOME:$HADOOP_HOME/bin:$PIG_HOME/bin:$PIG_CLASSPATH
export PATH
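
After adding these lines, it is worth confirming that both tools resolve on the PATH and report the versions used in this walkthrough (a minimal sanity check, not part of the original steps):

source ~/.bash_profile
hadoop version    # expect Hadoop 1.2.1
pig -version      # expect Apache Pig version 0.13.0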

Pig script

A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
B = GROUP A BY ip;
C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip; 
STORE C INTO '/user/wukong/w08/access_log.out.txt';
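
The statements are entered interactively in the Grunt shell below, but the same four lines can also be saved to a file and submitted in batch mode. A sketch, assuming the script is saved as ipcount.pig (a file name chosen here for illustration):

# Submit the script to the cluster in MapReduce mode
pig -x mapreduce ipcount.pig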

Execution

[wukong@bd11 ~]$ pig -x mapreduce
Warning: $HADOOP_HOME is deprecated.
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/08/28 01:10:51 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-08-28 01:10:51,242 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:29:34
2014-08-28 01:10:51,242 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/wukong/pig_1409159451241.log
2014-08-28 01:10:51,319 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/wukong/.pigbootup not found
2014-08-28 01:10:51,698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://bd11:9000
2014-08-28 01:10:52,343 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: bd11:9001
grunt> ls
hdfs://bd11:9000/user/wukong/test       <dir>
hdfs://bd11:9000/user/wukong/w05        <dir>
hdfs://bd11:9000/user/wukong/w06        <dir>
hdfs://bd11:9000/user/wukong/w07        <dir>
grunt> mkdir w08
grunt> copyFromLocal ./access_log.txt ./w08/
grunt> ls
hdfs://bd11:9000/user/wukong/test       <dir>
hdfs://bd11:9000/user/wukong/w05        <dir>
hdfs://bd11:9000/user/wukong/w06        <dir>
hdfs://bd11:9000/user/wukong/w07        <dir>
hdfs://bd11:9000/user/wukong/w08        <dir>
grunt> cd w08
grunt> ls
hdfs://bd11:9000/user/wukong/w08/access_log.txt<r 1>    7118627
grunt> A = LOAD '/user/wukong/w08/access_log.txt' USING PigStorage(' ') AS (ip);
grunt> B = GROUP A BY ip;
grunt> C = FOREACH B GENERATE group AS ip, COUNT(A.ip) AS countip; 
grunt> STORE C INTO '/user/wukong/w08/out';

Execution log

2014-08-28 01:13:56,741 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2014-08-28 01:13:56,875 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2014-08-28 01:13:57,091 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-08-28 01:13:57,121 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2014-08-28 01:13:57,178 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-08-28 01:13:57,179 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-08-28 01:13:57,432 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2014-08-28 01:13:57,471 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-08-28 01:13:57,479 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2014-08-28 01:13:57,480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=7118627
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2014-08-28 01:13:57,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job4751117514743080762.jar
2014-08-28 01:14:01,054 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job4751117514743080762.jar created
2014-08-28 01:14:01,077 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-08-28 01:14:01,095 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2014-08-28 01:14:01,095 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2014-08-28 01:14:01,129 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2014-08-28 01:14:01,304 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-08-28 01:14:01,805 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-08-28 01:14:02,067 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-08-28 01:14:02,067 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-08-28 01:14:02,109 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2014-08-28 01:14:02,109 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2014-08-28 01:14:02,114 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201408280106_0001
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],C[3,4],B[2,4] C: C[3,4],B[2,4] R: C[3,4]
2014-08-28 01:14:04,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://bd11:50030/jobdetails.jsp?jobid=job_201408280106_0001
2014-08-28 01:14:18,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2014-08-28 01:14:18,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:30,058 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_201408280106_0001]
2014-08-28 01:14:39,202 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-08-28 01:14:39,210 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.1   0.13.0  wukong  2014-08-28 01:13:57     2014-08-28 01:14:39     GROUP_BY
Success!
Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime       Alias    Feature Outputs
job_201408280106_0001   1       1       6       6       6       6       11     11       11      11      A,B,C   GROUP_BY,COMBINER       /user/wukong/w08/access_log.out.txt,
Input(s):
Successfully read 28134 records (7118993 bytes) from: "/user/wukong/w08/access_log.txt"
Output(s):
Successfully stored 476 records (8051 bytes) in: "/user/wukong/w08/out"
Counters:
Total records written : 476
Total bytes written : 8051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201408280106_0001

2014-08-28 01:14:39,227 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
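
The counters above report 476 output records, i.e. one line per distinct IP, written by the single reducer. A quick cross-check of that count against the stored result (a sketch using standard HDFS shell commands; the part-* glob assumes the default reducer output naming seen in the next section):

hadoop fs -cat /user/wukong/w08/out/part-* | wc -l    # expect 476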

Viewing the results

[wukong@bd11 ~]$ hadoop fs -cat ./w08/out/part-r-00000
Warning: $HADOOP_HOME is deprecated.
127.0.0.1       2
1.59.65.67      2
112.4.2.19      9
112.4.2.51      80
60.2.99.33      42
... (remaining lines omitted) ...
221.194.180.166 4576
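
As a final sanity check, the same per-IP counts can be reproduced from the local copy of the log with plain shell tools and compared against the Pig output (a sketch, assuming access_log.txt is still in the home directory, as in the copyFromLocal step above):

# Count hits per IP locally; prints count followed by IP, highest first
awk '{print $1}' access_log.txt | sort | uniq -c | sort -rn | head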

Thanks for reading. That wraps up "How to Use Pig to Analyze Hadoop Logs". After working through the article you should have a clearer picture of the whole workflow, from loading the log into HDFS to reading the per-IP hit counts; the best way to consolidate it is to run the steps against your own log files.

