A Summary of sed Usage

Published: 2020-08-04 03:46:41  Author: 470499989
Source: web  Views: 870

For line-oriented text processing, sed is usually the first tool to reach for.
sed processes text as follows:

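sed processes one line at a time. It reads a line, strips the trailing newline, stores the line in the pattern space, and runs the editing commands on it.  
When the commands are done, unless -n was given, sed prints the current pattern space followed by the newline it stripped earlier.  
It then empties the pattern space, reads the next line, and repeats. The hold space is not part of this automatic cycle: it keeps its content from one line to the next and changes only through commands such as h, H, g, G and x.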
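
The cycle above can be checked with a small sketch, assuming any POSIX sed; the three-line input and the variable names are invented for illustration. G appends the hold space to the pattern space and x swaps the two, so both make the hold space content visible.

```shell
# Sample input invented for this sketch.
input=$(printf 'one\ntwo\nthree\n')

# G appends a newline plus the hold space to the pattern space.
# Nothing in this script writes to the hold space, so it is empty in
# every cycle and each line is followed by an empty line.
with_hold=$(printf '%s\n' "$input" | sed 'G')
printf '%s\n' "$with_hold"

# x swaps pattern space and hold space. The hold space keeps what x put
# there, so every cycle prints the line read in the previous cycle.
shifted=$(printf '%s\n' "$input" | sed 'x')
printf '%s\n' "$shifted"
```

The second command prints an empty line, then "one", then "two": the hold space only ever contains what a command stored in it.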

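This is rather abstract. The sedsed tool exists precisely to show how sed processes its input. It is easy to install and use, and installing it is well worth the effort if you want to understand sed better.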

wget http://aurelio.net/sedsed/sedsed-1.0   
touch /usr/local/bin/sedsed   
cat sedsed-1.0 >> /usr/local/bin/sedsed   
chmod 755 /usr/local/bin/sedsed

Its usage is as follows:

usage: sedsed OPTION [-e sedscript] [-f sedscriptfile] [inputfile]  
OPTIONS:  
 
     -f, --file          add file contents to the commands to be parsed  
     -e, --expression    add the script to the commands to be parsed  
     -n, --quiet         suppress automatic printing of pattern space  
         --silent        alias to --quiet  
 
     -d, --debug         debug the sed script  
         --hide          hide some debug info (options: PATT,HOLD,COMM)  
         --color         shows debug output in colors (default: ON)  
         --nocolor       no colors on debug output  
         --dump-debug    dumps to screen the debugged sed script  
 
         --emu           emulates GNU sed (INCOMPLETE)  
         --emudebug      emulates GNU sed debugging the sed script (INCOMPLETE)  
 
     -i, --indent        script beautifier, prints indented and  
                         one-command-per-line output do STDOUT  
         --prefix        indent prefix string (default: 4 spaces)  
 
     -t, --tokenize      script tokenizer, prints extensive  
                         command by command information  
     -H, --htmlize       converts sed script to a colorful HTML page  
       
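-f works the same as sed's -f option.  
-d turns on debugging; --hide hides the named parts of the debug output, e.g. --hide=hold hides the content of the hold space buffer.  
-i (--indent) reformats a complex sed script into a more readable one.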

Start with a simple example:
[root@localhost ~]# cat 1.txt
linux
centos
redhat
linux
ubuntu
fedora
Replace linux with debian. Three runs are shown below: plain sed, sedsed -d, and sedsed -d --hide.

[root@localhost ~]# sed 's/linux/debian/g' 1.txt   
debian  
centos  
redhat  
debian  
ubuntu  
fedora  
[root@localhost ~]# sedsed -d  's/linux/debian/g' 1.txt   
PATT:linux$  
HOLD:$  
COMM:s/linux/debian/g  
PATT:debian$  
HOLD:$  
debian  
PATT:centos$  
HOLD:$  
COMM:s/linux/debian/g  
PATT:centos$  
HOLD:$  
centos  
PATT:redhat$  
HOLD:$  
COMM:s/linux/debian/g  
PATT:redhat$  
HOLD:$  
redhat  
PATT:linux$  
HOLD:$  
COMM:s/linux/debian/g  
PATT:debian$  
HOLD:$  
debian  
PATT:ubuntu$  
HOLD:$  
COMM:s/linux/debian/g  
PATT:ubuntu$  
HOLD:$  
ubuntu  
PATT:fedora$  
HOLD:$  
COMM:s/linux/debian/g  
PATT:fedora$  
HOLD:$  
fedora  
[root@localhost ~]# sedsed -d  --hide=hold 's/linux/debian/g' 1.txt   
PATT:linux$  
COMM:s/linux/debian/g  
PATT:debian$  
debian  
PATT:centos$  
COMM:s/linux/debian/g  
PATT:centos$  
centos  
PATT:redhat$  
COMM:s/linux/debian/g  
PATT:redhat$  
redhat  
PATT:linux$  
COMM:s/linux/debian/g  
PATT:debian$  
debian  
PATT:ubuntu$  
COMM:s/linux/debian/g  
PATT:ubuntu$  
ubuntu  
PATT:fedora$  
COMM:s/linux/debian/g  
PATT:fedora$  
fedora

In this output:
PATT: the content of the pattern space buffer
COMM: the command being executed
HOLD: the content of the hold space buffer

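Adding, deleting, changing, and querying with sed
sed has two delete commands: d and D.
"d" deletes the content of the pattern space and reads a new line; the sed script then starts again from the top.
"D" differs in that it deletes only the part of the pattern space up to the first embedded newline. It does not read a new line; the script restarts from the top on what is left. (If the pattern space contains no newline, D behaves like d.)
Here is an example: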

[root@localhost ~]# cat 2.txt   
This line is followed by 1 blank line.  
 
This line is followed by 2 blank line.  
 
 
This line is followed by 3 blank line.  
 
 
 
This line is followed by 4 blank line.  
 
 
 
 
This is the end.

[root@localhost ~]# sed '/^$/{N;/^\n$/D}' 2.txt   
This line is followed by 1 blank line.  
 
This line is followed by 2 blank line.  
 
This line is followed by 3 blank line.  
 
This line is followed by 4 blank line.  
 
This is the end.  
[root@localhost ~]# sed '/^$/{N;/^\n$/d}' 2.txt   
This line is followed by 1 blank line.  
 
This line is followed by 2 blank line.  
This line is followed by 3 blank line.  
 
This line is followed by 4 blank line.  
This is the end.
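
The same contrast can be reproduced on a smaller input. This is a minimal sketch assuming a POSIX sed; the four-line input and the variable names are made up for illustration.

```shell
# Invented input: "a", two empty lines, "b".
input=$(printf 'a\n\n\nb\n')

# D deletes only up to the first embedded newline and restarts the
# script on the remainder, so one of the two empty lines survives.
with_D=$(printf '%s\n' "$input" | sed '/^$/{N;/^\n$/D;}')
printf '%s\n' "$with_D"

# d throws away the whole pattern space, i.e. both empty lines.
with_d=$(printf '%s\n' "$input" | sed '/^$/{N;/^\n$/d;}')
printf '%s\n' "$with_d"
```

With D the result is "a", an empty line, "b"; with d it is just "a" and "b".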

Turn on debugging in sedsed to watch the execution. The text after // is my own annotation.

[root@localhost ~]# sedsed -d  '/^$/{N;/^\n$/D}' 2.txt   
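PATT:This line is followed by 1 blank line.$  // the pattern space receives the first line  
HOLD:$                                        // the hold space starts out empty  
COMM:/^$/ {            // the command being run tests for an empty line; this line is not empty, so the block is skipped and the pattern space is printed at the end of the cycle (the hold space is not touched)  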
PATT:This line is followed by 1 blank line.$  
HOLD:$  
This line is followed by 1 blank line.  
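PATT:$                // the pattern space receives the second line  
HOLD:$                // the hold space is still empty  
COMM:/^$/ {           // the line is empty, so the commands in the block are run  
COMM:N                // N appends the next line to the pattern space  
PATT:\nThis line is followed by 2 blank line.$  
HOLD:$               // the hold space is still empty  
COMM:/^\n$/ D        // D runs only if the pattern space is two empty lines; it is not, so D is skipped  
PATT:\nThis line is followed by 2 blank line.$  // content of the pattern space  
HOLD:$              // content of the hold space, still empty  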
COMM:}  
PATT:\nThis line is followed by 2 blank line.$  
HOLD:$  
 
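This line is followed by 2 blank line.  // the D condition was not met, so the pattern space is printed and the next line is read  
PATT:$           // an empty line: the address matches, so the block is run  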
HOLD:$  
COMM:/^$/ {      
COMM:N  
PATT:\n$  
HOLD:$  
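COMM:/^\n$/ D    // the pattern space now holds two empty lines, so D runs: it deletes one of them and restarts the script without reading new input  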
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\nThis line is followed by 3 blank line.$  
HOLD:$  
COMM:/^\n$/ D  
PATT:\nThis line is followed by 3 blank line.$  
HOLD:$  
COMM:}  
PATT:\nThis line is followed by 3 blank line.$  
HOLD:$  
 
This line is followed by 3 blank line.  
PATT:$         // an empty line: the address matches, so the block runs and N reads the next line  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ D  // the D condition is met: one empty line is deleted  
PATT:$         // an empty line: the address matches, so the block runs and N reads the next line  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ D  // the D condition is met: one empty line is deleted  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\nThis line is followed by 4 blank line.$  
HOLD:$  
COMM:/^\n$/ D  
PATT:\nThis line is followed by 4 blank line.$  
HOLD:$  
COMM:}  
PATT:\nThis line is followed by 4 blank line.$  
HOLD:$  
 
This line is followed by 4 blank line.  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ D  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ D  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ D  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\nThis is the end.$  
HOLD:$  
COMM:/^\n$/ D  
PATT:\nThis is the end.$  
HOLD:$  
COMM:}  
PATT:\nThis is the end.$  
HOLD:$  
 
This is the end.  
 
[root@localhost ~]# sedsed -d  '/^$/{N;/^\n$/d}' 2.txt   
PATT:This line is followed by 1 blank line.$  
HOLD:$  
COMM:/^$/ {  
PATT:This line is followed by 1 blank line.$  
HOLD:$  
This line is followed by 1 blank line.  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\nThis line is followed by 2 blank line.$  
HOLD:$  
COMM:/^\n$/ d  
PATT:\nThis line is followed by 2 blank line.$  
HOLD:$  
COMM:}  
PATT:\nThis line is followed by 2 blank line.$  
HOLD:$  
 
This line is followed by 2 blank line.  
PATT:$          // an empty line: the address matches, so the block runs and N reads the next line  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ d   // the d condition is met: the whole pattern space, i.e. both empty lines, is deleted  
PATT:This line is followed by 3 blank line.$  
HOLD:$  
COMM:/^$/ {  
PATT:This line is followed by 3 blank line.$  
HOLD:$  
This line is followed by 3 blank line.  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ d  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\nThis line is followed by 4 blank line.$  
HOLD:$  
COMM:/^\n$/ d  
PATT:\nThis line is followed by 4 blank line.$  
HOLD:$  
COMM:}  
PATT:\nThis line is followed by 4 blank line.$  
HOLD:$  
 
This line is followed by 4 blank line.  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ d  
PATT:$  
HOLD:$  
COMM:/^$/ {  
COMM:N  
PATT:\n$  
HOLD:$  
COMM:/^\n$/ d  
PATT:This is the end.$  
HOLD:$  
COMM:/^$/ {  
PATT:This is the end.$  
HOLD:$  
This is the end.

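In most cases d is all you need for deletion, for example:

Delete blank lines (\s is a GNU extension; the [[:space:]] form is portable):  
sed '/^\s*$/d' filename  
sed '/^[[:space:]]*$/d' filename   
Delete by line number:  
sed  'n1d'  filename   deletes line n1  
sed 'n1,n2d' filename  deletes lines n1 through n2 (n1<=n2)  
sed  '5,$d' filename   deletes everything from line 5 to the end of the file  
Delete by pattern match, using this form:  
sed '/regular_pattern/d' filename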
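
The forms above can be tried on throwaway input. This is only a sketch: the sample data and variable names are invented, and the line numbers 3, 2,4 and 5 stand in for the n1/n2 placeholders.

```shell
# Invented input: line 2 is empty and line 4 contains only spaces.
text=$(printf 'one\n\ntwo\n   \nthree\n')

# Delete lines that are empty or contain only whitespace.
no_blank=$(printf '%s\n' "$text" | sed '/^[[:space:]]*$/d')
printf '%s\n' "$no_blank"

# Seven numbered lines make the address forms easy to follow.
third_gone=$(printf '%s\n' 1 2 3 4 5 6 7 | sed '3d')     # delete line 3
range_gone=$(printf '%s\n' 1 2 3 4 5 6 7 | sed '2,4d')   # delete lines 2 to 4
tail_gone=$(printf '%s\n' 1 2 3 4 5 6 7 | sed '5,$d')    # delete line 5 to the end
printf '%s\n' "$third_gone" "$range_gone" "$tail_gone"
```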

sed's line-based insert and replace operations are done with a\, i\ and c\.

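a\ is the append command: it adds new text after the current line (the line that has been read into the pattern space). The appended text goes on its own line below the sed command. If it spans more than one line, every line except the last must end with a backslash; the last line ends with the closing quote, followed by the file name.  
i\ inserts new text before the current line.  
c\ replaces the current line with new text.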
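
A minimal sketch of a\ and i\ in the multi-line form described above, assuming a POSIX sed; the sample lines and variable names are invented for illustration.

```shell
# Invented three-line input.
input=$(printf 'linux\ncentos\nredhat\n')

# a\ : the new text goes on the line below the command and is
# printed after the matching line.
appended=$(printf '%s\n' "$input" | sed '/centos/a\
added after centos')
printf '%s\n' "$appended"

# i\ : same layout, but the text is printed before the matching line.
inserted=$(printf '%s\n' "$input" | sed '/centos/i\
inserted before centos')
printf '%s\n' "$inserted"

# More than one line of text: every line but the last ends with a backslash.
two_lines=$(printf '%s\n' "$input" | sed '/centos/a\
first added line\
second added line')
printf '%s\n' "$two_lines"
```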

a\ and i\ are straightforward, so here is an example of c\:

[root@localhost ~]# cat 1.txt   
linux  
centos  
redhat  
linux  
ubuntu linux  
fedora  
[root@localhost ~]# sedsed -d '/linux/c\unix' 1.txt   
PATT:linux$  
HOLD:$  
COMM:/linux/ c\\N\unix  
unix  
PATT:centos$  
HOLD:$  
COMM:/linux/ c\\N\unix  
PATT:centos$  
HOLD:$  
centos  
PATT:redhat$  
HOLD:$  
COMM:/linux/ c\\N\unix  
PATT:redhat$  
HOLD:$  
redhat  
PATT:linux$  
HOLD:$  
COMM:/linux/ c\\N\unix  
unix  
PATT:ubuntu linux$  
HOLD:$  
COMM:/linux/ c\\N\unix  
unix  
PATT:fedora$  
HOLD:$  
COMM:/linux/ c\\N\unix  
PATT:fedora$  
HOLD:$  
fedora

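Substitution is the most frequently used sed operation. When editing with sed, the key is to find the distinguishing data (characters) and build the command on it, so that the command changes neither too much nor too little.
The linux-to-debian replacement at the start of this article is one example.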
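
To illustrate how the choice of pattern decides how much gets changed, here is a small sketch; the input lines and variable names are made up for this example.

```shell
# Invented input in which "linux" appears in three different positions.
input=$(printf 'linux\nubuntu linux\nlinux server\n')

# A bare pattern with the g flag changes every occurrence on every line.
loose=$(printf '%s\n' "$input" | sed 's/linux/debian/g')
printf '%s\n' "$loose"

# Anchoring with ^ and $ restricts the change to lines that consist of
# exactly "linux"; the other two lines are left alone.
strict=$(printf '%s\n' "$input" | sed 's/^linux$/debian/')
printf '%s\n' "$strict"
```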

Advanced sed usage
Printing the line numbers of matching lines

[root@localhost ~]# cat 1.txt   
linux  
centos  
redhat  
linux  
ubuntu linux  
fedora  
[root@localhost ~]# sed -n '/^linux/=' 1.txt  
1  
4  
[root@localhost ~]# sed -n '/^linux/{=;p}' 1.txt  
1  
linux  
4  
linux  
[root@localhost ~]# sedsed -d  -n  '/^linux/{=;p}' 1.txt  
PATT:linux$  
HOLD:$  
COMM:/^linux/ {  
COMM:=  
1 
PATT:linux$  
HOLD:$  
COMM:p  
linux  
PATT:linux$  
HOLD:$  
COMM:}  
PATT:linux$  
HOLD:$  
PATT:centos$  
HOLD:$  
COMM:/^linux/ {  
PATT:centos$  
HOLD:$  
PATT:redhat$  
HOLD:$  
COMM:/^linux/ {  
PATT:redhat$  
HOLD:$  
PATT:linux$  
HOLD:$  
COMM:/^linux/ {  
COMM:=  
4 
PATT:linux$  
HOLD:$  
COMM:p  
linux  
PATT:linux$  
HOLD:$  
COMM:}  
PATT:linux$  
HOLD:$  
PATT:ubuntu linux$  
HOLD:$  
COMM:/^linux/ {  
PATT:ubuntu linux$  
HOLD:$  
PATT:fedora$  
HOLD:$  
COMM:/^linux/ {  
PATT:fedora$  
HOLD:$  
[root@localhost ~]# sedsed -d   '/^linux/{=;p}' 1.txt  
PATT:linux$  
HOLD:$  
COMM:/^linux/ {  
COMM:=  
1 
PATT:linux$  
HOLD:$  
COMM:p  
linux  
PATT:linux$  
HOLD:$  
COMM:}  
PATT:linux$  
HOLD:$  
linux  
PATT:centos$  
HOLD:$  
COMM:/^linux/ {  
PATT:centos$  
HOLD:$  
centos  
PATT:redhat$  
HOLD:$  
COMM:/^linux/ {  
PATT:redhat$  
HOLD:$  
redhat  
PATT:linux$  
HOLD:$  
COMM:/^linux/ {  
COMM:=  
4 
PATT:linux$  
HOLD:$  
COMM:p  
linux  
PATT:linux$  
HOLD:$  
COMM:}  
PATT:linux$  
HOLD:$  
linux  
PATT:ubuntu linux$  
HOLD:$  
COMM:/^linux/ {  
PATT:ubuntu linux$  
HOLD:$  
ubuntu linux  
PATT:fedora$  
HOLD:$  
COMM:/^linux/ {  
PATT:fedora$  
HOLD:$  
fedora

The sedsed traces above show what sed's -n option actually does: it suppresses the automatic printing of the pattern space.
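
The effect is easy to see without sedsed as well. A minimal sketch with an invented three-line input:

```shell
# Invented input; lines 1 and 3 start with "linux".
input=$(printf 'linux\ncentos\nlinux\n')

# With -n only the output of = (the line number) appears.
quiet=$(printf '%s\n' "$input" | sed -n '/^linux/=')
printf '%s\n' "$quiet"

# Without -n every pattern space is also printed at the end of its cycle,
# so the line numbers are interleaved with the input lines.
noisy=$(printf '%s\n' "$input" | sed '/^linux/=')
printf '%s\n' "$noisy"
```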

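Multi-line processing in sed

Multi-line processing in sed is done with n and N.  
The multi-line Next (N) command is the counterpart of the next (n) command. n prints the pattern space (unless -n is in effect) and then reads the next line into the pattern space, replacing what was there; the script does not go back to the top but continues with the command after n. N keeps the current content of the pattern space and appends the next line to it, with a newline "\n" separating the two. After N, control continues with the commands that follow it, applied to the combined pattern space.  
Note that in multi-line mode the special characters "^" and "$" match the very beginning and the very end of the pattern space, not the positions before and after an embedded "\n".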
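
A short sketch of the three points above, assuming a POSIX sed. The numbered input is invented, and it has an even number of lines so that n and N always find a next line to read.

```shell
# n prints the current line and loads the next one, which d then deletes:
# only the odd-numbered lines are left.
odd=$(printf '%s\n' 1 2 3 4 | sed 'n;d')
printf '%s\n' "$odd"

# N appends the next line, so the embedded newline is visible to s.
joined=$(printf '%s\n' 1 2 3 4 | sed 'N;s/\n/,/')
printf '%s\n' "$joined"

# ^ matches only at the start of the whole pattern space, so each
# two-line pattern space gets a single ">" on its first line.
marked=$(printf '%s\n' 1 2 3 4 | sed 'N;s/^/>/')
printf '%s\n' "$marked"
```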

In the following, the goal is to replace "Owner and Operator Guide" with "Installation Guide":

[root@localhost ~]# cat 3.txt   
Consult Section 3.1 in the Owner and Operator  
Guide for a description of the tape drives  
available on your system.  
[root@localhost ~]# sedsed -d --hide=hold '/Operator$/{n;s/Owner and Operator\nGuide /Installation Guide\n/}' 3.txt  
PATT:Consult Section 3.1 in the Owner and Operator$  
COMM:/Operator$/ {  
COMM:n  
Consult Section 3.1 in the Owner and Operator  
PATT:Guide for a description of the tape drives$  
COMM:s/Owner and Operator\nGuide /nstallation Guide\n/  
PATT:Guide for a description of the tape drives$  
COMM:}  
PATT:Guide for a description of the tape drives$  
Guide for a description of the tape drives  
PATT:available on your system.$  
COMM:/Operator$/ {  
PATT:available on your system.$  
available on your system.  
[root@localhost ~]# sedsed -d --hide=hold '/Operator$/{N;s/Owner and Operator\nGuide /Installation Guide\n/}' 3.txt  
PATT:Consult Section 3.1 in the Owner and Operator$  
COMM:/Operator$/ {  
COMM:N  
PATT:Consult Section 3.1 in the Owner and Operator\nGuide for a descr\  
iption of the tape drives$  
COMM:s/Owner and Operator\nGuide /nstallation Guide\n/  
PATT:Consult Section 3.1 in the nstallation Guide\nfor a description \  
of the tape drives$  
COMM:}  
PATT:Consult Section 3.1 in the nstallation Guide\nfor a description \  
of the tape drives$  
Consult Section 3.1 in the nstallation Guide  
for a description of the tape drives  
PATT:available on your system.$  
COMM:/Operator$/ {  
PATT:available on your system.$  
available on your system.
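
In the traces above, the n version changes nothing: after n the pattern space holds only the second line, so the two-line pattern cannot match. Only the N version sees both lines at once. The traces also show "nstallation" although the script on the command line says "Installation"; this looks like an artifact of the sedsed run. The sketch below runs the same two scripts with plain sed, assuming a POSIX sed, with the newline in the replacement written as a backslash followed by a real line break for portability. The variable names are invented.

```shell
# The three lines of 3.txt from the example above.
input=$(printf 'Consult Section 3.1 in the Owner and Operator\nGuide for a description of the tape drives\navailable on your system.\n')

# N joins the two lines, so the pattern that spans the newline matches.
with_N=$(printf '%s\n' "$input" | sed '/Operator$/{
N
s/Owner and Operator\nGuide /Installation Guide\
/
}')
printf '%s\n' "$with_N"

# n replaces the pattern space with the second line only: no match,
# and the text comes out unchanged.
with_n=$(printf '%s\n' "$input" | sed '/Operator$/{
n
s/Owner and Operator\nGuide /Installation Guide\
/
}')
printf '%s\n' "$with_n"
```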

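sed can also write the content of the pattern space to a file, or read a file into its output.
Writing the pattern space to a file is done with w and W (W is a GNU extension):
w filename: Write the current pattern space to filename.
W filename: Write the first line of the current pattern space to filename.
For example: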

[root@localhost ~]# cat 1.txt   
linux server  
centos  
redhat  
linux web  
ubuntu linux  
fedora  
[root@localhost ~]# sed  -n '/^linux/,/^linux/w a.txt' 1.txt   
[root@localhost ~]# cat a.txt  
linux server  
centos  
redhat  
linux web
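
A self-contained variant of the w example, as a sketch: it works in a temporary directory created with mktemp, and the file name linux.txt and the variable names are invented here. Unlike the range address above, it selects only the lines that start with "linux".

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
printf 'linux server\ncentos\nredhat\nlinux web\nubuntu linux\nfedora\n' > "$workdir/1.txt"

# w writes each selected pattern space to the named file;
# -n keeps standard output quiet.
sed -n "/^linux/w $workdir/linux.txt" "$workdir/1.txt"

written=$(cat "$workdir/linux.txt")
printf '%s\n' "$written"

rm -r "$workdir"
```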

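Reading a file is done with r. The file content does not enter the pattern space; it is queued and written to the output after the current line, at the end of the cycle.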

[root@localhost ~]# cat 1.txt   
linux server  
centos  
redhat  
linux web  
ubuntu linux  
fedora  
[root@localhost ~]# sed  '/centos/r /root/1.txt' 1.txt   
linux server  
centos  
linux server  
centos  
redhat  
linux web  
ubuntu linux  
fedora  
redhat  
linux web  
ubuntu linux  
fedora
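
A small sketch of r on scratch files, assuming a POSIX sed; the file names main.txt and extra.txt and the variable names are invented. The second command adds a substitution to show that the lines brought in by r are not processed by the rest of the script.

```shell
workdir=$(mktemp -d)
printf 'centos\nredhat\n' > "$workdir/main.txt"
printf 'inserted 1\ninserted 2\n' > "$workdir/extra.txt"

# The content of extra.txt is output right after the matching line.
merged=$(sed "/centos/r $workdir/extra.txt" "$workdir/main.txt")
printf '%s\n' "$merged"

# The s command prefixes only the lines that went through the pattern
# space; the lines read by r are copied to the output untouched.
prefixed=$(sed -e "/centos/r $workdir/extra.txt" -e 's/^/> /' "$workdir/main.txt")
printf '%s\n' "$prefixed"

rm -r "$workdir"
```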

Advanced flow control in sed

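b  Branch: an unconditional jump to the place in the script marked with the given label; if no label is given, it jumps to the end of the script.  
t  Conditional branch (an "if"): jumps to the label only if a substitution has succeeded since the current input line was read or since the last conditional branch was taken; if no label is given, it jumps to the end of the script.  
T  Conditional branch on failure: jumps to the label only if no substitution has succeeded since the current input line was read or since the last conditional branch was taken; if no label is given, it jumps to the end of the script. T is a GNU extension.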
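
A minimal sketch of b and t, assuming a POSIX sed (T is left out because it is GNU-only). The inputs, the label name "again" and the variable names are invented; each command is passed with its own -e so that the label is clearly delimited.

```shell
# b without a label jumps to the end of the script, so lines that
# start with "#" skip the substitution.
skipped=$(printf '# foo\nfoo\n' | sed -e '/^#/b' -e 's/foo/bar/')
printf '%s\n' "$skipped"

# t jumps back to :again as long as the last s changed something.
# Each pass turns one pair of spaces into a single space.
squeezed=$(printf 'a  b    c\n' | sed -e ':again' -e 's/  / /' -e 't again')
printf '%s\n' "$squeezed"
```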

[root@localhost ~]# cat 4.txt
a b c a d a a a
s d d d x s a
h j s a s h j h
j d f j a s j k j
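Task: in each line, delete the characters that repeat the character in the first column; write one solution each in shell, sed and awk. The expected result is: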
a b c d
s d d d x a
h j s a s j
j d f a s k
This example comes from http://blog.chinaunix.net/uid-10540984-id-3086644.html

[root@localhost ~]# sed ':a;s/^\(.\)\(.*\) \1/\1\2/;ta' 4.txt   
a b c d  
s d d d x a  
h j s a s j  
j d f a s k  
[root@localhost ~]# sedsed -d --hide=hold ':a;s/^\(.\)\(.*\) \1/\1\2/;ta' 4.txt   
PATT:a b c a d a a a$  
COMM::a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:a b c a d a a$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:a b c a d a$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:a b c a d$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:a b c d$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:a b c d$  
COMM:t a  
PATT:a b c d$  
a b c d  
PATT:s d d d x s a$  
COMM::a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:s d d d x a$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:s d d d x a$  
COMM:t a  
PATT:s d d d x a$  
s d d d x a  
PATT:h j s a s h j h$  
COMM::a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:h j s a s h j$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:h j s a s j$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:h j s a s j$  
COMM:t a  
PATT:h j s a s j$  
h j s a s j  
PATT:j d f j a s j k j$  
COMM::a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:j d f j a s j k$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:j d f j a s k$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:j d f a s k$  
COMM:t a  
COMM:s/^\(.\)\(.*\) \1/\1\2/  
PATT:j d f a s k$  
COMM:t a  
PATT:j d f a s k$  
j d f a s k  
 
Two other solutions, one in bash and one in awk:  
while read a b;do echo "$a ${b// $a}";done <4.txt 
awk '{a=$1;gsub(" ?"a,"");print a""$0}' 4.txt
