[Repost] grep: searching for content in text

程序员文章站 2022-06-24 19:43:47

...

Reposted from: http://www.lampweb.org/linux/3/27.html

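Purpose: the grep family is the most frequently used set of text-search commands on Linux. It searches one or more files for lines that match a given pattern: each line that contains a match is printed in full, and if nothing matches, nothing is printed. grep never modifies the files it reads.
The Linux grep family includes grep, egrep, fgrep, and rgrep. grep chooses its matching mode with the -G, -E, and -F options, so -E and -F give it the behavior of egrep and fgrep respectively.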

Syntax: grep [OPTION]... PATTERN [FILE]...

Searches for PATTERN in each FILE or in standard input. By default, PATTERN is a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c

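Context control
-B	--before-context=NUM	Print NUM lines of leading context before each matching line
-A	--after-context=NUM	Print NUM lines of trailing context after each matching line
-C	--context=NUM	Print NUM lines of context before and after each matching line
-NUM		Same as --context=NUM
	--color[=WHEN] --colour[=WHEN]	Highlight the matched strings; WHEN is 'always', 'never', or 'auto'
-U	--binary	Treat files as binary. Only supported on MS-DOS and MS-Windows
-u	--unix-byte-offsets	Report Unix-style byte offsets. Only takes effect together with -b; only supported on MS-DOS and MS-Windows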
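Options related to the PATTERN regular expression
-E	--extended-regexp	PATTERN is an extended regular expression (ERE)
-F	--fixed-strings	PATTERN is a set of newline-separated fixed strings
-G	--basic-regexp	PATTERN is a basic regular expression (BRE)
-P	--perl-regexp	PATTERN is a Perl regular expression
-e	--regexp=PATTERN	Use PATTERN for matching
-f	--file=FILE	Read the patterns from FILE, one per line
-i	--ignore-case	Ignore case distinctions
-w	--word-regexp	Match PATTERN only as a whole word
-x	--line-regexp	Match PATTERN only against the whole line
-z	--null-data	A data line ends in a zero byte instead of a newline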
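Output control options
-m	--max-count=NUM	Stop reading a file after NUM matching lines
-b	--byte-offset	Print the 0-based byte offset of each output line before the line
-n	--line-number	Print the line number before each output line
	--line-buffered	Flush output after every line
-H	--with-filename	Print the file name before each matching line
-h	--no-filename	Do not print file names before matching lines
	--label=LABEL	Use LABEL as the file name for standard input (mainly useful in pipelines), e.g. cat test | grep --label=test -H 123
-o	--only-matching	Print only the matched part of each matching line
-q	--quiet --silent	Suppress all normal output
	--binary-files=TYPE	Assume binary files are of TYPE; TYPE is binary, text, or without-match
-a	--text	Equivalent to --binary-files=text
-I		Equivalent to --binary-files=without-match
-d	--directories=ACTION	How to handle directories: ACTION is read, recurse, or skip
-D	--devices=ACTION	How to handle devices, FIFOs, and sockets: ACTION is read or skip
-r -R	--recursive	Equivalent to --directories=recurse; search directories recursively (in newer GNU grep, -R is --dereference-recursive and also follows symbolic links)
	--include=FILE_PATTERN	Search only files whose names match FILE_PATTERN
	--exclude=FILE_PATTERN	Skip files whose names match FILE_PATTERN
	--exclude-from=FILE	Skip files whose names match any pattern read from FILE
	--exclude-dir=PATTERN	Skip directories whose names match PATTERN
-L	--files-without-match	Print only the names of files that contain no match
-l	--files-with-matches	Print only the names of files that contain a match
-c	--count	Print only the number of matching lines in each file
-T	--initial-tab	Make tabs line up, so the line content starts on a tab stop after prefixes such as the file name or line number
-Z	--null	Print a zero byte after each file name instead of the usual character (':' or a newline). Unlike -z, which changes how input and output lines are terminated, -Z only affects the byte that follows a file name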
Miscellaneous
-s	--no-messages	Suppress error messages about nonexistent or unreadable files
-v	--invert-match	Print the lines that do not match
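A minimal sketch of the context options, assuming GNU grep; ctx.txt is a hypothetical five-line file created in a scratch directory so nothing existing is touched:

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: one word per line.
printf '%s\n' one two three four five > ctx.txt

grep -A 1 three ctx.txt      # three, four  (1 line after the match)
grep -B 1 three ctx.txt      # two, three   (1 line before the match)
grep -C 1 three ctx.txt      # two, three, four
grep -n -C 1 three ctx.txt   # 2-two, 3:three, 4-four (':' marks the match, '-' marks context)
```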

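'egrep' means 'grep -E' and 'fgrep' means 'grep -F'. Invoking 'egrep' or 'fgrep' directly is deprecated.
When FILE is omitted or is -, standard input is read. If fewer than two FILEs are given, -h is assumed.
The exit status is 0 if any line is selected and 1 otherwise. If an error occurs, the exit status is 2 (unless -q is given and a line was selected, in which case it is 0).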
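The exit status is what makes grep -q useful in shell conditions. A sketch, assuming GNU grep; sample.txt and missing.txt are hypothetical names (the second is deliberately never created), and the helper show_status is only an illustration:

```shell
cd "$(mktemp -d)" || exit 1
printf 'hello world\n' > sample.txt

# Print a command's exit status; "|| rc=$?" keeps a failing grep from
# aborting scripts that run under "set -e".
show_status() {
  rc=0
  "$@" || rc=$?
  echo "$rc"
}

show_status grep -q hello sample.txt                # 0: a line was selected
show_status grep -q absent sample.txt               # 1: no line was selected
show_status grep -q hello missing.txt 2>/dev/null   # 2: missing.txt cannot be read

# With FILE given as -, grep reads standard input.
printf 'hello\n' | grep -c hello -                  # 1
```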

grep examples

Find a specific process
[root@localhost ~]# ps -ef|grep svn
root 4943   1      0  Dec05 ?   00:00:00 svnserve -d -r /opt/svndata/grape/
root 16867 16838  0 19:53 pts/0    00:00:00 grep svn
[root@localhost ~]#
The first line is the process we were looking for; the second is the grep process itself, not the one we want.
 
 
Count matching processes
[root@localhost ~]# ps -ef|grep svn -c
2
[root@localhost ~]# ps -ef|grep -c svn
2
[root@localhost ~]#
Because -c counts matching lines, the result 2 includes the grep svn process itself.
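Since -c counts matching lines rather than matches, a line counts once even if the keyword appears on it several times. A sketch contrasting it with -o piped to wc -l, assuming GNU grep and a hypothetical file os.txt:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'ubuntu linux linux' redhat linuxmint > os.txt

grep -c linux os.txt             # 2: two lines contain "linux"
grep -o linux os.txt | wc -l     # 3: three occurrences in total
```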
 
 
 
Read search patterns from a file
[root@localhost test]# cat test.txt
hnlinux
peida.cnblogs.com
ubuntu
ubuntu linux
redhat
Redhat
linuxmint
[root@localhost test]# cat test2.txt
linux
Redhat
[root@localhost test]# cat test.txt | grep -f test2.txt
hnlinux
ubuntu linux
Redhat
linuxmint
[root@localhost test]#
Prints the lines of test.txt that contain any of the patterns read from test2.txt.
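A sketch that rebuilds test.txt and test2.txt from the cat output above, so the search can be repeated on any machine (GNU grep assumed). It passes test.txt as an argument instead of piping it through cat, which gives the same result:

```shell
cd "$(mktemp -d)" || exit 1
# Recreate the two sample files shown by cat.
printf '%s\n' hnlinux peida.cnblogs.com ubuntu 'ubuntu linux' redhat Redhat linuxmint > test.txt
printf '%s\n' linux Redhat > test2.txt

grep -f test2.txt test.txt       # hnlinux, ubuntu linux, Redhat, linuxmint
# -F treats each pattern line as a fixed string; same result here,
# because the patterns contain no regex metacharacters.
grep -F -f test2.txt test.txt
```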
 
 
Read search patterns from a file and show line numbers
[root@localhost test]# cat test.txt
hnlinux
peida.cnblogs.com
ubuntu
ubuntu linux
redhat
Redhat
linuxmint
[root@localhost test]# cat test2.txt
linux
Redhat
[root@localhost test]# cat test.txt | grep -nf test2.txt
1:hnlinux
4:ubuntu linux
6:Redhat
7:linuxmint
[root@localhost test]#
Prints the lines of test.txt that contain any of the patterns read from test2.txt, each prefixed with its line number.
 
 
Search a file for a keyword
[root@localhost test]# grep 'linux' test.txt
hnlinux
ubuntu linux
linuxmint
[root@localhost test]# grep -n 'linux' test.txt
1:hnlinux
4:ubuntu linux
7:linuxmint
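The same test.txt can illustrate -w, -x, and -i from the options table. A sketch under the same assumptions (GNU grep, file recreated in a scratch directory):

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' hnlinux peida.cnblogs.com ubuntu 'ubuntu linux' redhat Redhat linuxmint > test.txt

grep -w linux test.txt    # ubuntu linux  (in hnlinux and linuxmint, "linux" is not a whole word)
grep -x ubuntu test.txt   # ubuntu        ("ubuntu linux" is not a whole-line match)
grep -i redhat test.txt   # redhat, Redhat
```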
 
 
 
Search several files for a keyword
[root@localhost test]# grep -n 'linux' test.txt test2.txt
test.txt:1:hnlinux
test.txt:4:ubuntu linux
test.txt:7:linuxmint
test2.txt:1:linux
[root@localhost test]# grep 'linux' test.txt test2.txt
test.txt:hnlinux
test.txt:ubuntu linux
test.txt:linuxmint
test2.txt:linux
When several files are searched, each output line is prefixed with the file name followed by ':'.
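A few options change how multi-file results are labelled. A sketch, assuming GNU grep and the two sample files recreated from the earlier cat output:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' hnlinux peida.cnblogs.com ubuntu 'ubuntu linux' redhat Redhat linuxmint > test.txt
printf '%s\n' linux Redhat > test2.txt

grep -h linux test.txt test2.txt    # matching lines without the "file:" prefix
grep -l ubuntu test.txt test2.txt   # test.txt: only the names of files with a match
grep -c linux test.txt test2.txt    # test.txt:3 and test2.txt:1
```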
 
 
 
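Keep grep from listing its own process
[root@localhost test]# ps aux|grep ssh
root   2720  0.0  0.0  62656  1212 ?      Ss   Nov02   0:00 /usr/sbin/sshd
root  16834  0.0  0.0  88088  3288 ?      Ss   19:53   0:00 sshd: root@pts/0
root  16901  0.0  0.0  61180   764 pts/0  S+   20:31   0:00 grep ssh
[root@localhost test]# ps aux|grep \[s]sh]
[root@localhost test]# ps aux|grep \[s]sh
root   2720  0.0  0.0  62656  1212 ?      Ss   Nov02   0:00 /usr/sbin/sshd
root  16834  0.0  0.0  88088  3288 ?      Ss   19:53   0:00 sshd: root@pts/0
[root@localhost test]# ps aux | grep ssh | grep -v "grep"
root   2720  0.0  0.0  62656  1212 ?      Ss   Nov02   0:00 /usr/sbin/sshd
root  16834  0.0  0.0  88088  3288 ?      Ss   19:53   0:00 sshd: root@pts/0
The pattern [s]sh still matches the text ssh, but grep's own command line now contains the literal text [s]sh, which the pattern does not match, so grep no longer lists itself. (The first attempt, with a stray ], searches for ssh] and therefore prints nothing.) The last command takes a different route: the second grep, grep -v "grep", drops every line that contains grep.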
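The bracket trick can be checked without a running sshd. In this sketch, plain.txt and bracket.txt hold hypothetical, trimmed-down ps lines (user, PID, command) as they would appear for each form of the command; GNU grep is assumed:

```shell
cd "$(mktemp -d)" || exit 1

# Simulated "ps aux" output while "grep ssh" is running.
printf '%s\n' 'root 2720 /usr/sbin/sshd' 'root 16901 grep ssh' > plain.txt
# Simulated "ps aux" output while "grep [s]sh" is running.
printf '%s\n' 'root 2720 /usr/sbin/sshd' 'root 16901 grep [s]sh' > bracket.txt

grep ssh plain.txt         # both lines: grep's own command line contains "ssh"
grep '[s]sh' bracket.txt   # only the sshd line: the text "[s]sh" does not contain "ssh"
```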
 
 
 
Show the lines that start with u
[root@localhost test]# cat test.txt |grep ^u
ubuntu
ubuntu linux
[root@localhost test]#
 
 
 
Show the lines that do not start with u
[root@localhost test]# cat test.txt |grep ^[^u]
hnlinux
peida.cnblogs.com
redhat
Redhat
linuxmint
[root@localhost test]#
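The two commands above leave ^u and ^[^u] unquoted; quoting them keeps the shell from treating [^u] as a filename pattern. Also, ^[^u] is not quite the same as grep -v '^u': an empty line has no first character, so only -v selects it. A sketch with a hypothetical three-line file, assuming GNU grep:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' ubuntu '' redhat > lines.txt   # the middle line is empty

grep -c '^[^u]' lines.txt   # 1: only "redhat"
grep -vc '^u' lines.txt     # 2: "redhat" and the empty line
```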
 
 
 
Show the lines that end with hat
[root@localhost test]# cat test.txt |grep hat$
redhat
Redhat
[root@localhost test]#
 
 
Show the lines that contain an IPv4-style address, written as a BRE and as an ERE
[root@localhost test]# ifconfig eth0|grep "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"
          inet addr:192.168.120.204  Bcast:192.168.120.255  Mask:255.255.255.0
[root@localhost test]# ifconfig eth0|grep -E "([0-9]{1,3}\.){3}[0-9]"
          inet addr:192.168.120.204  Bcast:192.168.120.255  Mask:255.255.255.0
[root@localhost test]#
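Both commands print the whole ifconfig line. A sketch that uses -o to extract just the addresses, reusing that line as literal input so neither ifconfig nor an eth0 interface is required (GNU grep assumed); the final [0-9]{1,3} replaces the single [0-9] so the whole last octet is captured:

```shell
line='inet addr:192.168.120.204  Bcast:192.168.120.255  Mask:255.255.255.0'

# -o prints each match on its own line instead of the whole input line.
printf '%s\n' "$line" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
# 192.168.120.204
# 192.168.120.255
# 255.255.255.0

# The pattern checks only the shape, not the 0-255 range of each octet:
printf '%s\n' '999.999.999.999' | grep -cE '([0-9]{1,3}\.){3}[0-9]{1,3}'   # 1
```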
 
 
 
Show the lines that contain ed or at (the first command does the same for peida or com)
[root@localhost test]# cat test.txt |grep -E "peida|com"
peida.cnblogs.com
[root@localhost test]# cat test.txt |grep -E "ed|at"
redhat
Redhat
[root@localhost test]#
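Without -E, | is an ordinary character in a basic regular expression. A sketch of two alternatives that give the same result here, assuming GNU grep (where \| in a BRE is an extension) and a hypothetical file t.txt:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' hnlinux peida.cnblogs.com redhat Redhat linuxmint > t.txt

grep -E 'ed|at' t.txt      # redhat, Redhat
grep 'ed\|at' t.txt        # same: \| in a basic regex is a GNU extension
grep -e ed -e at t.txt     # same: each -e adds another pattern (portable)
```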
 
 
 
In the .txt files of the current directory, show the lines that contain a run of at least 7 consecutive lowercase letters
[root@localhost test]# grep '[a-z]\{7\}' *.txt
test.txt:hnlinux
test.txt:peida.cnblogs.com
test.txt:linuxmint
[root@localhost test]#
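\{7\} means exactly seven letters per match; the line is selected because it contains at least one such run. Adding -o makes the difference from \{7,\} visible. A sketch assuming GNU grep and test.txt recreated from the earlier cat output:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' hnlinux peida.cnblogs.com ubuntu 'ubuntu linux' redhat Redhat linuxmint > test.txt

grep -o '[a-z]\{7\}' test.txt    # hnlinux, cnblogs, linuxmi   (exactly 7 letters per match)
grep -o '[a-z]\{7,\}' test.txt   # hnlinux, cnblogs, linuxmint (the whole run)
```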

Common grep usage

Extract the lines of /etc/passwd that contain root
[root@localhost ~]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@localhost ~]# cat /etc/passwd | grep root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
 
 
Extract the lines of /etc/passwd that contain root, and show their line numbers in /etc/passwd
[root@localhost ~]# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
30:operator:x:11:0:operator:/root:/sbin/nologin
 
 
Extract the lines of /etc/passwd that do not contain root
[root@localhost ~]# grep -v root /etc/passwd
(prints every line of /etc/passwd except the root and operator lines, since both of those contain root)
 
 
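Extract the lines of /etc/passwd that contain neither root nor nologin
[root@localhost ~]# grep -v root /etc/passwd | grep -v nologin
(prints only the lines that contain neither string: the first grep drops the lines containing root, the second drops the lines containing nologin)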
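Because /etc/passwd differs from system to system, this sketch uses a small hypothetical passwd-style file instead. It also shows that several -e patterns combined with -v exclude lines matching any of them, so one grep can replace the pipeline (GNU grep assumed):

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical passwd-style sample; not a real /etc/passwd.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' > passwd.sample

grep -v root passwd.sample | grep -v nologin   # alice:x:1000:1000::/home/alice:/bin/bash
grep -v -e root -e nologin passwd.sample       # same line, with a single grep process
```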
 
 
Use dmesg to list kernel messages, then grep for the lines that contain eth, highlighting the keyword and showing line numbers
[root@localhost ~]# dmesg | grep -n --color=auto 'eth'
247:eth0: RealTek RTL8139 at 0xee846000, 00:90:cc:a6:34:84, IRQ 10
248:eth0: Identified 8139 chip type 'RTL-8139C'
294:eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
305:eth0: no IPv6 routers present

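grep and regular expressions

Character classes

Searching with a character class: the words test and taste share the form t?st, so a bracket expression that allows either a or e in the second position finds both:
[root@localhost ~]# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.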
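Negated character class [^]: to find lines that contain oo not preceded by g:
[root@localhost ~]# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
Lines 18 and 19 still appear because of other matches on those lines: "too" in tools on line 18, and "ooo" inside goooooogle on line 19.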
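A sketch that isolates the effect with hypothetical sample lines (GNU grep assumed): a line whose only oo follows a g is skipped, and -o shows the part that actually matched:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'apple is my favorite food.' 'google' \
  'google is the best tools for search keyword.' 'goooooogle yes!' > oo.txt

grep -n '[^g]oo' oo.txt
# 1:apple is my favorite food.
# 3:google is the best tools for search keyword.
# 4:goooooogle yes!
# Line 2 ("google" on its own) is skipped: its only "oo" follows a g.

printf '%s\n' 'google is the best tools for search keyword.' | grep -o '[^g]oo'   # too
```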

To get the lines that contain a digit:
[root@localhost ~]# grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
 
 
 
Line anchors ^ and $

Start of line: to list only the lines where the appears at the very beginning, use the ^ anchor:
[root@localhost ~]# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

Only line 12 remains, because it is the only line that begins with the. Likewise, to list the lines that begin with a lowercase letter:
[root@localhost ~]# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
 
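To list the lines that do not begin with an English letter (^ outside the brackets anchors the start of the line, while ^ as the first character inside the brackets negates the class):
[root@localhost ~]# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird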

Find empty lines
[root@localhost ~]# grep -n '^$' regular_express.txt
22:
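'^$' only matches lines that are truly empty. A sketch, assuming GNU grep and a hypothetical file blank.txt, that also catches whitespace-only lines with the POSIX class [[:space:]] and uses -v to drop them:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' first '' '   ' second > blank.txt   # line 3 holds three spaces

grep -c '^$' blank.txt               # 1: only the truly empty line
grep -c '^[[:space:]]*$' blank.txt   # 2: also the whitespace-only line
grep -v '^[[:space:]]*$' blank.txt   # first, second
```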
 
 
 
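Any single character . and repetition *

To find strings that begin and end with g and have at least one o between the two g's (gog, goog, gooog, and so on), use goo*g: o* means zero or more o's, so only the first o is required:
[root@localhost ~]# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!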
 
To find lines that contain a g followed, anywhere later on the line, by another g (with any characters, or none, between them):
[root@localhost ~]# grep -n 'g.*g' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
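.* is greedy: it extends the match to the last g on the line. A sketch with -o on one of the sample lines (GNU grep assumed), plus a variant that stops at the next g by excluding g from the middle:

```shell
line='The gd software is a library for drafting programs.'

printf '%s\n' "$line" | grep -o 'g.*g'     # gd software is a library for drafting prog
printf '%s\n' "$line" | grep -o 'g[^g]*g'  # gd software is a library for drafting
```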
 
 
 
Limiting repetition with {}

To find a g followed by 2 to 5 o's and then another g:
[root@localhost ~]# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.

To match 2 or more o's (goooo...g), you can write gooo*g, or:
[root@localhost ~]# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
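In a BRE the interval braces are escaped as \{ \}; with -E they are written bare. A sketch assuming GNU grep and a hypothetical file g.txt with 1, 2, 3, and 6 o's between the g's:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' gog google gooogle goooooogle > g.txt

grep 'go\{2,5\}g' g.txt    # google, gooogle
grep 'go\{2,\}g' g.txt     # google, gooogle, goooooogle
grep -E 'go{2,5}g' g.txt   # ERE spelling of the first command: no backslashes
```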
