Shell Beginner Study Notes-16-Command Details: Sorting with sort and Deduplicating with uniq
程序员文章站
2024-02-24 10:41:40
...
Series table of contents and references: Shell Beginner Study Notes-Prologue
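Today one of our APIs ran into a problem, and while going through its logs to troubleshoot it I needed to deduplicate lines. So today I'm reviewing the two commands involved in deduplication: sort and uniq.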
sort
Ascending sort: sort
admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
Descending sort: sort -r
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort -r
thread-id:003
thread-id:002
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:001
- You can think of -r as short for "reverse", i.e. the order is flipped.
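To reproduce these results without the author's logg.txt, the following sketch writes the same seven lines to a temporary file. The use of mktemp and the variable name logg are my own assumptions, not part of the original session. It also shows that sort can take a file name directly, so the cat ... | step is optional.

```shell
# Recreate the sample data from logg.txt in a temporary file
# (the variable name "logg" is just an illustrative choice)
logg=$(mktemp)
trap 'rm -f "$logg"' EXIT
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$logg"

# sort reads the file itself; "cat file | sort" is not required
sort "$logg"       # ascending: 001 (x4), 002 (x2), 003
sort -r "$logg"    # descending: 003, 002 (x2), 001 (x4)
```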
Deduplicate after sorting: sort -u
admindeMacBook-Pro:myshell admin$ cat logg.txt | sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt | sort -u
thread-id:001
thread-id:002
thread-id:003
- You can think of -u as short for "unique", i.e. each line appears only once.
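When troubleshooting logs, sort -u is often followed by wc -l to count how many distinct values appear. A minimal sketch on the same sample data, again rebuilt in a hypothetical temporary file:

```shell
logg=$(mktemp)
trap 'rm -f "$logg"' EXIT
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$logg"

# One copy of each distinct line
sort -u "$logg"

# How many distinct thread ids appear (3 for this sample);
# some wc implementations pad the number with leading spaces
sort -u "$logg" | wc -l
```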
Numeric sort: sort -n
admindeMacBook-Pro:myshell admin$ seq 9 11
9
10
11
admindeMacBook-Pro:myshell admin$ seq 9 11 | sort
10
11
9
admindeMacBook-Pro:myshell admin$ seq 9 11 | sort -n
9
10
11
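- By default, sort treats the content as strings and compares it character by character, so 1xxxx sorts before 9xxxx (here, 10 comes before 9).
- sort -n treats the content as numbers instead.
- You can think of -n as short for "number".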
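Options can be combined. The sketch below only needs seq; it repeats the comparison above and adds -r, which gives a numeric sort in descending order:

```shell
# String comparison: prints 10, 11, 9 ('1' sorts before '9')
seq 9 11 | sort

# Numeric comparison: prints 9, 10, 11
seq 9 11 | sort -n

# Numeric and reversed: prints 11, 10, 9
seq 9 11 | sort -rn
```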
Sort by a specific column: sort -k
admindeMacBook-Pro:myshell admin$ cat log.txt
thread-id:001 beginTime 111
thread-id:002 beginTime 222
thread-id:001 endTime 333
thread-id:002 endTime 444
thread-id:003 beginTime 555
thread-id:003 endTime 666
admindeMacBook-Pro:myshell admin$ cat log.txt |sort
thread-id:001 beginTime 111
thread-id:001 endTime 333
thread-id:002 beginTime 222
thread-id:002 endTime 444
thread-id:003 beginTime 555
thread-id:003 endTime 666
admindeMacBook-Pro:myshell admin$ cat log.txt |sort -k 2
thread-id:001 beginTime 111
thread-id:002 beginTime 222
thread-id:003 beginTime 555
thread-id:001 endTime 333
thread-id:002 endTime 444
thread-id:003 endTime 666
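- sort -k 2 sorts by the second column (columns are separated by whitespace by default). Strictly speaking, the key runs from column 2 to the end of the line, so lines with the same column 2 value are then ordered by what follows it; use sort -k 2,2 to compare column 2 only.
- Plain sort is equivalent to sort -k 1.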
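A key can also stop at a given column and carry its own modifiers. The sketch below uses the same log.txt content in a hypothetical temporary file. It first sorts on column 3 only, as a number, in descending order; it then uses two keys: column 2 first, and column 3 (numeric, descending) to order the lines within each group.

```shell
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
thread-id:001 beginTime 111
thread-id:002 beginTime 222
thread-id:001 endTime 333
thread-id:002 endTime 444
thread-id:003 beginTime 555
thread-id:003 endTime 666
EOF

# Key = column 3 only (-k 3,3), numeric (n), descending (r): 666 first
sort -k 3,3nr "$log"

# Two keys: group by column 2, then column 3 numeric descending
# beginTime: 555, 222, 111; endTime: 666, 444, 333
sort -k 2,2 -k 3,3nr "$log"
```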
Specify the column separator: sort -t{symbol}
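sort -k lets you sort by a specific column, but what if the value you need sits inside a single column together with other data? In that case, try -t.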
admindeMacBook-Pro:myshell admin$ cat log.txt
thread-id:001 beginTime 111:05
thread-id:002 beginTime 222:01
thread-id:001 endTime 333:06
thread-id:002 endTime 444:04
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
admindeMacBook-Pro:myshell admin$ cat log.txt |sort -k 3
thread-id:001 beginTime 111:05
thread-id:002 beginTime 222:01
thread-id:001 endTime 333:06
thread-id:002 endTime 444:04
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
admindeMacBook-Pro:myshell admin$ cat log.txt |sort -t: -k3
thread-id:002 beginTime 222:01
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
thread-id:002 endTime 444:04
thread-id:001 beginTime 111:05
thread-id:001 endTime 333:06
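- In the log above, assume that values such as 111:05 are timestamps: the part before the colon is the date and the part after it is the hour.
- To sort by hour, sort -k 3 does not work, because the date and the hour are in the same column.
- sort -t: makes sort split columns on the : character instead of on whitespace.
- That is, thread-id:002 beginTime 222:01 is split into three columns: thread-id, 002 beginTime 222, and 01.
- Then -k3 sorts the lines by hour.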
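To see exactly which piece -t: turns into column 3, cut can split lines on the same delimiter (cut -d: -f3 prints the third :-separated field). The sketch below rebuilds the sample in a hypothetical temporary file, and also restricts the key to column 3 and adds n so the hour is compared as a number:

```shell
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
thread-id:001 beginTime 111:05
thread-id:002 beginTime 222:01
thread-id:001 endTime 333:06
thread-id:002 endTime 444:04
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
EOF

# Third ':'-separated field of each line: 05 01 06 04 02 03
cut -d: -f3 "$log"

# Sort on that field only, numerically: 222:01 first, 333:06 last
sort -t: -k3,3n "$log"
```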
uniq
uniq does less than sort and is simpler to use.
Remove consecutive duplicate lines: uniq
admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq
thread-id:001
thread-id:002
thread-id:001
thread-id:002
thread-id:003
- Important: uniq only removes duplicate lines that are consecutive, which is why it is usually run after sort.
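A tiny input makes the adjacency rule easy to see. In this sketch (the input is made up for illustration), the last a survives because it is not next to the first two:

```shell
# Prints a, b, a: only the adjacent "a a" collapses into one line
printf '%s\n' a a b a | uniq
```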
Remove all duplicate lines: sort | uniq
admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort -u
thread-id:001
thread-id:002
thread-id:003
- cat logg.txt | sort | uniq produces the same result as cat logg.txt | sort -u.
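This equivalence can be checked directly. The sketch below (hypothetical temporary file again) captures both outputs and compares them as strings; it prints "same output":

```shell
logg=$(mktemp)
trap 'rm -f "$logg"' EXIT
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$logg"

with_uniq=$(sort "$logg" | uniq)
with_sort_u=$(sort -u "$logg")

if [ "$with_uniq" = "$with_sort_u" ]; then
  echo "same output"
else
  echo "different output"
fi
```

One difference to keep in mind: sort -u decides what counts as a duplicate using the sort key, so combined with -k (for example sort -u -k 2,2) it keeps only one line per key value, whereas uniq compares whole lines.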
Count repetitions: uniq -c
admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -c
1 thread-id:001
1 thread-id:002
3 thread-id:001
1 thread-id:002
1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -c
4 thread-id:001
2 thread-id:002
1 thread-id:003
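- Sorting first changes the counts: without sort, uniq -c counts each run of consecutive identical lines separately (thread-id:001 shows up as 1 and then 3); after sort, each count is the total number of occurrences of that line.
- You can think of -c as short for "count".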
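A common log-analysis pattern builds on this: count every distinct line, then sort the counts numerically in descending order so the most frequent lines come first. A sketch on the same sample data (hypothetical temporary file):

```shell
logg=$(mktemp)
trap 'rm -f "$logg"' EXIT
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$logg"

# Count each distinct line, most frequent first:
#   4 thread-id:001, 2 thread-id:002, 1 thread-id:003
# (the width of the padding before each count varies by implementation)
sort "$logg" | uniq -c | sort -rn
```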
Keep only duplicated lines: uniq -d
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -c
1 thread-id:001
1 thread-id:002
3 thread-id:001
1 thread-id:002
1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -d
thread-id:001
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -c
4 thread-id:001
2 thread-id:002
1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -d
thread-id:001
thread-id:002
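- You can think of -d as short for "duplicate". uniq -d prints one copy of each repeated line, and again only consecutive repeats count, which is why the unsorted input yields only thread-id:001.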
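Combined with wc -l, uniq -d answers questions such as "how many thread ids occur more than once?". A sketch on the sample data (hypothetical temporary file):

```shell
logg=$(mktemp)
trap 'rm -f "$logg"' EXIT
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$logg"

# Distinct lines that occur more than once anywhere in the file: 001 and 002
sort "$logg" | uniq -d

# How many such lines there are (2 for this sample)
sort "$logg" | uniq -d | wc -l
```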
Keep only non-repeated lines: uniq -u
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -c
1 thread-id:001
1 thread-id:002
3 thread-id:001
1 thread-id:002
1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -u
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -c
4 thread-id:001
2 thread-id:002
1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -u
thread-id:003
- You can think of -u as short for "unique": only lines that are not repeated are printed.
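Because sort -u and uniq -u share the same letter, they are easy to mix up: sort -u keeps one copy of every line, while uniq -u drops every line that is repeated. This sketch runs both on the sample data (hypothetical temporary file):

```shell
logg=$(mktemp)
trap 'rm -f "$logg"' EXIT
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$logg"

# sort -u: one copy of every distinct line -> 001, 002, 003
sort -u "$logg"

# sort | uniq -u: only lines that occur exactly once -> 003
sort "$logg" | uniq -u
```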