Shell Beginner Study Notes 16: Command Details: Sorting with sort and Deduplicating with uniq

程序员文章站 2024-02-24 10:41:40

Series index and references: Shell Beginner Study Notes: Prologue

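An interface misbehaved today, and while digging through its logs I needed to deduplicate lines. So today is a refresher on the two commands involved in deduplication: sort and uniq.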

sort

Ascending sort: sort

admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003

Descending sort: sort -r

admindeMacBook-Pro:myshell admin$ cat logg.txt |sort -r
thread-id:003
thread-id:002
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:001
  • Think of -r as standing for "reverse".
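
The two transcripts above pipe cat into sort, but sort also accepts a file operand directly. The following is a minimal sketch under a few assumptions: the temporary file path is illustrative, the sample lines are copied from logg.txt above, and LC_ALL=C is set only to pin the comparison to byte order so the result does not depend on the local locale.

```shell
# Recreate the article's sample log in a temporary file (the path is illustrative).
tmp=$(mktemp "${TMPDIR:-/tmp}/logg.XXXXXX")
printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003 > "$tmp"

# sort reads file operands directly, so the cat pipe is optional.
ascending=$(LC_ALL=C sort "$tmp")
descending=$(LC_ALL=C sort -r "$tmp")

# The last line of the ascending output is the first line of the descending one.
last_asc=$(printf '%s\n' "$ascending" | tail -n 1)
first_desc=$(printf '%s\n' "$descending" | head -n 1)
echo "sort ends with:      $last_asc"
echo "sort -r starts with: $first_desc"

rm -f "$tmp"
```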

Sort, then deduplicate: sort -u

admindeMacBook-Pro:myshell admin$ cat logg.txt | sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt | sort -u
thread-id:001
thread-id:002
thread-id:003
  • Think of -u as standing for "unique".

Numeric sort: sort -n

admindeMacBook-Pro:myshell admin$ seq 9 11
9
10
11
admindeMacBook-Pro:myshell admin$ seq 9 11 | sort
10
11
9
admindeMacBook-Pro:myshell admin$ seq 9 11 | sort -n
9
10
11
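  • By default, sort compares its input as strings, so anything starting with 1 (such as 10 and 11) sorts before 9.
  • sort -n compares the input as numbers instead.
  • Think of -n as standing for "number", that is, numeric.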
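
The difference between string and numeric comparison can be checked side by side. This sketch uses printf instead of seq so that it does not depend on seq being installed; the variable names are illustrative, and paste is used only to join the output onto one line for display.

```shell
# String comparison: "10" and "11" sort before "9" because '1' < '9'.
lexical=$(printf '%s\n' 9 10 11 | LC_ALL=C sort | paste -s -d ' ' -)

# Numeric comparison: values are ordered by magnitude.
numeric=$(printf '%s\n' 9 10 11 | LC_ALL=C sort -n | paste -s -d ' ' -)

# -n combines with -r for a numeric descending sort.
numeric_desc=$(printf '%s\n' 9 10 11 | LC_ALL=C sort -n -r | paste -s -d ' ' -)

echo "sort       : $lexical"
echo "sort -n    : $numeric"
echo "sort -n -r : $numeric_desc"
```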

Sort by a given column: sort -k

admindeMacBook-Pro:myshell admin$ cat log.txt
thread-id:001 beginTime 111
thread-id:002 beginTime 222
thread-id:001 endTime 333
thread-id:002 endTime 444
thread-id:003 beginTime 555
thread-id:003 endTime 666
admindeMacBook-Pro:myshell admin$ cat log.txt |sort
thread-id:001 beginTime 111
thread-id:001 endTime 333
thread-id:002 beginTime 222
thread-id:002 endTime 444
thread-id:003 beginTime 555
thread-id:003 endTime 666
admindeMacBook-Pro:myshell admin$ cat log.txt |sort -k 2
thread-id:001 beginTime 111
thread-id:002 beginTime 222
thread-id:003 beginTime 555
thread-id:001 endTime 333
thread-id:002 endTime 444
thread-id:003 endTime 666
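  • sort -k 2 sorts by the second column. Strictly speaking, the sort key starts at column 2 and runs to the end of the line; sort -k 2,2 restricts the key to column 2 alone.
  • Plain sort behaves like sort -k 1: the key is the whole line.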
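
Several -k options can be chained, each with its own start and end column and its own modifiers. The sketch below is an illustration rather than something from the original transcript: it reuses the six log.txt lines shown above, groups them by the second column, and orders each group by the third column numerically, largest first.

```shell
# The six sample lines from log.txt above, held in a variable for convenience.
log=$(printf '%s\n' \
  'thread-id:001 beginTime 111' \
  'thread-id:002 beginTime 222' \
  'thread-id:001 endTime 333' \
  'thread-id:002 endTime 444' \
  'thread-id:003 beginTime 555' \
  'thread-id:003 endTime 666')

# -k2,2   : compare column 2 only (beginTime sorts before endTime)
# -k3,3nr : break ties on column 3, numerically, in descending order
sorted=$(printf '%s\n' "$log" | LC_ALL=C sort -k2,2 -k3,3nr)
printf '%s\n' "$sorted"

first=$(printf '%s\n' "$sorted" | head -n 1)
last=$(printf '%s\n' "$sorted" | tail -n 1)
```

Limiting each key with the start,end form keeps a later column from silently taking part in an earlier comparison, which is what happens with the open-ended -k 2.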

Specify the column separator: sort -t{symbol}

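sort -k sorts by a chosen column. But what if the value you want to sort on is packed into a single column together with other data? In that case, try -t.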

admindeMacBook-Pro:myshell admin$ cat log.txt
thread-id:001 beginTime 111:05
thread-id:002 beginTime 222:01
thread-id:001 endTime 333:06
thread-id:002 endTime 444:04
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
admindeMacBook-Pro:myshell admin$ cat log.txt |sort -k 3
thread-id:001 beginTime 111:05
thread-id:002 beginTime 222:01
thread-id:001 endTime 333:06
thread-id:002 endTime 444:04
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
admindeMacBook-Pro:myshell admin$ cat log.txt |sort -t: -k3
thread-id:002 beginTime 222:01
thread-id:003 beginTime 555:02
thread-id:003 endTime 666:03
thread-id:002 endTime 444:04
thread-id:001 beginTime 111:05
thread-id:001 endTime 333:06
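  • Assume that values such as 111:05 in the log above are timestamps, with the date before the colon and the hour after it.
  • To sort by hour, sort -k 3 is not enough, because the date and the hour sit in the same column.
  • sort -t: tells sort to split columns on the : character instead of on whitespace.
  • In other words, thread-id:002 beginTime 222:01 is split into three columns: thread-id, 002 beginTime 222, and 01.
  • -k3 then sorts by the hour.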
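
The three-column split described above can be made visible with cut, which takes a single-character delimiter in a similar way. This is a sketch under stated assumptions: cut is used here only to display the columns and is not part of the original notes, and the -k3,3n form (key limited to column 3, compared numerically) is a slightly stricter variant of the -k3 used in the transcript.

```shell
# One sample line, split on ':' the same way sort -t: sees it.
line='thread-id:002 beginTime 222:01'
col1=$(printf '%s\n' "$line" | cut -d: -f1)
col2=$(printf '%s\n' "$line" | cut -d: -f2)
col3=$(printf '%s\n' "$line" | cut -d: -f3)
echo "1=[$col1] 2=[$col2] 3=[$col3]"

# The sample log with the "date:hour" column from the transcript above.
log=$(printf '%s\n' \
  'thread-id:001 beginTime 111:05' \
  'thread-id:002 beginTime 222:01' \
  'thread-id:001 endTime 333:06' \
  'thread-id:002 endTime 444:04' \
  'thread-id:003 beginTime 555:02' \
  'thread-id:003 endTime 666:03')

# Sort on the third ':'-separated column only, as a number.
by_hour=$(printf '%s\n' "$log" | LC_ALL=C sort -t: -k3,3n)
first_hour=$(printf '%s\n' "$by_hour" | head -n 1)
last_hour=$(printf '%s\n' "$by_hour" | tail -n 1)
printf '%s\n' "$by_hour"
```

Note that the colon inside thread-id:001 also counts as a separator, which is why the hour ends up in column 3 rather than column 2.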

uniq

Compared with sort, uniq does less and is simpler to use.

Deduplicate consecutive duplicate lines: uniq

admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq
thread-id:001
thread-id:002
thread-id:001
thread-id:002
thread-id:003
  • Important: uniq only removes duplicates that are adjacent to each other.
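
The adjacency rule matters most when the original line order has to be kept, because sorting first destroys that order. The sketch below contrasts plain uniq with a commonly used awk one-liner that drops every repeated line while preserving first-seen order. The awk approach is an alternative technique that does not appear in the original notes; it is shown here only for comparison.

```shell
# The unsorted sample from logg.txt above.
log=$(printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003)

# Plain uniq collapses only the run of three adjacent 001 lines: 7 lines become 5.
adjacent_only=$(printf '%s\n' "$log" | uniq | wc -l | tr -d ' ')

# awk keeps a line only the first time it is seen, without sorting.
first_seen=$(printf '%s\n' "$log" | awk '!seen[$0]++' | paste -s -d ' ' -)

echo "lines after uniq: $adjacent_only"
echo "first-seen order: $first_seen"
```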

Deduplicate all duplicate lines: sort | uniq

admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort -u
thread-id:001
thread-id:002
thread-id:003
  • cat logg.txt |sort |uniq and cat logg.txt |sort -u produce the same result.
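
The equivalence above holds when whole lines are compared. Once a key is given with -k, sort -u treats lines with equal keys as duplicates, while uniq still compares entire lines. The following sketch, which reuses the six-line log.txt sample from the sort -k section, counts the surviving lines in each case; which line of an equal-key group is kept is not relied on here.

```shell
# Six lines, two per thread id, all different as whole lines.
log=$(printf '%s\n' \
  'thread-id:001 beginTime 111' \
  'thread-id:002 beginTime 222' \
  'thread-id:001 endTime 333' \
  'thread-id:002 endTime 444' \
  'thread-id:003 beginTime 555' \
  'thread-id:003 endTime 666')

# Whole-line comparison: nothing is a duplicate, so all 6 lines survive.
whole_line=$(printf '%s\n' "$log" | LC_ALL=C sort | uniq | wc -l | tr -d ' ')

# Key comparison: uniqueness is decided by column 1 only, so 3 lines survive.
per_key=$(printf '%s\n' "$log" | LC_ALL=C sort -u -k1,1 | wc -l | tr -d ' ')

echo "sort | uniq     : $whole_line lines"
echo "sort -u -k1,1   : $per_key lines"
```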

Count occurrences: uniq -c

admindeMacBook-Pro:myshell admin$ cat logg.txt
thread-id:001
thread-id:002
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -c
   1 thread-id:001
   1 thread-id:002
   3 thread-id:001
   1 thread-id:002
   1 thread-id:003
   
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort
thread-id:001
thread-id:001
thread-id:001
thread-id:001
thread-id:002
thread-id:002
thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -c
   4 thread-id:001
   2 thread-id:002
   1 thread-id:003
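  • Whether the input is sorted changes the counts: uniq -c counts each run of adjacent identical lines, so unsorted input yields per-run counts rather than totals.
  • Think of -c as standing for "count".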
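
A common follow-up when reading logs is to rank lines by how often they occur. The sketch below chains the commands covered so far: sort to group identical lines, uniq -c to count them, and sort -rn to put the largest count first. The trailing awk step is an addition that only normalizes the leading padding of uniq -c, whose width differs between implementations.

```shell
# The unsorted sample from logg.txt above.
log=$(printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003)

# group -> count -> rank by count (numeric, descending) -> trim padding
ranking=$(printf '%s\n' "$log" \
  | LC_ALL=C sort \
  | uniq -c \
  | LC_ALL=C sort -rn \
  | awk '{print $1, $2}')

printf '%s\n' "$ranking"
top=$(printf '%s\n' "$ranking" | head -n 1)
bottom=$(printf '%s\n' "$ranking" | tail -n 1)
```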

Keep only the duplicated lines: uniq -d

admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -c
   1 thread-id:001
   1 thread-id:002
   3 thread-id:001
   1 thread-id:002
   1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -d
thread-id:001

admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -c
   4 thread-id:001
   2 thread-id:002
   1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -d
thread-id:001
thread-id:002
  • Think of -d as standing for "duplicate".
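
One way to put uniq -d to work on a log like the earlier log.txt is to list the thread ids that appear more than once, for example threads that logged both a beginTime and an endTime. This is a sketch on hypothetical data: the thread-id:004 line is invented for illustration, and awk is used only to extract the first column.

```shell
# Sample log; thread-id:004 is a hypothetical thread that never logged endTime.
log=$(printf '%s\n' \
  'thread-id:001 beginTime 111' \
  'thread-id:002 beginTime 222' \
  'thread-id:001 endTime 333' \
  'thread-id:002 endTime 444' \
  'thread-id:004 beginTime 777')

# Extract the id column, sort so equal ids are adjacent, keep ids seen more than once.
repeated=$(printf '%s\n' "$log" \
  | awk '{print $1}' \
  | LC_ALL=C sort \
  | uniq -d \
  | paste -s -d ' ' -)

echo "ids with more than one entry: $repeated"
```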

Keep only the lines that are not repeated: uniq -u

admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -c
   1 thread-id:001
   1 thread-id:002
   3 thread-id:001
   1 thread-id:002
   1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |uniq -u
thread-id:001
thread-id:002
thread-id:002
thread-id:003

admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -c
   4 thread-id:001
   2 thread-id:002
   1 thread-id:003
admindeMacBook-Pro:myshell admin$ cat logg.txt |sort |uniq -u
thread-id:003
  • Think of -u as standing for "unique".
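
Note that uniq -u is stricter than sort -u: it drops every line that has a duplicate, instead of keeping one copy. The sketch below shows the difference on the logg.txt sample and then uses uniq -u to find ids that occur exactly once. The second data set, including thread-id:004, is hypothetical and only serves as an illustration.

```shell
# The unsorted sample from logg.txt above.
ids=$(printf '%s\n' thread-id:001 thread-id:002 thread-id:001 thread-id:001 \
  thread-id:001 thread-id:002 thread-id:003)

# sort -u keeps one copy of every distinct line.
distinct=$(printf '%s\n' "$ids" | LC_ALL=C sort -u | paste -s -d ' ' -)

# sort | uniq -u keeps only lines that occur exactly once.
singletons=$(printf '%s\n' "$ids" | LC_ALL=C sort | uniq -u | paste -s -d ' ' -)

# Hypothetical log: thread-id:004 started but never logged an endTime.
log=$(printf '%s\n' \
  'thread-id:001 beginTime 111' \
  'thread-id:001 endTime 333' \
  'thread-id:004 beginTime 777')
unfinished=$(printf '%s\n' "$log" | awk '{print $1}' | LC_ALL=C sort | uniq -u)

echo "sort -u        : $distinct"
echo "sort | uniq -u : $singletons"
echo "unfinished     : $unfinished"
```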