欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Linux_Linux_sort 命令

程序员文章站 2022-04-30 23:10:01
...

 最近有被问到如何在Linux 中实现 2个可能重复文件的交集。下面,我们进行下梳理。

 

 

函数介绍 英文

首先,看下sort 的函数介绍 :

可以使用的方法 man sort / sort -h

[[email protected] linux_cmd_test]# sort --help
Usage: sort [OPTION]... [FILE]...
  or:  sort [OPTION]... --files0-from=F
Write sorted concatenation of all FILE(s) to standard output.

Mandatory arguments to long options are mandatory for short options too.
Ordering options:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           sort by random hash of keys
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text

Other options:

      --batch-size=NMERGE   merge at most NMERGE inputs at once;
                            for more use temp files
  -c, --check, --check=diagnose-first  check for sorted input; do not sort
  -C, --check=quiet, --check=silent  like -c, but do not report first bad line
      --compress-program=PROG  compress temporaries with PROG;
                              decompress them with PROG -d
      --debug               annotate the part of the line used to sort,
                              and warn about questionable usage to stderr
      --files0-from=F       read input from the files specified by
                            NUL-terminated names in file F;
                            If F is - then read names from standard input
  -k, --key=KEYDEF          sort via a key; KEYDEF gives location and type
  -m, --merge               merge already sorted files; do not sort
  -o, --output=FILE         write result to FILE instead of standard output
  -s, --stable              stabilize sort by disabling last-resort comparison
  -S, --buffer-size=SIZE    use SIZE for main memory buffer
  -t, --field-separator=SEP  use SEP instead of non-blank to blank transition
  -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or /tmp;
                              multiple options specify multiple directories
      --parallel=N          change the number of sorts run concurrently to N
  -u, --unique              with -c, check for strict ordering;
                              without -c, output only the first of an equal run
  -z, --zero-terminated     end lines with 0 byte, not newline
      --help     display this help and exit
      --version  output version information and exit

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a
field number and C a character position in the field; both are origin 1, and
the stop position defaults to the line's end.  If neither -t nor -b is in
effect, characters in a field are counted from the beginning of the preceding
whitespace.  OPTS is one or more single-letter ordering options [bdfgiMhnRrV],
which override global ordering options for that key.  If no key is given, use
the entire line as the key.

SIZE may be followed by the following multiplicative suffixes:
% 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

With no FILE, or when FILE is -, read standard input.

*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'sort invocation'

 

函数介绍 中文

再看下中文的具体介绍:

语法

sort [-bcdfimMnr][-o<输出文件>][-t<分隔字符>][+<起始栏位>-<结束栏位>][--help][--verison][文件]

参数说明:

-b 忽略每行前面开始出的空格字符。

-c 检查文件是否已经按照顺序排序。

-d 排序时,处理英文字母、数字及空格字符外,忽略其他的字符。

-f 排序时,将小写字母视为大写字母。

-i 排序时,除了040至176之间的ASCII字符外,忽略其他的字符。

-m 将几个排序好的文件进行合并。

-M 将前面3个字母依照月份的缩写进行排序。

-n 依照数值的大小排序。

-u 意味着是唯一的(unique),输出的结果是去完重了的。

-o<输出文件> 将排序后的结果存入指定的文件。

-r 以相反的顺序来排序。

-t<分隔字符> 指定排序时所用的栏位分隔字符。

+<起始栏位>-<结束栏位> 以指定的栏位来排序,范围由起始栏位到结束栏位的前一栏位。

--help 显示帮助。

--version 显示版本信息。

 

实例

构建 2个测试文件

test.txt

[[email protected] linux_cmd_test]# cat test.txt
test
as
test
ss
pz
sda
as

test2.txt

[[email protected] linux_cmd_test]# cat test2.txt
test
wq
qq
ss
ssdz
ww

 

案例一

在使用sort命令以默认的式对文件的行进行排序,使用的命令如下:(默认是字典序)

[[email protected] linux_cmd_test]# sort test.txt
as
as
pz
sda
ss
test
test

 

案例二

-r

字典序倒序

[[email protected] linux_cmd_test]# sort -r test.txt
test
test
ss
sda
pz
as
as

 

案例三

-r -u 

字典序排序并去重

[[email protected] linux_cmd_test]# sort -r -u test.txt
test
ss
sda
pz
as

 

案例四

求 test.txt test2.txt 的交集,并去重输出到另一个文件

[[email protected] linux_cmd_test]# cat test2.txt test.txt | sort -u -o intersect.txt
[[email protected] linux_cmd_test]# cat intersect.txt 
as
pz
qq
sda
ss
ssdz
test
wq
ww

 

相关标签: Linux