Linux_Linux_sort 命令
最近有被问到如何在Linux 中实现 2个可能重复文件的交集。下面,我们进行下梳理。
函数介绍 英文
首先,看下sort 的函数介绍 :
可以使用的方法 man sort / sort -h
[[email protected] linux_cmd_test]# sort --help
Usage: sort [OPTION]... [FILE]...
or: sort [OPTION]... --files0-from=F
Write sorted concatenation of all FILE(s) to standard output.
Mandatory arguments to long options are mandatory for short options too.
Ordering options:
-b, --ignore-leading-blanks ignore leading blanks
-d, --dictionary-order consider only blanks and alphanumeric characters
-f, --ignore-case fold lower case to upper case characters
-g, --general-numeric-sort compare according to general numerical value
-i, --ignore-nonprinting consider only printable characters
-M, --month-sort compare (unknown) < 'JAN' < ... < 'DEC'
-h, --human-numeric-sort compare human readable numbers (e.g., 2K 1G)
-n, --numeric-sort compare according to string numerical value
-R, --random-sort sort by random hash of keys
--random-source=FILE get random bytes from FILE
-r, --reverse reverse the result of comparisons
--sort=WORD sort according to WORD:
general-numeric -g, human-numeric -h, month -M,
numeric -n, random -R, version -V
-V, --version-sort natural sort of (version) numbers within text
Other options:
--batch-size=NMERGE merge at most NMERGE inputs at once;
for more use temp files
-c, --check, --check=diagnose-first check for sorted input; do not sort
-C, --check=quiet, --check=silent like -c, but do not report first bad line
--compress-program=PROG compress temporaries with PROG;
decompress them with PROG -d
--debug annotate the part of the line used to sort,
and warn about questionable usage to stderr
--files0-from=F read input from the files specified by
NUL-terminated names in file F;
If F is - then read names from standard input
-k, --key=KEYDEF sort via a key; KEYDEF gives location and type
-m, --merge merge already sorted files; do not sort
-o, --output=FILE write result to FILE instead of standard output
-s, --stable stabilize sort by disabling last-resort comparison
-S, --buffer-size=SIZE use SIZE for main memory buffer
-t, --field-separator=SEP use SEP instead of non-blank to blank transition
-T, --temporary-directory=DIR use DIR for temporaries, not $TMPDIR or /tmp;
multiple options specify multiple directories
--parallel=N change the number of sorts run concurrently to N
-u, --unique with -c, check for strict ordering;
without -c, output only the first of an equal run
-z, --zero-terminated end lines with 0 byte, not newline
--help display this help and exit
--version output version information and exit
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a
field number and C a character position in the field; both are origin 1, and
the stop position defaults to the line's end. If neither -t nor -b is in
effect, characters in a field are counted from the beginning of the preceding
whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV],
which override global ordering options for that key. If no key is given, use
the entire line as the key.
SIZE may be followed by the following multiplicative suffixes:
% 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.
With no FILE, or when FILE is -, read standard input.
*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'sort invocation'
函数介绍 中文
再看下中文的具体介绍:
语法
sort [-bcdfimMnr][-o<输出文件>][-t<分隔字符>][+<起始栏位>-<结束栏位>][--help][--verison][文件]
参数说明:
-b 忽略每行前面开始出的空格字符。
-c 检查文件是否已经按照顺序排序。
-d 排序时,处理英文字母、数字及空格字符外,忽略其他的字符。
-f 排序时,将小写字母视为大写字母。
-i 排序时,除了040至176之间的ASCII字符外,忽略其他的字符。
-m 将几个排序好的文件进行合并。
-M 将前面3个字母依照月份的缩写进行排序。
-n 依照数值的大小排序。
-u 意味着是唯一的(unique),输出的结果是去完重了的。
-o<输出文件> 将排序后的结果存入指定的文件。
-r 以相反的顺序来排序。
-t<分隔字符> 指定排序时所用的栏位分隔字符。
+<起始栏位>-<结束栏位> 以指定的栏位来排序,范围由起始栏位到结束栏位的前一栏位。
--help 显示帮助。
--version 显示版本信息。
实例
构建 2个测试文件
test.txt
[[email protected] linux_cmd_test]# cat test.txt
test
as
test
ss
pz
sda
as
test2.txt
[[email protected] linux_cmd_test]# cat test2.txt
test
wq
qq
ss
ssdz
ww
案例一
在使用sort命令以默认的式对文件的行进行排序,使用的命令如下:(默认是字典序)
[[email protected] linux_cmd_test]# sort test.txt
as
as
pz
sda
ss
test
test
案例二
-r
字典序倒序
[[email protected] linux_cmd_test]# sort -r test.txt
test
test
ss
sda
pz
as
as
案例三
-r -u
字典序排序并去重
[[email protected] linux_cmd_test]# sort -r -u test.txt
test
ss
sda
pz
as
案例四
求 test.txt test2.txt 的交集,并去重输出到另一个文件
[[email protected] linux_cmd_test]# cat test2.txt test.txt | sort -u -o intersect.txt
[[email protected] linux_cmd_test]# cat intersect.txt
as
pz
qq
sda
ss
ssdz
test
wq
ww
上一篇: 服务器安装thinkphp 主机宝 nginx伪静态
下一篇: Linux 下 ss 命令使用示例