Linux Command Notes: File Operations: join
程序员文章站
2024-03-22 12:46:52
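How do you get better at your job? Learn from the people around you!
A colleague asked how to extract the matching parts of two files on Linux when the data files are very large. We eventually found an expert who knew the answer, and I took the chance to learn the trick myself.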
Related links:
Linux Command Notes: Basic Operations
Linux Command Notes: User Management
1. Basic join
Suppose there are two files:
$ cat 1
a 100
b 200
c 300
d 500
$ cat 2
c 2012-03-01
d 2012-05-01
a 2012-01-01
I want the rows whose keys appear in both files, that is, the a, c, and d rows.
$ join 1 2
c 300 2012-03-01
d 500 2012-05-01
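In practice only the c and d rows come out, even though a is present in both files. join requires both input files to be sorted on the join field (the first column by default), and file 2 is not sorted. See the option reference below.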
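The usual fix is to sort the inputs on the join field before joining. Below is a minimal sketch, assuming GNU coreutils; the scratch directory and the names `workdir`, `2.sorted`, and `result` are chosen here for illustration and do not come from the original post.

```shell
#!/bin/sh
# Work in a throwaway directory so the sample files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Use the C locale so that sort and join agree on key ordering.
export LC_ALL=C

# Recreate the two sample files from the text above.
printf 'a 100\nb 200\nc 300\nd 500\n' > 1
printf 'c 2012-03-01\nd 2012-05-01\na 2012-01-01\n' > 2

# Sort file 2 on its first field; file 1 is already in order.
sort -k1,1 2 > 2.sorted

# With both inputs sorted, the a, c, and d rows are all paired.
result=$(join 1 2.sorted)
printf '%s\n' "$result"
```

For very large inputs the same approach should still apply, since sort is designed to spill to temporary files rather than hold everything in memory.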
join options
$ join --help
Usage: join [OPTION]... FILE1 FILE2
For each pair of input lines with identical join fields, write a line to
standard output. The default join field is the first, delimited
by whitespace. When FILE1 or FILE2 (not both) is -, read standard input.
-a FILENUM print unpairable lines coming from file FILENUM, where
FILENUM is 1 or 2, corresponding to FILE1 or FILE2
-e EMPTY replace missing input fields with EMPTY
-i, --ignore-case ignore differences in case when comparing fields
-j FIELD equivalent to `-1 FIELD -2 FIELD'
-o FORMAT obey FORMAT while constructing output line
-t CHAR use CHAR as input and output field separator
-v FILENUM like -a FILENUM, but suppress joined output lines
-1 FIELD join on this FIELD of file 1
-2 FIELD join on this FIELD of file 2
--help display this help and exit
--version output version information and exit
Unless -t CHAR is given, leading blanks separate fields and are ignored,
else fields are separated by CHAR. Any FIELD is a field number counted
from 1. FORMAT is one or more comma or blank separated specifications,
each being `FILENUM.FIELD' or `0'. Default FORMAT outputs the join field,
the remaining fields from FILE1, the remaining fields from FILE2, all
separated by CHAR.
Important: FILE1 and FILE2 must be sorted on the join fields.
Report bugs to <bug-coreutils@gnu.org>.
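So join behaves much like a join in a database: by default the first column of each file is treated as the key and the rows are matched on it automatically.
join has many more options, which are worth exploring on your own.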
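As an illustration of two of the options listed in the help text, the sketch below uses `-v` to list unmatched rows and `-a` with `-e` and `-o` to keep them in the output, similar to a left outer join. It assumes GNU coreutils and a pre-sorted copy of file 2; the variable names `unpaired` and `outer` and the placeholder `NULL` are picked here for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

# Sample data from the article, with file 2 already sorted on the key.
printf 'a 100\nb 200\nc 300\nd 500\n' > 1
printf 'a 2012-01-01\nc 2012-03-01\nd 2012-05-01\n' > 2

# -v 1: print only the lines of file 1 that have no partner in file 2.
unpaired=$(join -v 1 1 2)
printf '%s\n' "$unpaired"

# -a 1: also keep unpaired lines of file 1 alongside the joined ones.
# -e NULL fills the missing field; it takes effect together with -o.
outer=$(join -a 1 -e NULL -o 0,1.2,2.2 1 2)
printf '%s\n' "$outer"
```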
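2. Custom delimiter
Whitespace is the default delimiter. To use a specific character such as "," instead (files 1 and 2 are assumed here to hold comma-separated fields), pass the "-t" option: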
$ join 1 2 -t ,
c,300,2012-03-01
d,500,2012-05-01
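When the files are comma-separated, the sort step needs the same delimiter so that the key is compared consistently. A small sketch, assuming GNU coreutils and comma-separated versions of the sample files (the file name `2.sorted` is hypothetical):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

# Comma-separated versions of the sample files.
printf 'a,100\nb,200\nc,300\nd,500\n' > 1
printf 'c,2012-03-01\nd,2012-05-01\na,2012-01-01\n' > 2

# Tell sort about the delimiter so -k1,1 means the first comma-separated field.
sort -t , -k1,1 2 > 2.sorted

# Options are placed before the file names, the more portable ordering.
result=$(join -t , 1 2.sorted)
printf '%s\n' "$result"
```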
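3. Joining on a specific field
If the join field is not the first column, use "-j" when the field number is the same in both files, or "-1" and "-2" to give the field number in the first and second file respectively: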
$ cat 1
a,100
b,200
c,300
d,500
$ cat 3
f,c
m,d
$ join 1 3 -t , -1 1 -2 2
c,300,f
d,500,m
The join field of file 3 is its second column, which is why the command is written this way.
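The sorting requirement follows the join field, so file 3 has to be ordered by its second column rather than its first. The sketch below assumes GNU coreutils and deliberately starts from an unsorted file 3; the name `3.sorted` is made up for the example.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf 'a,100\nb,200\nc,300\nd,500\n' > 1
# File 3 with its rows swapped, so column 2 is not in order.
printf 'm,d\nf,c\n' > 3

# Sort file 3 on its second comma-separated field, the one used for joining.
sort -t , -k2,2 3 > 3.sorted

# Join field 1 of file 1 against field 2 of file 3.
result=$(join -t , -1 1 -2 2 1 3.sorted)
printf '%s\n' "$result"
```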
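4. Showing specific columns
If only certain columns are needed in the output, regardless of how the original files are laid out, use the "-o" option.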
$ join 1 2 -t , -o 1.1,1.2,2.2
c,300,2012-03-01
d,500,2012-05-01
Note that the columns listed after "-o" are separated by ",".
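The "-o" list can also reorder columns, and `0` stands for the join field itself. A brief sketch, assuming GNU coreutils and comma-separated sample files that are already sorted (the column order is chosen arbitrarily for the example):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C

printf 'a,100\nb,200\nc,300\nd,500\n' > 1
printf 'a,2012-01-01\nc,2012-03-01\nd,2012-05-01\n' > 2

# Each entry is FILENUM.FIELD, or 0 for the join field.
# Here: the date from file 2, then the key, then the amount from file 1.
result=$(join -t , -o 2.2,0,1.2 1 2)
printf '%s\n' "$result"
```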
There are a few more options, which I leave for you to explore.
For now, these are enough!