The awk Command Explained with Examples

程序员文章站 2022-04-30 14:46:35

awk options program file

awk is a programming language and command-line tool for text processing.

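The options argument commonly takes the following forms:

  • -F fs: specifies the field separator
  • -f file: specifies a file containing the awk script
  • -v var=value: defines a variable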

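Using variables

awk splits each input line into fields and makes them available through the following variables:

  • $0: the entire line
  • $1: the first data field
  • $2: the second data field
  • $n: the nth data field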
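As a minimal sketch of how these field variables behave, the following command feeds one line of sample text to awk and prints the whole line, then the first and third fields. The input string is made up for illustration and is not taken from any file used in this article.

```shell
# Hypothetical sample input; awk splits it on whitespace into three fields.
echo "alpha beta gamma" | awk '{print $0; print $1; print $3}'
```

The first line of output should be the full input line, followed by alpha and gamma on separate lines.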

Suppose we have a file named myfile with the following contents:

$ cat myfile
this is a test
this is the second test.

Run the command:

awk '{print $1}' myfile

This produces the following output:

$ awk '{print $1}' myfile
this
this

If the fields in a file are separated by something other than spaces or tabs, use the -F option to specify the field separator:

awk -F: '{print $1}' /etc/passwd

This produces the following output:

$ awk -F: '{print $1}' /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy
www-data
backup
list
irc
gnats
nobody
systemd-timesync
systemd-network
systemd-resolve
syslog
_apt
messagebus
uuidd
lightdm
whoopsie
avahi-autoipd
avahi
dnsmasq
colord
speech-dispatcher
hplip
kernoops
pulse
rtkit
saned
usbmux
icbc
mysql
cups-pk-helper
geoclue
gdm
gnome-initial-setup
redis
jenkins
sshd

Using multiple commands

To run several commands in one program, separate them with semicolons:

echo "Hello Tom" | awk '{$2="Adam"; print $0}'

This command sets $2 to Adam and then prints the whole line:

$ echo "Hello Tom" | awk '{$2="Adam"; print $0}'
Hello Adam

Reading the program from a file

Suppose testfile contains the following:

$ cat testfile
{print $1 " home at " $6}

Running awk with this script file produces the following:

$ awk -F: -f testfile /etc/passwd
root home at /root
daemon home at /usr/sbin
bin home at /bin
sys home at /dev
sync home at /bin
games home at /usr/games
man home at /var/cache/man
lp home at /var/spool/lpd
mail home at /var/mail
news home at /var/spool/news
uucp home at /var/spool/uucp
proxy home at /bin
www-data home at /var/www
backup home at /var/backups
list home at /var/list
irc home at /var/run/ircd
gnats home at /var/lib/gnats
nobody home at /nonexistent
systemd-timesync home at /run/systemd
systemd-network home at /run/systemd/netif
systemd-resolve home at /run/systemd/resolve
syslog home at /home/syslog
_apt home at /nonexistent
messagebus home at /var/run/dbus
uuidd home at /run/uuidd
lightdm home at /var/lib/lightdm
whoopsie home at /nonexistent
avahi-autoipd home at /var/lib/avahi-autoipd
avahi home at /var/run/avahi-daemon
dnsmasq home at /var/lib/misc
colord home at /var/lib/colord
speech-dispatcher home at /var/run/speech-dispatcher
hplip home at /var/run/hplip
kernoops home at /
pulse home at /var/run/pulse
rtkit home at /proc
saned home at /var/lib/saned
usbmux home at /var/lib/usbmux
icbc home at /home/icbc
mysql home at /nonexistent
cups-pk-helper home at /home/cups-pk-helper
geoclue home at /var/lib/geoclue
gdm home at /var/lib/gdm3
gnome-initial-setup home at /run/gnome-initial-setup/
redis home at /var/lib/redis
jenkins home at /var/lib/jenkins
sshd home at /run/sshd

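Now consider a modified script, testfile2:

$ cat testfile2
{
text = " home at "
print $1 $6
}

Running it produces the following. Note that the script assigns the variable text but never uses it in the print statement, so $1 and $6 are printed with nothing between them. To include the text, the print statement would need to be print $1 text $6.

$ awk -F: -f testfile2 /etc/passwd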
root/root
daemon/usr/sbin
bin/bin
sys/dev
sync/bin
games/usr/games
man/var/cache/man
lp/var/spool/lpd
mail/var/mail
news/var/spool/news
uucp/var/spool/uucp
proxy/bin
www-data/var/www
backup/var/backups
list/var/list
irc/var/run/ircd
gnats/var/lib/gnats
nobody/nonexistent
systemd-timesync/run/systemd
systemd-network/run/systemd/netif
systemd-resolve/run/systemd/resolve
syslog/home/syslog
_apt/nonexistent
messagebus/var/run/dbus
uuidd/run/uuidd
lightdm/var/lib/lightdm
whoopsie/nonexistent
avahi-autoipd/var/lib/avahi-autoipd
avahi/var/run/avahi-daemon
dnsmasq/var/lib/misc
colord/var/lib/colord
speech-dispatcher/var/run/speech-dispatcher
hplip/var/run/hplip
kernoops/
pulse/var/run/pulse
rtkit/proc
saned/var/lib/saned
usbmux/var/lib/usbmux
icbc/home/icbc
mysql/nonexistent
cups-pk-helper/home/cups-pk-helper
geoclue/var/lib/geoclue
gdm/var/lib/gdm3
gnome-initial-setup/run/gnome-initial-setup/
redis/var/lib/redis
jenkins/var/lib/jenkins
sshd/run/sshd

Preprocessing with awk

To add a title or header before the results, use the BEGIN keyword. The BEGIN block runs before any input is processed.

awk 'BEGIN {print "The File Contents:"}
 
{print $0}' myfile

This produces the following output:

$ awk 'BEGIN {print "The File Contents:"}
>  
> {print $0}' myfile
The File Contents:
this is a test
this is the second test.

Postprocessing with awk

Use the END keyword to run commands after all input has been processed:

awk 'BEGIN {print "The File Contents:"}
 
{print $0}
 
END {print "File footer"}' myfile

This produces the following output:

$ awk 'BEGIN {print "The File Contents:"}
>  
> {print $0}
>  
> END {print "File footer"}' myfile
The File Contents:
this is a test
this is the second test.
File footer
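BEGIN and END blocks are not limited to printing headers and footers. As a rough sketch, the program below initializes a counter in BEGIN, updates it for every line, and reports it in END. The input numbers and the variable name total are made up for illustration.

```shell
# Hypothetical input: three numbers, one per line.
printf '10\n15\n6\n' | awk '
BEGIN {total = 0}             # runs once, before any input is read
{total += $1}                 # runs for every input line
END {print "Total:", total}   # runs once, after the last line
'
```

With this input the command should print Total: 31.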

Putting it all together

Suppose testfile3 contains the following script:

$ cat testfile3
BEGIN{
print "USERS and their corresponding home"
print "UserName \t HomePath"
print "---\t---"
FS=":"
}
{
print $1 "\t" $6
}
END {
print "The end"
}

Running it produces the following output:

$ awk -f testfile3  /etc/passwd
USERS and their corresponding home
UserName 	 HomePath
---	---
root	/root
daemon	/usr/sbin
bin	/bin
sys	/dev
sync	/bin
games	/usr/games
man	/var/cache/man
lp	/var/spool/lpd
mail	/var/mail
news	/var/spool/news
uucp	/var/spool/uucp
proxy	/bin
www-data	/var/www
backup	/var/backups
list	/var/list
irc	/var/run/ircd
gnats	/var/lib/gnats
nobody	/nonexistent
systemd-timesync	/run/systemd
systemd-network	/run/systemd/netif
systemd-resolve	/run/systemd/resolve
syslog	/home/syslog
_apt	/nonexistent
messagebus	/var/run/dbus
uuidd	/run/uuidd
lightdm	/var/lib/lightdm
whoopsie	/nonexistent
avahi-autoipd	/var/lib/avahi-autoipd
avahi	/var/run/avahi-daemon
dnsmasq	/var/lib/misc
colord	/var/lib/colord
speech-dispatcher	/var/run/speech-dispatcher
hplip	/var/run/hplip
kernoops	/
pulse	/var/run/pulse
rtkit	/proc
saned	/var/lib/saned
usbmux	/var/lib/usbmux
icbc	/home/icbc
mysql	/nonexistent
cups-pk-helper	/home/cups-pk-helper
geoclue	/var/lib/geoclue
gdm	/var/lib/gdm3
gnome-initial-setup	/run/gnome-initial-setup/
redis	/var/lib/redis
jenkins	/var/lib/jenkins
sshd	/run/sshd
The end

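Using built-in variables

Besides the field variables such as $1 and $2 described earlier, awk provides several built-in variables:

  • FIELDWIDTHS: specifies the field widths (a gawk extension)
  • RS: specifies the input record separator
  • FS: specifies the input field separator
  • OFS: specifies the output field separator
  • ORS: specifies the output record separator

OFS defaults to a single space, but it can be set to any other string.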

$ awk 'BEGIN{FS=":"; OFS="-"} {print $1,$6,$7}' /etc/passwd
root-/root-/bin/bash
daemon-/usr/sbin-/usr/sbin/nologin
bin-/bin-/usr/sbin/nologin
sys-/dev-/usr/sbin/nologin
sync-/bin-/bin/sync
games-/usr/games-/usr/sbin/nologin
man-/var/cache/man-/usr/sbin/nologin
lp-/var/spool/lpd-/usr/sbin/nologin
mail-/var/mail-/usr/sbin/nologin
news-/var/spool/news-/usr/sbin/nologin
uucp-/var/spool/uucp-/usr/sbin/nologin
proxy-/bin-/usr/sbin/nologin
www-data-/var/www-/usr/sbin/nologin
backup-/var/backups-/usr/sbin/nologin
list-/var/list-/usr/sbin/nologin
irc-/var/run/ircd-/usr/sbin/nologin
gnats-/var/lib/gnats-/usr/sbin/nologin
nobody-/nonexistent-/usr/sbin/nologin
systemd-timesync-/run/systemd-/bin/false
systemd-network-/run/systemd/netif-/bin/false
systemd-resolve-/run/systemd/resolve-/bin/false
syslog-/home/syslog-/bin/false
_apt-/nonexistent-/bin/false
messagebus-/var/run/dbus-/bin/false
uuidd-/run/uuidd-/bin/false
lightdm-/var/lib/lightdm-/bin/false
whoopsie-/nonexistent-/bin/false
avahi-autoipd-/var/lib/avahi-autoipd-/bin/false
avahi-/var/run/avahi-daemon-/bin/false
dnsmasq-/var/lib/misc-/bin/false
colord-/var/lib/colord-/bin/false
speech-dispatcher-/var/run/speech-dispatcher-/bin/false
hplip-/var/run/hplip-/bin/false
kernoops-/-/bin/false
pulse-/var/run/pulse-/bin/false
rtkit-/proc-/bin/false
saned-/var/lib/saned-/bin/false
usbmux-/var/lib/usbmux-/bin/false
icbc-/home/icbc-/bin/bash
mysql-/nonexistent-/bin/false
cups-pk-helper-/home/cups-pk-helper-/usr/sbin/nologin
geoclue-/var/lib/geoclue-/usr/sbin/nologin
gdm-/var/lib/gdm3-/bin/false
gnome-initial-setup-/run/gnome-initial-setup/-/bin/false
redis-/var/lib/redis-/usr/sbin/nologin
jenkins-/var/lib/jenkins-/bin/bash
sshd-/run/sshd-/usr/sbin/nologin
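ORS works the same way for whole records. The sketch below, using made-up two-line input, sets both separators: OFS joins the fields named in print, and ORS replaces the newline that normally ends each record. The final printf in END only adds a trailing newline so the shell prompt starts on a fresh line.

```shell
# Hypothetical two-line input; OFS joins the fields, ORS ends each record.
printf 'a b\nc d\n' | awk 'BEGIN{OFS="-"; ORS=";"} {print $1,$2} END{printf "\n"}'
```

This should print a-b;c-d; on a single line.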

Suppose we have a file with the following contents:

$ cat cash
1235.96521

Setting FIELDWIDTHS splits each record into fixed-width fields instead of splitting on a separator:

$ awk 'BEGIN{FIELDWIDTHS="3 4 3"}{print $1,$2,$3}' cash
123 5.96 521

Suppose we have a file with the following contents:

$ cat person
Person Name
123 High Street
(222)466-1234

Another person
487 High Street
(523)643-8754

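Here each record spans several lines, and records are separated by a blank line. Set FS to a newline so that each line becomes a field, and set RS to the empty string so that blank lines separate the records:

$ awk 'BEGIN{FS="\n"; RS=""} {print $1,$3}' person
Person Name (222)466-1234
Another person (523)643-8754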

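Other variables

  • ARGC: the number of command-line arguments
  • ARGV: the array of command-line arguments
  • ENVIRON: an array of the environment variables and their values
  • FILENAME: the name of the file awk is currently processing
  • NF: the number of fields in the current record
  • NR: the total number of records processed so far
  • FNR: the record number within the current file
  • IGNORECASE: when nonzero, makes matching case-insensitive (a gawk extension)

A quick test follows. ARGC is 2 because ARGV[0] holds the command name, awk, and ARGV[1] holds myfile:

$ awk 'BEGIN{print ARGC,ARGV[1]}' myfile
2 myfile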

Using environment variables

$ awk '
>  
> BEGIN{
>  
> print ENVIRON["PATH"]
>  
> }'
/usr/lib/jvm/java-8-openjdk-amd64/bin:/usr/lib/jvm/java-8-openjdk-amd64/jre/bin:/home/icbc/software/go/bin:/home/icbc/software/node-v9.4.0-linux-x64/bin:/home/icbc/bin:/home/icbc/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

You can also pass a shell variable into awk with the -v option:

$ echo | awk -v home=$HOME '{print "My home is " home}'
My home is /home/icbc
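When the shell variable may contain spaces, it is safer to quote it so the shell passes the whole value as a single -v assignment. A small sketch, where the variable names name and who and the value are hypothetical:

```shell
# The variable names and the value are made up for illustration.
name="Ada Lovelace"
# Double quotes keep the value, including its space, in one argument.
echo | awk -v who="$name" '{print "Hello, " who}'
```

This should print Hello, Ada Lovelace. Without the quotes, the shell would split the value and awk would treat the second word as a separate argument.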

NF holds the number of fields in the current record, so $NF refers to the last field:

$ awk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd
root:/bin/bash
daemon:/usr/sbin/nologin
bin:/usr/sbin/nologin
sys:/usr/sbin/nologin
sync:/bin/sync
games:/usr/sbin/nologin
man:/usr/sbin/nologin
lp:/usr/sbin/nologin
mail:/usr/sbin/nologin
news:/usr/sbin/nologin
uucp:/usr/sbin/nologin
proxy:/usr/sbin/nologin
www-data:/usr/sbin/nologin
backup:/usr/sbin/nologin
list:/usr/sbin/nologin
irc:/usr/sbin/nologin
gnats:/usr/sbin/nologin
nobody:/usr/sbin/nologin
systemd-timesync:/bin/false
systemd-network:/bin/false
systemd-resolve:/bin/false
syslog:/bin/false
_apt:/bin/false
messagebus:/bin/false
uuidd:/bin/false
lightdm:/bin/false
whoopsie:/bin/false
avahi-autoipd:/bin/false
avahi:/bin/false
dnsmasq:/bin/false
colord:/bin/false
speech-dispatcher:/bin/false
hplip:/bin/false
kernoops:/bin/false
pulse:/bin/false
rtkit:/bin/false
saned:/bin/false
usbmux:/bin/false
icbc:/bin/bash
mysql:/bin/false
cups-pk-helper:/us
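To make the distinction between NF and $NF concrete, here is a short sketch on a made-up four-field line. NF is a number, $NF is the field at that position, and $(NF-1) is the field before it.

```shell
# Hypothetical input with four whitespace-separated fields.
echo "a b c d" | awk '{print NF, $NF, $(NF-1)}'
```

This should print 4 d c: the field count, the last field, and the second-to-last field.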

Now let's look at the difference between NR and FNR:

$ awk 'BEGIN{FS=","}{print $1,"FNR="FNR}' myfile myfile
this is a test FNR=1
this is the second test. FNR=2
this is a test FNR=1
this is the second test. FNR=2

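In the example above we pass the same file twice, so awk treats it as two input files. The output is the first field followed by FNR; because FS is a comma and the lines contain no commas, $1 is the entire line. The difference between NR and FNR is that FNR resets to 1 when awk starts a new file, while NR keeps counting across files:

$ awk '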
>  
> BEGIN {FS=","}
>  
> {print $1,"FNR="FNR,"NR="NR}
>  
> END{print "Total",NR,"processed lines"}' myfile myfile
this is a test FNR=1 NR=1
this is the second test. FNR=2 NR=2
this is a test FNR=1 NR=3
this is the second test. FNR=2 NR=4
Total 4 processed lines
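Because FNR restarts at 1 for each file, the condition FNR == 1 is a common way to detect the start of a new input file. The sketch below is an illustration under assumptions: the file names one and two and their contents are hypothetical, and the files are created in a temporary directory that is removed afterwards.

```shell
# Create two small hypothetical input files in a throwaway directory.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one"
printf 'c\n' > "$tmp/two"
# FNR == 1 is true on the first line of every file,
# whereas NR == 1 is true only on the very first line overall.
(cd "$tmp" && awk 'FNR == 1 {print "--- " FILENAME " ---"} {print FNR ": " $0}' one two)
rm -r "$tmp"
```

Each file's lines should appear under a header line naming that file, with line numbering restarting at 1 for the second file.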

Using user-defined variables

$ awk '
>  
> BEGIN{
>  
> test="Welcome to LikeGeeks website"
>  
> print test
>  
> }'
Welcome to LikeGeeks website

Structured commands

Suppose we have a file with the following contents:

$ cat numbers
10
15
6
33
45

The if statement

$ awk '{if ($1 > 30) print $1}' numbers
33
45

Use braces to run multiple statements when the condition is true:

$ awk '{
>  
> if ($1 > 30)
>  
> {
>  
> x = $1 * 3
>  
> print x
>  
> }
>  
> }' numbers
99
135

You can also add an else clause:

$ awk '{
>  
> if ($1 > 30)
>  
> {
>  
> x = $1 * 3
>  
> print x
>  
> } else
>  
> {
>  
> x = $1 / 2
>  
> print x
>  
> }}' numbers
5
7.5
3
99
135

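The else clause can also be written on the same line, as long as the preceding statement is terminated with a semicolon. Note that this example uses a threshold of 20 and doubles the value:

$ awk '{if ($1 > 20) print $1 *2;else print $1 /2 }' numbers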
5
7.5
3
66
90
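For simple filters, awk's pattern-action form is a common alternative to an explicit if statement. The sketch below reproduces the contents of the numbers file with printf so it can be run on its own. A pattern with no action prints every matching line, and a pattern followed by an action runs the action only for matching lines.

```shell
# Same values as the numbers file, supplied inline.
# A bare pattern prints each line for which it is true.
printf '10\n15\n6\n33\n45\n' | awk '$1 > 30'
# A pattern followed by an action runs the action only on matching lines.
printf '10\n15\n6\n33\n45\n' | awk '$1 > 30 {print $1 * 3}'
```

The first command should print 33 and 45, and the second 99 and 135, matching the if-based versions above.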

The while loop

Suppose we have a file with the following contents:

$ cat numbers2
124 127 130
112 142 135
175 158 245
118 231 147

Compute the average of the three values on each line with a loop:

$ awk '{
>  
> sum = 0
>  
> i = 1
>  
> while (i < 4)
>  
> {
>  
> sum += $i
>  
> i++
>  
> }
>  
> average = sum / 3
>  
> print "Average:",average
>  
> }' numbers2
Average: 127
Average: 129.667
Average: 192.667
Average: 165.333

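Using break to exit a loop

The break statement exits the loop early. Here it stops the loop once the third field has been added to the total:

$ awk '{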
 
tot = 0
 
i = 1
 
while (i < 5)
 
{
 
tot += $i
 
if (i == 3)
 
break
 
i++
 
}
 
average = tot / 3
 
print "Average is:",average
 
}' numbers2
Average is: 127
Average is: 129.667
Average is: 192.667
Average is: 165.333

The for loop

$ awk '{
 
total = 0
 
for (var = 1; var < 4; var++)
 
{
 
total += $var
 
}
 
avg = total / 3
 
print "Average:",avg
 
}' numbers2
Average: 127
Average: 129.667
Average: 192.667
Average: 165.333
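The loops above hard-code the number of fields. One possible variation, sketched here, uses NF as the loop bound and as the divisor so the same program works for lines with any number of fields. The input is the first two lines of numbers2, supplied inline so the example runs on its own.

```shell
# First two lines of numbers2, supplied inline.
printf '124 127 130\n112 142 135\n' | awk '{
    total = 0
    # NF is the number of fields on the current line.
    for (i = 1; i <= NF; i++)
        total += $i
    print "Average:", total / NF
}'
```

This should print Average: 127 and Average: 129.667, the same values as the first two lines of the output above.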

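Formatted printing

The printf command accepts format specifiers of the following form:

%[modifier]control-letter

The control letter is one of:

  • c: prints a number as the character with that code
  • d: prints an integer
  • e: prints a number in scientific notation
  • f: prints a floating-point number
  • o: prints an octal value
  • s: prints a string

$ awk 'BEGIN{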
>  
> x = 100 * 100
>  
> printf "The result is: %e\n", x
>  
> }'
The result is: 1.000000e+04
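The optional modifier controls width, alignment, and precision. As a brief sketch with made-up values: a minus sign left-aligns, a number sets the minimum width, and a dot followed by a number sets the precision.

```shell
# %-8s  left-aligns a string in 8 columns
# %5d   right-aligns an integer in 5 columns
# %.2f  prints a float with 2 digits after the decimal point
awk 'BEGIN{printf "%-8s|%5d|%.2f\n", "awk", 42, 3.14159}'
```

The output should be awk padded to eight columns, then 42 right-aligned in five columns, then 3.14, with a vertical bar between the three parts.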

Built-in functions

Mathematical functions

sin(x) | cos(x) | sqrt(x) | exp(x) | log(x) | rand()

$ awk 'BEGIN{x=exp(5); print x}'
148.413

String functions

$ awk 'BEGIN{x = "likegeeks"; print toupper(x)}'
LIKEGEEKS
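toupper is only one of several standard string functions. A short sketch of three others on the same string: length returns the number of characters, substr extracts part of a string (positions start at 1), and index returns the position of a substring.

```shell
# length(s): character count; substr(s, 1, 4): first four characters;
# index(s, "geeks"): 1-based position where "geeks" starts.
awk 'BEGIN{s = "likegeeks"; print length(s), substr(s, 1, 4), index(s, "geeks")}'
```

This should print 9 like 5.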

User-defined functions

$ awk '
>  
> function myfunc()
>  
> {
>  
> printf "The user %s has home path at %s\n", $1,$6
>  
> }
>  
> BEGIN{FS=":"}
>  
> {
>  
> myfunc()
>  
> }' /etc/passwd
The user root has home path at /root
The user daemon has home path at /usr/sbin
The user bin has home path at /bin
The user sys has home path at /dev
The user sync has home path at /bin
The user games has home path at /usr/games
The user man has home path at /var/cache/man
The user lp has home path at /var/spool/lpd
The user mail has home path at /var/mail
The user news has home path at /var/spool/news
The user uucp has home path at /var/spool/uucp
The user proxy has home path at /bin
The user www-data has home path at /var/www
The user backup has home path at /var/backups
The user list has home path at /var/list
The user irc has home path at /var/run/ircd
The user gnats has home path at /var/lib/gnats
The user nobody has home path at /nonexistent
The user systemd-timesync has home path at /run/systemd
The user systemd-network has home path at /run/systemd/netif
The user systemd-resolve has home path at /run/systemd/resolve
The user syslog has home path at /home/syslog
The user _apt has home path at /nonexistent
The user messagebus has home path at /var/run/dbus
The user uuidd has home path at /run/uuidd
The user lightdm has home path at /var/lib/lightdm
The user whoopsie has home path at /nonexistent
The user avahi-autoipd has home path at /var/lib/avahi-autoipd
The user avahi has home path at /var/run/avahi-daemon
The user dnsmasq has home path at /var/lib/misc
The user colord has home path at /var/lib/colord
The user speech-dispatcher has home path at /var/run/speech-dispatcher
The user hplip has home path at /var/run/hplip
The user kernoops has home path at /
The user pulse has home path at /var/run/pulse
The user rtkit has home path at /proc
The user saned has home path at /var/lib/saned
The user usbmux has home path at /var/lib/usbmux
The user icbc has home path at /home/icbc
The user mysql has home path at /nonexistent
The user cups-pk-helper has home path at /home/cups-pk-helper
The user geoclue has home path at /var/lib/geoclue
The user gdm has home path at /var/lib/gdm3
The user gnome-initial-setup has home path at /run/gnome-initial-setup/
The user redis has home path at /var/lib/redis
The user jenkins has home path at /var/lib/jenkins
The user sshd has home path at /run/sshd
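User-defined functions can also take parameters and return a value, which avoids relying on the field variables inside the function body. A minimal sketch, where the function name average and the input line are chosen for illustration:

```shell
# One line of hypothetical input with three numeric fields.
printf '124 127 130\n' | awk '
# average() receives its inputs as parameters and returns the result.
function average(a, b, c) {
    return (a + b + c) / 3
}
{print "Average:", average($1, $2, $3)}
'
```

This should print Average: 127.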

Original article
https://likegeeks.com/awk-command/