An interview question from Qunar (去哪网): counting how many times each IP appears in a log with shell
2022-04-30 08:25:09
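This came up in a Qunar interview. When the data set is not large, awk is the most convenient tool, but I had not used it for a long time and had forgotten how awk arrays work.
Here is a quick review.
Assume the data looks like this (one "IP hostname" record per line):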
178.60.128.31 www.google.com.hk
193.192.250.158 www.google.com
210.242.125.35 adwords.google.com
210.242.125.35 accounts.google.com.hk
210.242.125.35 accounts.google.com
210.242.125.35 accounts.l.google.com
64.233.181.49 www.google.com
212.188.10.167 www.google.com
23.239.5.106 www.google.com
64.233.168.41 www.google.com
62.1.38.89 www.google.com
62.1.38.89 chrome.google.com
193.192.250.172 www.google.com
212.188.10.241 www.google.com
37.228.69.57 www.google.com
222.255.120.42 www.google.com
222.255.120.42 www.gstatic.com
212.188.10.167 www.googleapis.com
64.233.181.49 www.googleapis.com
64.233.181.49 fonts.googleapis.com
193.192.250.158 plus.google.com
193.192.250.158 talkgadget.google.com
193.192.250.158 ssl.gstatic.com
193.192.250.158 images-pos-opensocial.googleusercontent.com
193.192.250.158 images1-focus-opensocial.googleusercontent.com
193.192.250.158 images2-focus-opensocial.googleusercontent.com
193.192.250.158 images3-focus-opensocial.googleusercontent.com
193.192.250.158 images4-focus-opensocial.googleusercontent.com
193.192.250.158 images5-focus-opensocial.googleusercontent.com
193.192.250.158 images6-focus-opensocial.googleusercontent.com
193.192.250.158 clients4.google.com
222.255.120.42 google.com
222.255.120.42 apis.google.com
222.255.120.42 clients1.google.com
193.192.250.158 clients2.google.com
193.192.250.158 clients3.google.com
193.192.250.158 clients5.google.com
64.233.181.49 maps.google.com
64.233.181.49 mts0.google.com
64.233.181.49 maps.gstatic.com
The awk code for the count:
awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
Output:
[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
212.188.10.241 1
64.233.168.41 1
23.239.5.106 1
193.192.250.158 15
178.60.128.31 1
37.228.69.57 1
212.188.10.167 2
193.192.250.172 1
62.1.38.89 2
64.233.181.49 6
210.242.125.35 4
222.255.120.42 5
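To recall how the awk array works in the command above: awk arrays are associative, so the IP string itself can serve as the key, and a key that has not been seen yet counts as 0 the first time it is incremented. Here is the same one-liner spread out with comments, as a minimal sketch. The files sample.txt and counts.txt and the four sample lines are made up for illustration and are not part of the original log.

```shell
# Hypothetical sample file with a few "IP hostname" lines (not the real log).
cat > sample.txt <<'EOF'
10.0.0.1 a.example.com
10.0.0.2 b.example.com
10.0.0.1 c.example.com
10.0.0.1 d.example.com
EOF

awk '
{
    # $1 is the first whitespace-separated field, i.e. the IP.
    # An unseen key is created on first use and counts as 0, so ++ makes it 1.
    count[$1]++
}
END {
    # for (ip in count) visits every key; the visiting order is unspecified.
    for (ip in count)
        print ip, count[ip]
}' sample.txt > counts.txt

cat counts.txt
```

Because the iteration order of for (ip in count) is unspecified, the lines can come out in any order, which is why the output above is not sorted.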
Adding a sort on the count column:
[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
178.60.128.31 1
193.192.250.172 1
212.188.10.241 1
23.239.5.106 1
37.228.69.57 1
64.233.168.41 1
212.188.10.167 2
62.1.38.89 2
210.242.125.35 4
222.255.120.42 5
64.233.181.49 6
193.192.250.158 15
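A common variation is to list the busiest IPs first and keep only the top few. The following is a minimal sketch under the same "IP count" output format. The files sample.txt and top.txt, the sample lines, and the cutoff of two entries are assumptions chosen for illustration.

```shell
# Hypothetical sample file (not the real log).
cat > sample.txt <<'EOF'
10.0.0.1 a.example.com
10.0.0.2 b.example.com
10.0.0.1 c.example.com
10.0.0.3 d.example.com
10.0.0.3 e.example.com
10.0.0.1 f.example.com
EOF

# -k2,2nr   : sort on field 2 only, numerically, largest first
# -k1,1     : break ties by IP so the order is the same on every run
# head -n 2 : keep only the two busiest IPs
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' sample.txt |
    sort -k2,2nr -k1,1 |
    head -n 2 > top.txt

cat top.txt
```

Writing the key as -k2,2 limits it to the count column, whereas -k 2 alone means "from field 2 to the end of the line". With two-column output the result is the same, but the explicit form is safer if more columns are added later.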
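============= Addendum to the answer from reader [hattah] ===============
I tested the efficiency of the two approaches:
In theory, the more data sort has to process, the slower it gets: the sort | uniq -c pipeline sorts every line of the log, while the awk array version only hands one line per distinct IP to sort.
Measured results (the counts add up to 46,400 lines, so test.txt here is a larger file than the 40-record sample above):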
[blog@AY1310301904525972ddZ ~]$ time awk '{print $1}' test.txt |sort|uniq -c
1380 178.60.128.31
17312 193.192.250.158
1160 193.192.250.172
4640 210.242.125.35
2320 212.188.10.167
1160 212.188.10.241
5734 222.255.120.42
1160 23.239.5.106
1160 37.228.69.57
2320 62.1.38.89
1160 64.233.168.41
6894 64.233.181.49
real 0m0.236s
user 0m0.228s
sys 0m0.004s
[blog@AY1310301904525972ddZ ~]$ time awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
193.192.250.172 1160
212.188.10.241 1160
23.239.5.106 1160
37.228.69.57 1160
64.233.168.41 1160
178.60.128.31 1380
212.188.10.167 2320
62.1.38.89 2320
210.242.125.35 4640
222.255.120.42 5734
64.233.181.49 6894
193.192.250.158 17312
real 0m0.025s
user 0m0.022s
sys 0m0.001s
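To repeat this comparison on another machine, something along the following lines could be used. It is only a sketch: big.txt, by_sort.txt and by_awk.txt are hypothetical file names, and the log is a made-up one built by repeating four sample lines. It checks that both pipelines agree on the counts before any timing is done.

```shell
# Build a hypothetical test log: 4 sample lines repeated 10000 times.
awk 'BEGIN {
    for (n = 0; n < 10000; n++) {
        print "10.0.0.1 a.example.com"
        print "10.0.0.2 b.example.com"
        print "10.0.0.1 c.example.com"
        print "10.0.0.3 d.example.com"
    }
}' > big.txt

# Approach 1: sort every line, then let uniq -c count adjacent duplicates.
# The trailing awk swaps the columns to "IP count" so the files can be compared.
awk '{print $1}' big.txt | sort | uniq -c | awk '{print $2, $1}' | sort > by_sort.txt

# Approach 2: count in an awk array; only the distinct IPs reach sort.
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' big.txt | sort > by_awk.txt

# Both result files should be byte-for-byte identical.
cmp by_sort.txt by_awk.txt && echo "counts match"
```

Prefixing either pipeline with time, as in the transcripts above, then gives the timing on your own machine. The absolute numbers will differ from the ones shown here.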