利用PHP如何统计Nginx日志的User Agent数据
程序员文章站
2022-10-29 20:11:17
前言
即将用到爬虫,于是打算收集一下user agent(ua)数据。接着马上想到自己网站的访问日志不就是现成的优质数据源吗?于是愉快的决定写个脚本统计一下nginx访问...
前言
即将用到爬虫,于是打算收集一下user agent(ua)数据。接着马上想到自己网站的访问日志不就是现成的优质数据源吗?于是愉快的决定写个脚本统计一下nginx访问日志中的ua信息。
这类简单操作,用脚本语言就足够,毫无疑问肯定要用最熟悉的php。打开vim就开撸,十几分钟下来,功能简单的统计脚本就搞定了。
脚本目前有三个功能:
1. 找出所有的ua信息并排序; 2. 统计操作系统数据; 3. 统计浏览器数据。
程序运行截图如下:
1、ua信息
2、操作系统信息
3、浏览器
用脚本统计最近一个月的访问日志,得到以下结果:
- 搜索引擎爬虫比较频繁,每天有好几千次数据访问;
- windows仍是份额最大的操作系统,linux桌面依然份额很小;
- chrome目前是浏览器领域的霸主,其次是firefox,opera已经很小众了。
最后附上php脚本的代码,也可以从本人的github里找到:https://github.com/tlanyan/scripts/blob/master/statua.php
#!/usr/bin/php <?php /** * @brief stat ua in access log * * @author tlanyan<tlanyan@hotmail.com> * @link http://tlanyan.me */ /* vim: set ts=4; set sw=4; set ss=4; set expandtab; */ function getfilelist(string $path) : array { return glob(rtrim($path, "/") . "/*access.log*"); } function statfiles(array $files) : array { $stat = []; echo php_eol, "start to read files...", php_eol; foreach ($files as $file) { echo "read file: $file ...", php_eol; $contents = getfilecontent($file); foreach ($contents as $line) { $ua = getua($line); if (isset($stat[$ua])) { $stat[$ua] += 1; } else { $stat[$ua] = 1; } } } echo "stat all files done!", php_eol, php_eol; return $stat; } function getfilecontent(string $file) : array { if (substr($file, -3, 3) === ".gz") { return gzfile($file); } return file($file); } function getua(string $line) : ?string { // important! nginx log format determins the ua location in the line! // you may have to refactor following codes to get the right result // ua starts from fifth double quote $count = 0; $offset = 0; while ($count < 5) { $pos = strpos($line, '"', $offset); if ($pos === false) { echo "error! unknown line: $line", php_eol; return null; } $count ++; $offset = $pos + 1; } $end = strpos($line, '"', $offset); return substr($line, $offset, $end - $offset); } function usage() { echo "usage: php statua.php [option] [dir]", php_eol; echo " options:", php_eol; echo " -h: show this help", php_eol; echo " -v: verbose mode", php_eol; echo "-n num: ua list number", php_eol; echo " dir: directory to the log files", php_eol; echo php_eol; } function filterua(array& $stat, array $uafilters) { $filtercount = 0; foreach ($uafilters as $filter) { foreach ($stat as $ua => $count) { if (stripos($ua, $filter) !== false) { $filtercount += $count; unset($stat[$ua]); } } } echo "filter $filtercount records!", php_eol; } function printcount(array $stat) { $sum = array_sum($stat); foreach ($stat as $key => $count) { echo $key, " : ", $count, ", percent: ", sprintf("%.2f", 100*$count/$sum), php_eol; } } function statos(array $uas) : array { global $debug; echo php_eol, "stat os...", php_eol; $os = ["windows", "macos", "linux", "android", "ios", "other"]; $stat = array_fill_keys($os, 0); foreach ($uas as $key => $count) { if (strpos($key, "windows") !== false) { $stat["windows"] += $count; } else if (strpos($key, "macintosh") !== false) { $stat["macos"] += $count; // must deal android first, then linux } else if (strpos($key, "android") !== false) { $stat["android"] += $count; } else if (strpos($key, "linux") !== false) { $stat["linux"] += $count; } else if (strpos($key, "iphone") !== false || strpos($key, "ios") !== false || strpos($key, "like mac os") !== false || strpos($key, "darwin") !== false) { $stat["ios"] += $count; } else { if ($debug) { echo "other: $key, count: $count", php_eol; } $stat["other"] += $count; } } return $stat; } function statbrowser(array $uas) : array { global $debug; echo php_eol, "stat brwoser...", php_eol; $browsers = ["chrome", "firefox", "ie", "safari", "edge", "opera", "other"]; $stat = array_fill_keys($browsers, 0); foreach ($uas as $key => $count) { if (strpos($key, "msie") !== false) { $stat["ie"] += $count; } else if (strpos($key, "edge") !== false) { $stat["edge"] += $count; } else if (strpos($key, "firefox") !== false) { $stat["firefox"] += $count; } else if (strpos($key, "opr") !== false) { $stat["opera"] += $count; // first chrome, then safari } else if (strpos($key, "chrome") !== false) { $stat["chrome"] += $count; } else if (strpos($key, "safari") !== false) { $stat["safari"] += $count; } else { if ($debug) { echo "other: $key, count: $count", php_eol; } $stat["other"] += $count; } } return $stat; } function parsecmd() { global $debug, $num, $path, $argc, $argv; $optind = null; $options = getopt("hvn:", [], $optind); if ($argc > 2 && empty($options)) { usage(); exit(1); } if (isset($options['h'])) { usage(); exit(0); } if (isset($options['v'])) { $debug = true; } if (isset($options['n'])) { $num = intval($options['n']); if ($num <= 0) { $num = 10; } } if ($argc === 2 && empty($options)) { $path = $argv[1]; } if ($argc > $optind) { $path = $argv[$optind]; } if (!is_dir($path)) { echo "invalid directory: $path", php_eol; exit(1); } if ($debug) { echo "num: $num", php_eol; echo "verbose: ", var_export($debug, true), php_eol; echo "path: $path", php_eol; } } if (version_compare(php_version, "7.1") < 0) { exit("scripts require php >=7.1"); } $path = "."; $debug = false; $num = 10; $uafilters = [ "spider", "bot", "wget", "curl", ]; parsecmd(); $files = getfilelist($path); if (empty($files)) { echo '"' . realpath($path) . '" does not contain access log files.', php_eol; exit(0); } $allua = statfiles($files); if (empty($allua)) { echo "no data", php_eol; exit(0); } filterua($allua, $uafilters); // sort array with count uasort($allua, function ($a, $b) { return $b - $a; }); if ($debug) { print_r($allua); } echo php_eol, "---- top $num ua ----", php_eol; printcount(array_slice($allua, 0, $num)); echo "-------------------", php_eol; $os = statos($allua); echo php_eol, "os count:", php_eol; printcount($os); $browser = statbrowser($allua); echo php_eol, "browser count:", php_eol; printcount($browser);
总结
以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作具有一定的参考学习价值,如果有疑问大家可以留言交流,谢谢大家对的支持。
上一篇: 几个方法可以让女生减脂又增肌
下一篇: 分手后不联系你的男人自己该去主动联系吗?