欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

利用PHP如何统计Nginx日志的User Agent数据

程序员文章站 2022-05-14 14:27:43
前言 即将用到爬虫,于是打算收集一下user agent(ua)数据。接着马上想到自己网站的访问日志不就是现成的优质数据源吗?于是愉快的决定写个脚本统计一下nginx访问...

前言

即将用到爬虫,于是打算收集一下user agent(ua)数据。接着马上想到自己网站的访问日志不就是现成的优质数据源吗?于是愉快的决定写个脚本统计一下nginx访问日志中的ua信息。

这类简单操作,用脚本语言就足够,毫无疑问肯定要用最熟悉的php。打开vim就开撸,十几分钟下来,功能简单的统计脚本就搞定了。

脚本目前有三个功能:

1. 找出所有的ua信息并排序; 2. 统计操作系统数据; 3. 统计浏览器数据。

程序运行截图如下:

1、ua信息

利用PHP如何统计Nginx日志的User Agent数据

2、操作系统信息

利用PHP如何统计Nginx日志的User Agent数据

3、浏览器

利用PHP如何统计Nginx日志的User Agent数据

用脚本统计最近一个月的访问日志,得到以下结果:

  • 搜索引擎爬虫比较频繁,每天有好几千次数据访问;
  • windows仍是份额最大的操作系统,linux桌面依然份额很小;
  • chrome目前是浏览器领域的霸主,其次是firefox,opera已经很小众了。

最后附上php脚本的代码,也可以从本人的github里找到:https://github.com/tlanyan/scripts/blob/master/statua.php

#!/usr/bin/php
<?php
/**
 * @brief stat ua in access log
 *
 * @author tlanyan<tlanyan@hotmail.com>
 * @link http://tlanyan.me
 */
/* vim: set ts=4; set sw=4; set ss=4; set expandtab; */
function getfilelist(string $path) : array {
 return glob(rtrim($path, "/") . "/*access.log*");
}
function statfiles(array $files) : array {
 $stat = [];
 echo php_eol, "start to read files...", php_eol;
 foreach ($files as $file) {
  echo "read file: $file ...", php_eol;
  $contents = getfilecontent($file);
  foreach ($contents as $line) {
   $ua = getua($line);
   if (isset($stat[$ua])) {
    $stat[$ua] += 1;
   } else {
    $stat[$ua] = 1;
   }
  }
 }
 echo "stat all files done!", php_eol, php_eol;
 return $stat;
}
function getfilecontent(string $file) : array {
 if (substr($file, -3, 3) === ".gz") {
  return gzfile($file);
 }
 return file($file);
}
function getua(string $line) : ?string {
 // important! nginx log format determins the ua location in the line!
 // you may have to refactor following codes to get the right result
 // ua starts from fifth double quote 
 $count = 0; $offset = 0;
 while ($count < 5) {
  $pos = strpos($line, '"', $offset);
  if ($pos === false) {
   echo "error! unknown line: $line", php_eol;
   return null;
  }
  $count ++;
  $offset = $pos + 1;
 }
 $end = strpos($line, '"', $offset);
 return substr($line, $offset, $end - $offset);
}
function usage() {
 echo "usage: php statua.php [option] [dir]", php_eol;
 echo " options:", php_eol;
 echo " -h: show this help", php_eol;
 echo " -v: verbose mode", php_eol;
 echo "-n num: ua list number", php_eol;
 echo " dir: directory to the log files", php_eol;
 echo php_eol;
}
function filterua(array& $stat, array $uafilters) {
 $filtercount = 0;
 foreach ($uafilters as $filter) {
  foreach ($stat as $ua => $count) {
   if (stripos($ua, $filter) !== false) {
    $filtercount += $count;
    unset($stat[$ua]);
   }
  }
 }
 echo "filter $filtercount records!", php_eol;
}
function printcount(array $stat) {
 $sum = array_sum($stat);
 foreach ($stat as $key => $count) {
  echo $key, " : ", $count, ", percent: ", sprintf("%.2f", 100*$count/$sum), php_eol;
 }
}
function statos(array $uas) : array {
 global $debug;
 echo php_eol, "stat os...", php_eol;
 $os = ["windows", "macos", "linux", "android", "ios", "other"];
 $stat = array_fill_keys($os, 0);
 foreach ($uas as $key => $count) {
  if (strpos($key, "windows") !== false) {
   $stat["windows"] += $count;
  } else if (strpos($key, "macintosh") !== false) {
   $stat["macos"] += $count;
  // must deal android first, then linux
  } else if (strpos($key, "android") !== false) {
   $stat["android"] += $count;
  } else if (strpos($key, "linux") !== false) {
   $stat["linux"] += $count;
  } else if (strpos($key, "iphone") !== false || strpos($key, "ios") !== false || strpos($key, "like mac os") !== false || strpos($key, "darwin") !== false) {
   $stat["ios"] += $count;
  } else {
   if ($debug) {
    echo "other: $key, count: $count", php_eol;
   }
   $stat["other"] += $count;
  }
 }
 return $stat;
}
function statbrowser(array $uas) : array {
 global $debug;
 echo php_eol, "stat brwoser...", php_eol;
 $browsers = ["chrome", "firefox", "ie", "safari", "edge", "opera", "other"];
 $stat = array_fill_keys($browsers, 0);
 foreach ($uas as $key => $count) {
  if (strpos($key, "msie") !== false) {
   $stat["ie"] += $count;
  } else if (strpos($key, "edge") !== false) {
   $stat["edge"] += $count;
  } else if (strpos($key, "firefox") !== false) {
   $stat["firefox"] += $count;
  } else if (strpos($key, "opr") !== false) {
   $stat["opera"] += $count;
  // first chrome, then safari
  } else if (strpos($key, "chrome") !== false) {
   $stat["chrome"] += $count;
  } else if (strpos($key, "safari") !== false) {
   $stat["safari"] += $count;
  } else {
   if ($debug) {
    echo "other: $key, count: $count", php_eol;
   }
   $stat["other"] += $count;
  }
 }
 return $stat;
}
function parsecmd() {
 global $debug, $num, $path, $argc, $argv;
 $optind = null;
 $options = getopt("hvn:", [], $optind);
 if ($argc > 2 && empty($options)) {
  usage();
  exit(1);
 }
 if (isset($options['h'])) {
  usage();
  exit(0);
 }
 if (isset($options['v'])) {
  $debug = true;
 }
 if (isset($options['n'])) {
  $num = intval($options['n']);
  if ($num <= 0) {
   $num = 10;
  }
 }
 if ($argc === 2 && empty($options)) {
  $path = $argv[1];
 }
 if ($argc > $optind) {
  $path = $argv[$optind];
 }
 if (!is_dir($path)) {
  echo "invalid directory: $path", php_eol;
  exit(1);
 }
 if ($debug) {
  echo "num: $num", php_eol;
  echo "verbose: ", var_export($debug, true), php_eol;
  echo "path: $path", php_eol;
 }
}
if (version_compare(php_version, "7.1") < 0) {
 exit("scripts require php >=7.1");
}
$path = ".";
$debug = false;
$num = 10;
$uafilters = [
 "spider",
 "bot",
 "wget",
 "curl",
];
parsecmd();
$files = getfilelist($path);
if (empty($files)) {
 echo '"' . realpath($path) . '" does not contain access log files.', php_eol;
 exit(0);
}
$allua = statfiles($files);
if (empty($allua)) {
 echo "no data", php_eol;
 exit(0);
}
filterua($allua, $uafilters);
// sort array with count
uasort($allua, function ($a, $b) {
 return $b - $a;
});
if ($debug) {
 print_r($allua);
}
echo php_eol, "---- top $num ua ----", php_eol;
printcount(array_slice($allua, 0, $num));
echo "-------------------", php_eol;
$os = statos($allua);
echo php_eol, "os count:", php_eol;
printcount($os);
$browser = statbrowser($allua);
echo php_eol, "browser count:", php_eol;
printcount($browser);

总结

以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作具有一定的参考学习价值,如果有疑问大家可以留言交流,谢谢大家对的支持。