Parsing the Apache access_log with PHP
PHP is rarely used to read and parse Apache log files; log statistics are usually gathered with server-side scripts. In some special cases, though, PHP may need to do the job, so I am sharing my script here. The Apache access log is usually stored at /var/log/httpd/access_log, and the script expects its entries to be in this format:
IP ADDRESS - - [Server Date/Time] "GET /path/to/page HTTP/Type Request" Success Code Bytes Sent To Client Referer Client Browser
Here are two sample lines from the server's access log:
123.125.71.83 - - [30/May/2013:00:26:58 +0800] "GET / HTTP/1.1" 301 593 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
65.55.215.72 - - [26/May/2013:11:35:12 +0800] "GET /robots.txt HTTP/1.1" 200 335
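As a rough illustration of the format above, one log line can be split with a single regular expression. This is only a sketch: the function name parse_access_line and the array keys are made up for this example, and it assumes the referer and browser fields may be missing, as in the second sample line. Escaped quotes inside a user agent are not handled.

```php
<?php
// Hypothetical helper (not part of the original script): parse one access_log
// line into an associative array, or return null if the line does not match.
function parse_access_line(string $line): ?array
{
    $pattern = '/^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
             . '"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
             . '(?P<status>\d{3}) (?P<bytes>\d+|-)'
             . '(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return [
        'ip'       => $m['ip'],
        'time'     => $m['time'],
        'method'   => $m['method'],
        'path'     => $m['path'],
        'protocol' => $m['protocol'],
        'status'   => (int) $m['status'],
        // Apache logs "-" when no bytes were sent; treat that as 0.
        'bytes'    => $m['bytes'] === '-' ? 0 : (int) $m['bytes'],
        // Trailing optional groups are absent from $m when they do not match.
        'referer'  => $m['referer'] ?? '',
        'agent'    => $m['agent'] ?? '',
    ];
}
```

Calling parse_access_line() on each line returned by file() gives one array per entry, or null for lines that do not fit the pattern.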
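The main goal is to record whether the files offered to users were downloaded successfully, how many bytes were transferred, when each download happened, and so on.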
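To make that goal concrete, here is a small sketch of how parsed entries might be summarised per file. The record layout (path, status and bytes keys) and the function name summarize_downloads are assumptions for illustration. Treating status 200 and 206 as successful is also an assumption, not something the original script does.

```php
<?php
// Hypothetical summary step: count successful downloads and total bytes per path.
// Each record is assumed to be an array with 'path', 'status' and 'bytes' keys.
function summarize_downloads(array $records): array
{
    $summary = [];
    foreach ($records as $r) {
        // Assumption: 200 (full) and 206 (partial content) count as success.
        if (!in_array((int) $r['status'], [200, 206], true)) {
            continue;
        }
        $path = $r['path'];
        if (!isset($summary[$path])) {
            $summary[$path] = ['downloads' => 0, 'bytes' => 0];
        }
        $summary[$path]['downloads']++;
        $summary[$path]['bytes'] += (int) $r['bytes'];
    }
    return $summary;
}
```

The result is keyed by path, so one lookup gives the number of successful requests and the total bytes sent for a file.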
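After parsing, each matching entry yields its IP, access time, page, request type, status code, bytes transferred, referer and browser.
Here is the code that does the work: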
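<?php
// Large log files can take a while to process, so lift the execution time limit.
set_time_limit(0);
error_reporting(E_ALL);
ini_set('display_errors', 'on');
// DRUPAL_PATH is assumed to be defined elsewhere in the site's code.
$ac_arr = file(DRUPAL_PATH . '/cron/logs/access_log');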
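foreach ($ac_arr as $key => $record) {
    // Split off the leading client IP. Anchoring with ^ keeps IP-like strings
    // later in the line (URLs, user agents) from splitting the record again.
    $records = preg_split("/^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)/", $record, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($records) < 3) {
        continue; // the line does not start with an IPv4 address
    }
    $ip = $records[1];
    $left_str = $records[2];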
    // Parse the other fields, starting with the timestamp in square brackets,
    // e.g. 30/May/2013:00:26:58 +0800
    if (!preg_match("/\[([^\]]+)\]/", $left_str, $match)) {
        continue;
    }
    $access_time = $match[1];
    $access_unixtime = strtotime($access_time);
    $access_date = date('Y-m-d', $access_unixtime);
    $yesterday_date = date('Y-m-d', strtotime('yesterday'));
    // The scheduled job only keeps yesterday's access log entries
    if ($yesterday_date != $access_date) {
        continue;
    }
    // Drop the "- - [timestamp]" prefix so the string starts at the request
    $left_str = preg_replace("/^([- ]*)\[([^\]]+)\]/", "", $left_str);
    $left_str = trim($left_str);
    // Match the quoted request line, e.g. "GET /path HTTP/1.1"
    if (!preg_match("/^\"[A-Z]{3,7} ([^\"]+)\"/i", $left_str, $match)) {
        continue;
    }
    $full_path = $match[0];
    $http = $match[1];
    $link = explode(" ", $http);
    $uaid = "";
    $course = "";
    // Only count downloads under one specific path
    if (preg_match("/^\/course\/automation\/MP/", $link[0])) {
        if (preg_match("/uaid=([0-9]+)/", $link[0], $match)) {
            $uaid = $match[1];
        }
        if (preg_match("/^\/course\/automation\/(MP[0-9]+\.zip)/", $link[0], $match)) {
            $course = $match[1];
        }
    }
    else {
        continue;
    }
    // Remove the request, leaving: status bytes "referer" "browser"
    $left_str = trim(substr($left_str, strlen($full_path)));
    // Read the remaining fields in one pass. Deleting the status and byte
    // values with str_replace() would also strip the same digits from the
    // referer and browser strings, and a "-" referer would never match.
    $success_code = $bytes = $ref = $browser = "";
    if (preg_match("/^([0-9]{3}) ([0-9]+|-)(?: \"([^\"]*)\"(?: \"([^\"]*)\")?)?/", $left_str, $match)) {
        $success_code = $match[1];
        $bytes = $match[2] == "-" ? 0 : $match[2];
        $ref = isset($match[3]) ? $match[3] : "";
        $browser = isset($match[4]) ? $match[4] : "";
    }
print("
IP: $ip
Access Time: $access_time
Page: $link[0]
Type: $link[1]
Success Code: $success_code
Bytes Transferred: $bytes
Referer: $ref
Browser: $browser
");
    // Insert into the database
    //db_query("INSERT INTO {automation_file_download} (uaid, course, download_date, ip, access_time, page, type, success_code, bytes, referer, browser) VALUES ('%s', '%s', '%s', '%s', %d, '%s', '%s', %d, %d, '%s', '%s')", $uaid, $course, $access_date, $ip, $access_unixtime, $link[0], $link[1], $success_code, $bytes, $ref, $browser);
}
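The date check in the script relies on strtotime() accepting the log's timestamp format. A more explicit variant, sketched below, uses DateTimeImmutable::createFromFormat() with the pattern 'd/M/Y:H:i:s O'. The function name is_from_yesterday and the optional $now parameter (added so the check can be run against a fixed date) are assumptions, not part of the original code. Unlike the script, this sketch compares calendar dates in the log entry's own UTC offset rather than in the server's default timezone.

```php
<?php
// Hypothetical helper: true if a log timestamp such as
// "30/May/2013:00:26:58 +0800" falls on the day before $now.
function is_from_yesterday(string $logTime, ?DateTimeImmutable $now = null): bool
{
    $entry = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $logTime);
    if ($entry === false) {
        return false; // unparseable timestamp: treat as not matching
    }
    $now = $now ?? new DateTimeImmutable('now');
    // Shift "now" into the entry's UTC offset, then step back one day.
    $yesterday = $now->setTimezone($entry->getTimezone())->modify('-1 day');
    return $entry->format('Y-m-d') === $yesterday->format('Y-m-d');
}
```

Inside the loop, a call such as is_from_yesterday($access_time) could stand in for the comparison of $yesterday_date and $access_date.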
A brief explanation of my code:
(...)
© lixiphp for LixiPHP, 2013.
Post tags: access_log, Apache, crontab, Linux