Logging crawler visits with PHP: very handy
程序员文章站
2022-06-20 23:39:01
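This article shows how to keep a log of crawler visits. We create a crawler table in the database, let robot.php detect each visiting crawler and insert a row describing the visit, and then read the table back to see every crawler that has come by. The code is as follows: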
Database design
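-- The column sizes and the charset suffix were lost from the original listing;
-- the values below are assumed, typical choices.
-- Note: a zero datetime default is rejected when MySQL runs with NO_ZERO_DATE
-- in strict mode; use "default current_timestamp" there instead.
create table crawler
(
    crawler_id       bigint(20) unsigned not null auto_increment primary key,
    crawler_category varchar(255) not null,
    crawler_date     datetime not null default '0000-00-00 00:00:00',
    crawler_url      varchar(255) not null,
    crawler_ip       varchar(45) not null
) default charset=utf8;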
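The robot.php script in this article loads its connection settings from ./dbinfo.php, a file the article does not list. A minimal sketch of what it might contain is given here; the four variable names are the ones robot.php uses, while every value is a placeholder assumption that you would replace with your own.

```php
<?php
// dbinfo.php: connection settings read by robot.php.
// All values below are hypothetical placeholders, not the author's settings.
$host     = "localhost";      // MySQL server host
$username = "crawler_user";   // account allowed to insert into the crawler table
$password = "change-me";      // that account's password
$database = "your_database";  // database that contains the crawler table
```

Keeping these settings in a separate file means the logging script and the report page can share one set of credentials.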
The following file, robot.php, records each visiting crawler and writes the information to the database:
<?php
// Rebuild the URL that was requested.
$servername  = $_SERVER["SERVER_NAME"];
$serverport  = $_SERVER["SERVER_PORT"];
$scriptname  = $_SERVER["SCRIPT_NAME"];
$querystring = isset($_SERVER["QUERY_STRING"]) ? $_SERVER["QUERY_STRING"] : "";
$serverip    = $_SERVER["REMOTE_ADDR"];   // IP address of the visitor

$url = "http://" . $servername;
if ($serverport != "80") {                // leave out the default HTTP port
    $url = $url . ":" . $serverport;
}
$url = $url . $scriptname;
if ($querystring != "") {
    $url = $url . "?" . $querystring;
}
$getlocationurl = $url;

// Classify the visitor by its user agent. strpos() returns false when there is
// no match, so compare with !== false. Later matches override earlier ones.
$agent = strtolower(isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "");
$bot = "";
if (strpos($agent, "bot") !== false)                  { $bot = "other crawler"; }
if (strpos($agent, "googlebot") !== false)            { $bot = "google"; }
if (strpos($agent, "mediapartners-google") !== false) { $bot = "google adsense"; }
if (strpos($agent, "baiduspider") !== false)          { $bot = "baidu"; }
if (strpos($agent, "sogou spider") !== false)         { $bot = "sogou"; }
if (strpos($agent, "yahoo") !== false)                { $bot = "yahoo!"; }
if (strpos($agent, "msn") !== false)                  { $bot = "msn"; }
if (strpos($agent, "ia_archiver") !== false)          { $bot = "alexa"; }
if (strpos($agent, "iaarchiver") !== false)           { $bot = "alexa"; }
if (strpos($agent, "sohu") !== false)                 { $bot = "sohu"; }
if (strpos($agent, "sqworm") !== false)               { $bot = "aol"; }
if (strpos($agent, "yodaobot") !== false)             { $bot = "yodao"; }
if (strpos($agent, "iaskspider") !== false)           { $bot = "iask"; }

require("./dbinfo.php");                  // defines $host, $username, $password, $database
date_default_timezone_set('PRC');
$shijian = date("Y-m-d H:i:s", time());

// Only log requests that were identified as crawlers.
if ($bot != "") {
    // Connect to the MySQL server and select the database.
    // The mysql_* functions were removed in PHP 7, so mysqli is used here.
    $connection = mysqli_connect($host, $username, $password, $database);
    if (!$connection) {
        die('not connected : ' . mysqli_connect_error());
    }
    // Insert the visit with a prepared statement so the URL cannot inject SQL.
    $stmt = mysqli_prepare($connection, "insert into crawler (crawler_category, crawler_date, crawler_url, crawler_ip) values (?, ?, ?, ?)");
    if (!$stmt) {
        die('invalid query: ' . mysqli_error($connection));
    }
    mysqli_stmt_bind_param($stmt, "ssss", $bot, $shijian, $getlocationurl, $serverip);
    if (!mysqli_stmt_execute($stmt)) {
        die('invalid query: ' . mysqli_stmt_error($stmt));
    }
    mysqli_stmt_close($stmt);
}
?>
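The chain of user-agent checks in robot.php can also be written as a lookup table, which makes it easier to add new crawlers. The sketch below is one possible arrangement, not the author's code: the function name detect_crawler and the constant CRAWLER_SIGNATURES are made up for illustration, and the substrings and labels are copied from the script above. As in the original, a later match overrides an earlier one, so the generic "bot" entry comes first.

```php
<?php
// Substring => label pairs, in the same order as the checks in robot.php.
// Hypothetical helper: the names are not part of the original article.
const CRAWLER_SIGNATURES = [
    ["bot", "other crawler"],
    ["googlebot", "google"],
    ["mediapartners-google", "google adsense"],
    ["baiduspider", "baidu"],
    ["sogou spider", "sogou"],
    ["yahoo", "yahoo!"],
    ["msn", "msn"],
    ["ia_archiver", "alexa"],
    ["iaarchiver", "alexa"],
    ["sohu", "sohu"],
    ["sqworm", "aol"],
    ["yodaobot", "yodao"],
    ["iaskspider", "iask"],
];

// Returns the crawler label for a user agent, or "" when nothing matches.
function detect_crawler(string $userAgent): string
{
    $agent = strtolower($userAgent);
    $label = "";
    foreach (CRAWLER_SIGNATURES as [$needle, $name]) {
        // strpos() returns false on no match; position 0 is a valid match.
        if (strpos($agent, $needle) !== false) {
            $label = $name;   // keep going: the last match wins
        }
    }
    return $label;
}

// Usage
echo detect_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"), "\n"; // prints "google"
```

An empty return value means the visitor did not look like any known crawler, so the caller can skip the database insert.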
That's it. Querying the database now tells you which spider crawled which of your pages, when, and from which IP address.
The page below includes robot.php and lists the logged visits, newest first, with pagination. The pageclass and $mysql helpers come from the author's own library files, which are not listed in this article:

<?php
include './robot.php';
include '../library/page.class.php';
$page = isset($_GET['page']) ? (int) $_GET['page'] : 1;
include '../library/conn_new.php';
$pagesize = 20;   // assumed value: the page size was lost from the original listing
$count = $mysql -> num_rows($mysql -> query("select * from crawler"));
$pages = new pageclass($count, $pagesize, $page, $_SERVER['PHP_SELF'].'?page={page}');
$sql = "select * from crawler order by ";
$sql .= "crawler_date desc limit ".$pages -> page_limit.",".$pages -> myde_size;
$result = $mysql -> query($sql);
?>
<table width="">
<thead>
<tr>
<td bgcolor="#ccffff"></td>
<td bgcolor="#ccffff" align="center" style="color:#">Visit time</td>
<td bgcolor="#ccffff" align="center" style="color:#">Crawler category</td>
<td bgcolor="#ccffff" align="center" style="color:#">Crawler IP</td>
<td bgcolor="#ccffff" align="center" style="color:#">URL visited</td>
</tr>
</thead>
<?php while ($myrow = $mysql -> fetch_array($result)) { ?>
<tr>
<td width=""><img src="../images/topicnew.gif" /></td>
<td width="" style="font-family:georgia"><?php echo htmlspecialchars($myrow["crawler_date"]) ?></td>
<td width="" style="color:#fa"><?php echo htmlspecialchars($myrow["crawler_category"]) ?></td>
<td width=""><?php echo htmlspecialchars($myrow["crawler_ip"]) ?></td>
<td width=""><?php echo htmlspecialchars($myrow["crawler_url"]) ?></td>
</tr>
<?php } ?>
</table>
<?php echo $pages -> myde_write(); ?>
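The report page relies on a pagination class whose source is not shown, so it is not obvious where the two numbers in the limit clause come from. The sketch below illustrates the usual arithmetic under stated assumptions: page_window is a hypothetical stand-in, not the author's pageclass, and the 95 rows and page size of 20 are example figures only.

```php
<?php
// Hypothetical helper: computes the offset and row count for "limit offset, count".
function page_window(int $total, int $perPage, int $page): array
{
    $perPage = max(1, $perPage);                      // guard against division by zero
    $pages   = max(1, (int) ceil($total / $perPage)); // at least one page, even when empty
    $page    = min(max(1, $page), $pages);            // clamp the requested page into range
    return [
        "page"   => $page,
        "pages"  => $pages,
        "offset" => ($page - 1) * $perPage,
        "limit"  => $perPage,
    ];
}

// Usage: page 3 of 95 rows, 20 rows per page.
$w = page_window(95, 20, 3);
$sql = sprintf(
    "select * from crawler order by crawler_date desc limit %d, %d",
    $w["offset"],
    $w["limit"]
);
echo $sql, "\n"; // prints "select * from crawler order by crawler_date desc limit 40, 20"
```

Clamping the page number keeps an out-of-range ?page= value from producing a negative or oversized offset.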
That is all of the code for logging crawler visits with PHP. I hope it helps.