
Scraping an Article List

程序员文章站 2022-05-15 12:06:20
<?php
/**
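 * Scrape the post list from the cnblogs.com front page and the following
 * 200 list pages, appending the extracted fields to blogs.txt.
 *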
 * @authors HG (hg0728@qq.com)
 * @date    2015-05-22 17:00:48
 * @version 1.0
 */
header("Content-type:text/html;charset=utf-8");
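function getCurl($url) {
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the body instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (e.g. http -> https)
	curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // arbitrary limit so a slow page cannot hang the script
	// Certificate checks are disabled; acceptable only for a throwaway script.
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
	curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
	$result = curl_exec($ch);                       // string on success, false on failure
	curl_close($ch);
	return $result;
}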

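function preg_list($str){ // extract the wanted fields from the page fetched with curl
	if ($str === false || $str === '') { // the request failed or returned nothing
		return;
	}
	// The HTML surrounding the capture groups was stripped when this article was
	// scraped; only '(.*?)' survived. The loop below reads TWO capture groups,
	// so rebuild the pattern from the list page's actual markup before use.
	$regex = '/(.*?)/';
	$count = preg_match_all($regex, $str, $matches); // number of matches found

	for ($i = 0; $i < $count; $i++) {
		$line = $matches[1][$i] . ' ' . $matches[2][$i];
		echo $matches[1][$i], "<br>\n"; // one entry per line in the browser
		file_put_contents('blogs.txt', $line . "\n", FILE_APPEND);
	}
}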
// Page through the list: the front page first, then /sitehome/p/1 .. /sitehome/p/200.
for ($i = 0; $i < 201; $i++) {
	$url = ($i == 0)
		? 'http://www.cnblogs.com/'
		: 'http://www.cnblogs.com/sitehome/p/' . $i;
	$str = getCurl($url);
	preg_list($str);
}
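The pattern in `preg_list()` lost its HTML when this page was scraped, so the sketch below only illustrates the intended technique: a `preg_match_all` pattern with two capture groups (link and title), run against a small inline sample instead of the live site. The markup, the `post-title` class and the `extractPosts()` helper are all hypothetical; adapt the pattern to whatever the real list page contains.

```php
<?php
/**
 * Sketch: extract (url, title) pairs from a list page with preg_match_all.
 * ASSUMPTION: each entry looks like <a class="post-title" href="URL">TITLE</a>.
 * The class name "post-title" is hypothetical; adjust it to the real markup.
 */
function extractPosts(string $html): array
{
    // Group 1 = href, group 2 = link text. The "s" flag lets "." span newlines;
    // the lazy quantifiers stop at the first closing quote / closing tag.
    $regex = '/<a[^>]*class="post-title"[^>]*href="(.*?)"[^>]*>(.*?)<\/a>/s';

    $posts = [];
    // PREG_SET_ORDER yields one array per match: [full match, group 1, group 2].
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $posts[] = [
                'url'   => $m[1],
                // Drop nested tags and decode entities such as &amp; in the title.
                'title' => trim(html_entity_decode(strip_tags($m[2]), ENT_QUOTES, 'UTF-8')),
            ];
        }
    }
    return $posts;
}

// Hypothetical sample standing in for a downloaded list page.
$sample = <<<HTML
<div class="post-item">
  <a class="post-title" href="https://example.com/p/1.html">First &amp; foremost</a>
</div>
<div class="post-item">
  <a class="post-title" href="https://example.com/p/2.html">Second post</a>
</div>
HTML;

foreach (extractPosts($sample) as $post) {
    echo $post['url'], ' ', $post['title'], "\n";
}
```

Run as-is, this prints `https://example.com/p/1.html First & foremost` and `https://example.com/p/2.html Second post`, one per line. Regular expressions are brittle against HTML (attribute order, quoting style); if the markup varies, PHP's `DOMDocument` plus `DOMXPath` is the sturdier choice.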
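The paging loop can also be pulled apart so the URL scheme is testable without network access. In this sketch, `pageUrl()` and `crawl()` are hypothetical helpers; the URL layout is the one the 2015 script assumes and may no longer match the live site, and the half-second pause between requests is an arbitrary politeness choice that the original does not make.

```php
<?php
// ASSUMPTION: page 0 is the front page and pages 1..N live under /sitehome/p/N,
// exactly as in the original loop; the site may have changed this since 2015.
const BASE_URL = 'http://www.cnblogs.com/';

function pageUrl(int $page): string
{
    return $page === 0 ? BASE_URL : BASE_URL . 'sitehome/p/' . $page;
}

/**
 * Fetch pages 0..$lastPage and hand each non-empty body to $handle.
 * $fetch is any callable(string $url) returning a string, or false on failure.
 * $delayMicros pauses between requests (500 ms default is an arbitrary choice).
 * Returns the number of pages that were fetched and handled.
 */
function crawl(int $lastPage, callable $fetch, callable $handle, int $delayMicros = 500000): int
{
    $handled = 0;
    for ($page = 0; $page <= $lastPage; $page++) {
        $body = $fetch(pageUrl($page));
        if ($body === false || $body === '') {
            error_log("skipping page $page: empty or failed response");
            continue;
        }
        $handle($body);
        $handled++;
        if ($page < $lastPage && $delayMicros > 0) {
            usleep($delayMicros); // be gentle with the server between requests
        }
    }
    return $handled;
}
```

Combined with the script's own functions, `crawl(200, 'getCurl', 'preg_list');` requests the same 201 pages as the original `for` loop, but skips failed fetches and spaces the requests out.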