Why can't I fetch any content from this website with file_get_contents?

程序员文章站 2024-02-05 13:38:28
http://www.hdwallpapersimages.com/
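The page displays normally in my browser. I first tried file_get_contents and got an empty result. I then used ChinaZ's Baidu-spider and Google-spider simulators to fetch it, and those requests timed out. So I simply copied my browser's request headers and fetched the page with file_get_contents again, but the result is still empty. Here is my code: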
$opts = array(
    'http' => array(
        'method' => "GET",
        // Request headers copied verbatim from the browser
        'header' => "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n" .
            "Accept-Encoding:gzip, deflate, sdch\r\n" .
            "Accept-Language:zh-CN,zh;q=0.8,en;q=0.6\r\n" .
            "Cache-Control:max-age=0\r\n" .
            "Cookie:viewed_cookie_policy=yes; __utmt=1; __utma=37938810.875942873.1452954236.1453114091.1453209277.3; __utmb=37938810.30.10.1453209277; __utmc=37938810; __utmz=37938810.1452954236.1.1.utmcsr=bing|utmccn=(organic)|utmcmd=organic|utmctr=hd%20wallpaper; __unam=eb5fde1-1524ad24043-4a580705-62\r\n" .
            "Host:www.hdwallpapersimages.com\r\n" .
            "Proxy-Connection:keep-alive\r\n" .
            "Upgrade-Insecure-Requests:1\r\n" .
            "User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36\r\n"
    )
);
// Build the stream context and print whatever the request returns
$context = stream_context_create($opts);
echo file_get_contents('http://www.hdwallpapersimages.com', false, $context);
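
One way to narrow this down is to make the request report what actually happened instead of echoing the result blindly. The sketch below is not a confirmed fix: `fetchPage()` and `decodeBody()` are hypothetical helper names, the shortened header list is an assumption, and `gzdecode()` needs the zlib extension. It sets a timeout, keeps the response headers, and decompresses the body when the server answers with gzip, since the `http://` wrapper does not decompress a response on its own even when `Accept-Encoding: gzip` was sent.

```php
<?php
// Sketch only: fetchPage() and decodeBody() are hypothetical helper names.

/**
 * Decompress $body if the response headers say it was gzip-encoded.
 * $headers is a list of raw header lines, as found in $http_response_header.
 */
function decodeBody(string $body, array $headers): string
{
    foreach ($headers as $line) {
        if (stripos($line, 'Content-Encoding:') === 0 && stripos($line, 'gzip') !== false) {
            $decoded = @gzdecode($body);              // requires the zlib extension
            return $decoded === false ? $body : $decoded;
        }
    }
    return $body;                                     // not compressed, return as-is
}

/**
 * Fetch $url and return [body or false, response header lines].
 */
function fetchPage(string $url, float $timeout = 10.0): array
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            'header'        => "User-Agent: Mozilla/5.0\r\nAccept-Encoding: gzip\r\n",
            'timeout'       => $timeout,              // seconds; fail instead of hanging
            'ignore_errors' => true,                  // still return the body on 4xx/5xx
        ],
    ]);

    $body = @file_get_contents($url, false, $context);
    // The HTTP wrapper fills $http_response_header in the calling scope.
    $headers = $http_response_header ?? [];

    if ($body === false) {
        return [false, $headers];                     // connection failure or timeout
    }
    return [decodeBody($body, $headers), $headers];
}
```

Calling `fetchPage('http://www.hdwallpapersimages.com/')` and checking whether the first element is `false` (then looking at `error_get_last()`) or an empty string (then looking at the returned header lines) should show whether the request timed out or the server really sent an empty body.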

Replies:

Is the site you are trying to fetch simply not loading?

I can't open the site either, haha. On the machine where your script runs, try curl directly and see whether it returns any content.
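
The same check can be made from PHP with the curl extension, which helps when the script runs on a host without shell access. This is only a rough sketch: `probeUrl()` is a hypothetical helper name, and the timeout and User-Agent values are assumptions. It reports whether the transfer succeeded, the HTTP status, curl's error message, and the body length.

```php
<?php
// Sketch only: probeUrl() is a hypothetical helper; requires the curl extension.

/**
 * Request $url and summarize the outcome without printing the body.
 */
function probeUrl(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,  // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // limit for the whole transfer
        CURLOPT_ENCODING       => '',               // accept and decode any supported encoding
        CURLOPT_USERAGENT      => 'Mozilla/5.0',
    ]);

    $body = curl_exec($ch);

    return [
        'ok'     => $body !== false,                          // false on timeout or connection failure
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // 0 if no response was received
        'error'  => curl_error($ch),                          // empty string when there was no error
        'length' => $body === false ? 0 : strlen($body),
    ];
}
```

Running `print_r(probeUrl('http://www.hdwallpapersimages.com/'));` on the server in question would show a non-empty `error` if the host cannot reach the site at all, which is what the replies here suspect.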

Related tags: php