
A Power Tool for Web Scraping: Chrome Headless and the Headless Browser Puppeteer

程序员文章站 2022-05-27 09:02:12
...

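I used to scrape JD.com search result pages with PhantomJS, but it could never get the last thirty items on a page, because JD loads that part of the results dynamically. Then I found a fantastic scraping tool for .NET: Puppeteer (PuppeteerSharp, the .NET port of Puppeteer).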

Here is the code. It is very simple.
First, add a reference to the .NET API for headless Chrome (the PuppeteerSharp NuGet package):

        using System.Threading.Tasks;
        using System.Web;          // HttpUtility
        using PuppeteerSharp;      // NuGet package: PuppeteerSharp

        public static class JdCrawler
        {
            public static async Task<string> GetSearchHtmlAsync(string keyword = "水果")
            {
                // Download a Chromium build matching this PuppeteerSharp version (first run only).
                // Recent versions accept DownloadAsync() without arguments; older ones require a revision.
                await new BrowserFetcher().DownloadAsync();

                // Enable headless mode
                var launchOptions = new LaunchOptions { Headless = true };
                // Start the headless browser
                var browser = await Puppeteer.LaunchAsync(launchOptions);

                // Open a new tab
                var page = await browser.NewPageAsync();

                // Build the search URL; query parameters are separated by '&'
                string key = HttpUtility.UrlEncode(keyword);
                string url = "https://search.jd.com/Search?keyword=" + key + "&enc=utf-8&page=1";

                // Request the URL to load the page
                await page.GoToAsync(url);

                // JD loads the rest of the results only as the page scrolls,
                // so press Space 12 times to page down and trigger that loading
                for (int i = 0; i < 12; i++)
                {
                    await page.Keyboard.PressAsync("Space");
                }

                // Optional: finish navigation once the DOM is parsed instead of on the load event
                // await page.GoToAsync(url, new NavigationOptions { WaitUntil = new[] { WaitUntilNavigation.DOMContentLoaded } });

                // Wait until an element with class "p-num" exists, then save a screenshot
                await page.WaitForSelectorAsync(".p-num");
                await page.ScreenshotAsync("example.png");

                // Get the HTML content of the page
                var htmlString = await page.GetContentAsync();

                #region Dispose resources
                // Close the tab
                await page.CloseAsync();

                // Close the headless browser; all of its pages are closed here
                await browser.CloseAsync();
                #endregion

                return htmlString;
            }
        }
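Building the query string by hand makes it easy to mistype a separator: with `$` in place of `&`, `page=1` ends up inside the `enc` value instead of being its own parameter. Below is a minimal sketch of a safer builder. The parameter names `keyword`, `enc` and `page` come from the URL in the article; the class `JdSearchUrl` is hypothetical, and it swaps `HttpUtility.UrlEncode` for `Uri.EscapeDataString`, which needs no `System.Web` reference and encodes spaces as `%20` rather than `+`.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: builds a JD search URL from a keyword and a page number.
public static class JdSearchUrl
{
    private const string BaseUrl = "https://search.jd.com/Search";

    public static string Build(string keyword, int page)
    {
        // Parameter names taken from the URL used in the article
        var parameters = new (string Key, string Value)[]
        {
            ("keyword", keyword),
            ("enc", "utf-8"),
            ("page", page.ToString(CultureInfo.InvariantCulture)),
        };

        // Percent-encode each value as UTF-8 and join the pairs with '&'
        string query = string.Join("&",
            parameters.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));

        return BaseUrl + "?" + query;
    }
}
```

For example, `JdSearchUrl.Build("水果", 1)` yields `https://search.jd.com/Search?keyword=%E6%B0%B4%E6%9E%9C&enc=utf-8&page=1`.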

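Pressing Space a fixed twelve times assumes twelve page-downs are always enough for the dynamically loaded results to appear. One alternative, sketched here as an assumption rather than the article's method, is to keep scrolling until the number of loaded items stops growing for a few rounds. The helper `LazyLoad.ScrollUntilStableAsync` and its parameters are hypothetical; it receives the scroll action and the item counter as delegates, so it can be exercised without a browser.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: keeps scrolling a lazily loaded page until no new items show up.
public static class LazyLoad
{
    // scrollOnce: performs one scroll step (e.g. one Space key press)
    // countItems: returns how many result items are currently on the page
    // maxRounds:  hard upper bound on the number of scroll steps
    // patience:   stop after this many consecutive steps without new items
    // pauseMs:    delay after each step so the page can fetch and render data
    // Returns the highest item count observed.
    public static async Task<int> ScrollUntilStableAsync(
        Func<Task> scrollOnce,
        Func<Task<int>> countItems,
        int maxRounds = 30,
        int patience = 3,
        int pauseMs = 0)
    {
        int best = await countItems();
        int idleRounds = 0;

        for (int round = 0; round < maxRounds && idleRounds < patience; round++)
        {
            await scrollOnce();
            if (pauseMs > 0)
                await Task.Delay(pauseMs);

            int current = await countItems();
            if (current > best)
            {
                best = current;
                idleRounds = 0; // progress was made, reset the idle counter
            }
            else
            {
                idleRounds++;
            }
        }

        return best;
    }
}

// Possible use with the PuppeteerSharp page from the article. The ".gl-item"
// selector is an assumption about JD's markup, not taken from the article:
// int loaded = await LazyLoad.ScrollUntilStableAsync(
//     () => page.Keyboard.PressAsync("Space"),
//     () => page.EvaluateExpressionAsync<int>("document.querySelectorAll('.gl-item').length"),
//     pauseMs: 500);
```

The `patience` counter matters because a single scroll step may not yet reach the point where the page requests the next batch, so one step without new items is not proof that loading has finished.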
This tool's API is asynchronous, which makes it very practical.
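Because every PuppeteerSharp call returns a `Task`, several result pages can be fetched at the same time. Below is a minimal sketch, assuming you pass in your own per-page fetch function: the hypothetical `Crawl.FetchPagesAsync` starts one task per page number and uses a `SemaphoreSlim` to cap how many run at once, so the site is not hit with every request simultaneously.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: fetches many pages concurrently with a concurrency cap.
public static class Crawl
{
    // Runs fetchPage for every page number, with at most maxConcurrency calls in flight.
    // Results are returned in the same order as pageNumbers.
    public static async Task<string[]> FetchPagesAsync(
        IEnumerable<int> pageNumbers,
        Func<int, Task<string>> fetchPage,
        int maxConcurrency = 3)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = pageNumbers.Select(async pageNumber =>
        {
            await gate.WaitAsync();   // wait for a free slot
            try
            {
                return await fetchPage(pageNumber);
            }
            finally
            {
                gate.Release();       // free the slot even if the fetch throws
            }
        }).ToList();                  // materialize so every task starts exactly once

        return await Task.WhenAll(tasks);
    }
}
```

With a shared `browser`, `fetchPage` could, for instance, open a tab with `browser.NewPageAsync()`, load the search URL for that page number, return `GetContentAsync()` and close the tab.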
Demo: https://download.csdn.net/download/v18770350613/11229549