
C#: Analyzing and Handling a csrf-token When Scraping Website Data

程序员文章站 2022-03-20 12:19:32

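Requirement

An airline's cargo waybill tracking query is a POST request. Simulating that POST from back-end code returned no page data. Inspecting the airline's site showed that it uses a CSRF protection mechanism, so the simulated request was rejected outright with a 40x error.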

About CSRF

This article does not explain CSRF itself; readers unfamiliar with it can look it up.

Analyzing the site's HTTP request

Headers

[Screenshot: request headers]

Form Data

[Screenshot: form data]

The request headers contain cookie and x-csrf-token, and the form data contains _csrf, whose value is the same as the x-csrf-token value in the headers.

 


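The site's JS source code shows that _csrf comes from the page's head tag.

My guess is that the cookie and x-csrf-token are valid for a limited period and work together to defend against CSRF attacks.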
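To make the head-tag lookup concrete, here is a minimal sketch that pulls the _csrf meta tags out of an HTML string. It assumes the tags are written as `<meta name="_csrf..." content="..."/>` with name before content and double-quoted attributes. The class name `CsrfMetaParser`, the `_csrf_header` tag, and the sample token value are illustrative assumptions, not taken from the airline's page.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsrfMetaParser
{
    // Captures the name and content attributes of each <meta name="_csrf..."> tag.
    // Assumption: name precedes content and both attributes use double quotes.
    private static readonly Regex MetaTag = new Regex(
        "<meta\\s+name=\"(?<name>_csrf[^\"]*)\"\\s+content=\"(?<content>[^\"]*)\"\\s*/?>",
        RegexOptions.IgnoreCase);

    // Returns a map from meta name (for example "_csrf") to its content value.
    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in MetaTag.Matches(html ?? string.Empty))
        {
            result[m.Groups["name"].Value] = m.Groups["content"].Value;
        }
        return result;
    }
}

public static class CsrfMetaParserDemo
{
    public static void Main()
    {
        // Hypothetical page fragment; the token value is made up.
        string html =
            "<head>" +
            "<meta name=\"_csrf\" content=\"11111111-2222-3333-4444-555555555555\"/>" +
            "<meta name=\"_csrf_header\" content=\"X-CSRF-TOKEN\"/>" +
            "</head>";

        foreach (var pair in CsrfMetaParser.Parse(html))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Keying the result by meta name avoids relying on the order in which the tags appear in the page.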

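Solution

1. First request the airline's website to obtain the cookie and _csrf.

2. Then simulate the HTTP request in C#, adding these values to the headers and the form data respectively, and send it.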
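Step 2 can be sketched as follows. This is not the article's HttpHelper; it uses `HttpRequestMessage` from System.Net.Http instead, only to show where each value goes. The class name `CsrfRequestBuilder`, the URL, the form field `awbNo`, and all sample values are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

public static class CsrfRequestBuilder
{
    // Builds a POST that carries the cookie and token in the headers
    // and repeats the token in the form data as _csrf.
    public static HttpRequestMessage Build(
        string url, string cookie, string csrfToken,
        IDictionary<string, string> formFields)
    {
        var fields = new Dictionary<string, string>(formFields);
        fields["_csrf"] = csrfToken; // same value as the header

        var request = new HttpRequestMessage(HttpMethod.Post, url);
        request.Headers.TryAddWithoutValidation("Cookie", cookie);
        request.Headers.TryAddWithoutValidation("X-CSRF-TOKEN", csrfToken);

        // FormUrlEncodedContent URL-encodes keys and values and sets
        // Content-Type to application/x-www-form-urlencoded.
        request.Content = new FormUrlEncodedContent(fields);
        return request;
    }
}

public static class CsrfRequestBuilderDemo
{
    public static void Main()
    {
        // Hypothetical values; in practice they come from step 1.
        var form = new Dictionary<string, string> { { "awbNo", "12345678" } };
        HttpRequestMessage request = CsrfRequestBuilder.Build(
            "https://www.example.com/tracking", "JSESSIONID=abc123", "token-value", form);

        Console.WriteLine(request.Method + " " + request.RequestUri);
        Console.WriteLine(request.Content.ReadAsStringAsync().Result);
    }
}
```

If a request built this way is sent through HttpClient, you will likely need an HttpClientHandler with UseCookies set to false so that the handler's own cookie container does not take over the Cookie header.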

 

Code

 

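 // Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
 public class CsrfToken
    {
        string cookie; // cookie of the site being requested
        List<string> csrfs = new List<string>(); // content values of the site's _csrf meta tags (token key and value)

        public CsrfToken(string url)
        {
            // Ignore an empty URL
            if (!string.IsNullOrWhiteSpace(url))
            {
                try
                {
                    // Request the page; the helper returns the HTML and outputs the response cookie.
                    // CreateGetHttpResponseForPC is not listed in this article; the HttpHelper
                    // section shows CreateGetHttpResponse, which has the same signature.
                    var _http = new HttpHelper(url);
                    string cookie;
                    string html = _http.CreateGetHttpResponseForPC(out cookie);
                    this.cookie = cookie;

                    // Match each <meta name="_csrf..." content="..."/> tag.
                    // [^""]* keeps every match inside a single attribute value.
                    string headRegex = @"<meta name=""_csrf[^""]*"" content=""[^""]*""/>";

                    MatchCollection matches = Regex.Matches(html, headRegex);
                    // Extract the text between content=" and the closing quote
                    Regex re = new Regex("(?<=content=\").*?(?=\")", RegexOptions.None);
                    foreach (Match match in matches)
                    {
                        MatchCollection mc = re.Matches(match.Value);
                        foreach (Match ma in mc)
                        {
                            csrfs.Add(ma.Value);
                        }
                    }
                }
                catch (Exception)
                {
                    // The request or the parsing failed; cookie stays null and csrfs stays empty
                }
            }
        }

        public string GetCookie()
        {
            return cookie;
        }
        public void SetCookie(string cookie)
        {
            this.cookie = cookie;
        }
        public List<string> GetCsrf_Token()
        {
            return csrfs;
        }
    }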

HttpHelper

  // Requires: System, System.Collections.Generic, System.IO, System.Net, System.Net.Security, System.Text
  public string CreatePostHttpResponse(IDictionary<string, string> headers, IDictionary<string, string> parameters)
        {
            HttpWebRequest request = null;
            // HTTPS request
            UTF8Encoding encoding = new System.Text.UTF8Encoding();
            ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CheckValidationResult);
            // Set the allowed TLS versions before the request is created
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11;
            request = WebRequest.Create(_baseIPAddress) as HttpWebRequest;
            request.ProtocolVersion = HttpVersion.Version10;
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            // request.ContentType = "application/json";
            request.UserAgent = DefaultUserAgent;
            //request.Headers.Add("x-csrf-token", "bc0cc533-60cc-484a-952d-0b4c1a95672c");
            //request.Referer = "https://www.asianacargo.com/tracking/viewtraceairwaybill.do";

            //request.Headers.Add("origin", "https://www.asianacargo.com");
            //request.Headers.Add("cookie", "jsessionid=hp21d2dq5foslg4fyw4slwwhb0-sl1cg6jgtj7he41e5f4an_r1p!-435435446!117330181");
            //request.Host = "www.asianacargo.com";

            // Add the caller's headers (cookie, x-csrf-token, ...).
            // Restricted headers such as Referer, Host and User-Agent must be set
            // through the request properties instead of Headers.Add.
            if (!(headers == null || headers.Count == 0))
            {
                foreach (string key in headers.Keys)
                {
                    request.Headers.Add(key, headers[key]);
                }
            }

            // Write the form data if there is any
            if (!(parameters == null || parameters.Count == 0))
            {
                StringBuilder buffer = new StringBuilder();
                int i = 0;
                foreach (string key in parameters.Keys)
                {
                    if (i > 0)
                    {
                        buffer.Append("&");
                    }
                    // URL-encode keys and values so characters like & and = cannot break the body
                    buffer.AppendFormat("{0}={1}", Uri.EscapeDataString(key), Uri.EscapeDataString(parameters[key] ?? ""));
                    i++;
                }
                byte[] data = encoding.GetBytes(buffer.ToString());
                using (Stream stream = request.GetRequestStream())
                {
                    stream.Write(data, 0, data.Length);
                }
            }

            try
            {
                // Read the response body
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (Stream s = response.GetResponseStream())
                using (StreamReader readStream = new StreamReader(s, Encoding.UTF8))
                {
                    return readStream.ReadToEnd();
                }
            }
            catch (WebException)
            {
                // The request failed (for example a 40x response); the caller receives null
                return null;
            }

        }

   public string CreateGetHttpResponse(out string cookie)
        {
            HttpWebRequest request = null;
            // HTTPS request
            ServicePointManager.ServerCertificateValidationCallback = new RemoteCertificateValidationCallback(CheckValidationResult);
            // Set the allowed TLS versions before the request is created
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11;
            request = WebRequest.Create(_baseIPAddress) as HttpWebRequest;
            request.ProtocolVersion = HttpVersion.Version10;
            request.Method = "GET";
            request.ContentType = "application/x-www-form-urlencoded";
            request.UserAgent = DefaultUserAgent;

            try
            {
                // Read the response: keep the raw Set-Cookie header and return the page HTML
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    cookie = response.Headers["Set-Cookie"];
                    using (Stream s = response.GetResponseStream())
                    using (StreamReader readStream = new StreamReader(s, Encoding.UTF8))
                    {
                        return readStream.ReadToEnd();
                    }
                }
            }
            catch (WebException)
            {
                // The request failed; return no cookie and no HTML
                cookie = "";
                return null;
            }

        }
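The GET helper above hands back the raw Set-Cookie value, which can also carry attributes such as Path or HttpOnly. If a site turns out to be strict about what comes back in the Cookie header, one option is to let `CookieContainer` do the parsing. The sketch below is an assumption about how that could look, not part of the original HttpHelper. The class name `CookieHeaderHelper`, the URL, and the cookie value are made up.

```csharp
using System;
using System.Net;

public static class CookieHeaderHelper
{
    // Converts a raw Set-Cookie value into the value to send in a Cookie header.
    // CookieContainer parses the attributes and returns only the name=value pairs
    // that apply to the given site.
    public static string ToCookieHeader(Uri site, string setCookie)
    {
        if (string.IsNullOrWhiteSpace(setCookie))
        {
            return string.Empty;
        }

        var container = new CookieContainer();
        container.SetCookies(site, setCookie); // throws CookieException on malformed input
        return container.GetCookieHeader(site);
    }
}

public static class CookieHeaderHelperDemo
{
    public static void Main()
    {
        // Hypothetical site and cookie value
        var site = new Uri("https://www.example.com/");
        string raw = "JSESSIONID=abc123; Path=/; HttpOnly";

        Console.WriteLine(CookieHeaderHelper.ToCookieHeader(site, raw));
    }
}
```

Whether this step is needed depends on the server; the article's approach of sending the raw value back worked for the airline site tested.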

Crawler program

[Screenshot: crawler program]

Crawl result

[Screenshot: crawl result]

Browser result

[Screenshot: result in the browser]

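Notes and conclusions

1. Different sites expose the CSRF token in different ways. Whatever the approach, as long as the value is sent to the front end, there is a corresponding way to retrieve it.

2. The HTTP validation may differ from site to site. For the airline cargo tracking site tested here, the security protocol used for the request is TLS 1.2:

 ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11;

Other parameters, such as the UserAgent, may also be checked by the server.

3. For this airline, the cookie and csrf token do not change for a period of time, so a real crawler can cache them and fetch new ones only when a request fails.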
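The caching idea in point 3 could be sketched like this. The two delegates stand in for "fetch the cookie and token" and "send the POST", and, like CreatePostHttpResponse above, the POST delegate is assumed to return null on failure. The names `CsrfSession` and `CachedCsrfClient` are hypothetical, and the sketch is not thread-safe.

```csharp
using System;

// Cookie and token obtained from one visit to the site.
public sealed class CsrfSession
{
    public string Cookie { get; private set; }
    public string Token { get; private set; }

    public CsrfSession(string cookie, string token)
    {
        Cookie = cookie;
        Token = token;
    }
}

public sealed class CachedCsrfClient
{
    private readonly Func<CsrfSession> _fetchSession;  // e.g. GET the page and parse the meta tags
    private readonly Func<CsrfSession, string> _post;  // returns null when the request fails
    private CsrfSession _cached;

    public CachedCsrfClient(Func<CsrfSession> fetchSession, Func<CsrfSession, string> post)
    {
        _fetchSession = fetchSession;
        _post = post;
    }

    public string Query()
    {
        if (_cached == null)
        {
            _cached = _fetchSession();
        }

        string result = _post(_cached);
        if (result == null)
        {
            // The cached values may have expired: refresh once and retry
            _cached = _fetchSession();
            result = _post(_cached);
        }
        return result;
    }
}

public static class CachedCsrfClientDemo
{
    public static void Main()
    {
        int fetches = 0;
        // Fake delegates: the first token is treated as expired, the second is accepted
        var client = new CachedCsrfClient(
            () => { fetches++; return new CsrfSession("cookie" + fetches, "token" + fetches); },
            session => session.Token == "token1" ? null : "page data");

        Console.WriteLine(client.Query());
        Console.WriteLine(client.Query());
        Console.WriteLine("fetches: " + fetches);
    }
}
```

Retrying only once keeps a persistent failure (for example a changed page layout) from turning into an endless loop of refreshes.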