Day02-20210525-CSS Selectors and requests
程序员文章站
2022-07-15 15:39:07
...
Summary
1. CSS Selectors 1
<!-- CSS is responsible for the style and layout of web page content -->
<!--
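1. CSS syntax
CSS: Cascading Style Sheets (usually just called "style sheets")
Syntax:
selector{property1: value1; property2: value2; ...}
Notes:
selector - picks the tags that should be styled
{} - fixed part of the syntax
property - decides which style is being set
value - a numeric size needs a unit, usually px
Common properties:
color - text color (value: an English color name, rgb(r, g, b), or # followed by a hexadecimal value)
font-size - font size
background-color - background color
2. Where to write CSS
1) Inline style - put the CSS in the tag's style attribute; no selector or {} is needed
2) Internal style sheet - put the CSS inside a style tag (normally in head; browsers also apply a style tag placed in body)
3) External style sheet - put the CSS in a .css file and import it into the HTML with a link tag, e.g. <link rel="stylesheet" href="style.css">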
-->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
<style type="text/css">
p{
color: aquamarine ;
}
</style>
</head>
<body>
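<h1>1. Inline style example:</h1>
<p style="color: brown;">I am paragraph 1</p>
<p>I am paragraph 2</p>
<!-- without https:// the browser treats the href as a relative path -->
<a href="https://www.baidu.com">I am hyperlink 1</a>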
</body>
</html>
2. CSS Selectors 2
<!--
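1. Element selector (tag selector)
Use a tag name directly as the selector; it selects every tag of that type on the page.
e.g. p{} - selects all p tags; a{} - selects all a tags
2. ID selector
Put # in front of an id value to form a selector; it selects the tag whose id attribute has that value (an id should be unique on the page).
e.g. #p2{} - selects the tag whose id is p2; #a{} - selects the tag whose id is a
3. Class selector
Put . in front of a class value to form a selector; it selects every tag whose class attribute contains that value.
e.g. .c1{} - selects all tags whose class contains c1; .a{} - selects all tags whose class contains a
.c1.c2{} - selects tags whose class contains both c1 and c2
4. Group selector
Separate several independent selectors with commas; it selects every tag matched by any one of them.
e.g. p,a{} - selects all p tags and all a tags
.c1,p{} - selects all tags whose class is c1 and all p tags
.c1,#p1,#p2{} - selects all tags whose class is c1, plus the tags whose id is p1 or p2
5. Descendant selector
Separate several independent selectors with spaces.
e.g. p a{} - selects every a tag nested anywhere inside a p tag (a is a descendant of p)
div #p1 .c1{} - selects tags with class c1 inside the tag with id p1, which is itself inside a div
6. Child selector
Separate several independent selectors with >.
e.g. p>a{} - selects every a tag that is a direct child of a p tag
Other common forms:
div.info{} - selects div tags whose class is info
#p1.c1{} - selects the tag whose id is p1 and whose class is c1
p:nth-child(N) - selects a p tag that is the Nth child of its parent (all sibling elements are counted, not only p tags); to get the Nth p among its siblings, use p:nth-of-type(N)
div p:nth-child(N) - selects p tags inside a div that are the Nth child of their parent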
-->
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title></title>
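<style type="text/css">
/* 1. Element selector: every p tag */
p{color: darkblue;}
/* 2. ID selector: the tag whose id is p2 */
#p2{font-size: 20px;}
/* 4. Group selector: every h2 tag and every font tag */
h2,font{background-color: lightgray;}
/* 5. Descendant selector: a tags anywhere inside a div (hyperlinks 2 and 3) */
div a{color: green;}
/* 6. Child selector: p tags that are direct children of body (only paragraph 1) */
body>p{font-style: italic;}
/* nth-child: a p inside a div that is the 2nd child of its parent (only paragraph 3) */
div p:nth-child(2){text-decoration: underline;}
</style>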
</head>
<body>
<p>I am paragraph 1</p>
<a href="">I am hyperlink 1</a>
<div id="">
<p id="p2">I am paragraph 2</p>
<font>I am font 1</font>
<input type="" name="" id="" value="" />
<a href="">I am hyperlink 2</a>
<div id="">
<h2>I am heading 1</h2>
<p>I am paragraph 3</p>
<a href="">I am hyperlink 3</a>
<p>I am paragraph 4</p>
</div>
</div>
</body>
</html>
3. Using requests
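import requests

# 1. Send the request and get the response
# Request URL
url = 'https://movie.douban.com/top250'
# Request headers: a browser User-Agent, since Douban may reject requests without one
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
}
response = requests.get(url, headers=headers)

# 2. Response content
# 1) Set the encoding (only needed when the text comes out garbled)
# response.encoding = 'utf-8'  # use the page's actual encoding, e.g. 'utf-8' or 'gbk'
# 2) Get the status code
print(response.status_code)
# 3) Get the body as text (when the URL is a web page)
print(response.text)
# 4) Get JSON data (only when the body is JSON). This URL returns an HTML page,
#    so response.json() would raise an error here; it is left commented out.
# print(response.json())
# 5) Get the raw bytes (when the URL points to a binary file such as an image)
print(response.content)

# 3. Response headers
print(response.headers)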
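Why text comes out garbled: `response.text` is `response.content` decoded with `response.encoding` (requests decodes with `errors='replace'`). The sketch below needs no network: it encodes a sample string as UTF-8 and decodes it with a matching and a mismatched codec to show the effect. `decode_body` is a hypothetical helper. With a real response, a common fix is `response.encoding = response.apparent_encoding`.

```python
def decode_body(content: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw bytes the way response.text does for a given encoding."""
    # errors='replace' substitutes invalid byte sequences instead of raising
    return content.decode(encoding, errors='replace')


# Simulated raw body: Chinese text the server sent as UTF-8 bytes
raw = '我是段落1'.encode('utf-8')

right = decode_body(raw, 'utf-8')  # matching codec: readable text
wrong = decode_body(raw, 'gbk')    # mismatched codec: garbled text (mojibake)
print(right)
print(wrong)
```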
4. Calling a data API with requests
import requests
url = 'http://api.tianapi.com/auto/index?key=c9d408fefd8ed4081a9079d0d6165d43&num=10'
response = requests.get(url)
print(response.text)
# The API returns JSON, so .json() parses the body into a Python dict
result = response.json()
print(result['msg'])
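`response.json()` parses the body into Python dicts and lists, much like `json.loads(response.text)`. The sketch below parses a made-up body offline; the field names `code`, `msg` and `newslist` are assumptions about this API's format, and `summarize` is a hypothetical helper. Using `.get()` returns a default instead of raising `KeyError` when a field is missing.

```python
import json

# Made-up body; the real field names of this API are an assumption here
body = '{"code": 200, "msg": "success", "newslist": [{"title": "news A"}, {"title": "news B"}]}'


def summarize(text: str) -> tuple:
    """Hypothetical helper: parse a JSON body and return (msg, list of titles)."""
    data = json.loads(text)  # comparable to response.json()
    msg = data.get('msg', '')  # .get() returns a default instead of raising KeyError
    titles = [item.get('title', '') for item in data.get('newslist', [])]
    return msg, titles


msg, titles = summarize(body)
print(msg)
print(titles)
```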
5. Downloading an image
import requests
# Network address of the image
url = 'http://a1.qpic.cn/psc?/V51XXeBM4CRV1i38PJGe1d3fuo1MyosD/bqQfVz5yrrGYSXMvKr.cqaUArM45aGVnIk*nA8qKe9ZjwsLstGKumQ1ok9Chu4fDU2GdbIE1n1CayCYH7O2shnsTAWkBTl0oqUCY.ZYKD7k!/b&ek=1&kp=1&pt=0&bo=OAQ4BDgEOAQRECc!&tl=3&vuin=1033616262&tm=1621933200&sce=60-2-2&rf=viewer_311'
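# Request the image data
response = requests.get(url)
# Save the image data: 'wb' opens the file in binary write mode, and `with` closes it automatically
if response.status_code == 200:
    with open('./a.jpg', 'wb') as f:
        f.write(response.content)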
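For larger files, requests can download in pieces: `requests.get(url, stream=True)` together with `response.iter_content(chunk_size=8192)` yields the body as byte chunks instead of holding it all in `response.content`. The hypothetical `save_chunks` helper below writes any iterable of byte chunks and creates the target folder first; the demo feeds it fake chunks so it runs offline.

```python
import os
import tempfile
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write an iterable of byte chunks to path in binary mode; return the bytes written."""
    folder = os.path.dirname(path)
    if folder:
        # open() does not create missing folders, so create them first
        os.makedirs(folder, exist_ok=True)
    total = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


# Offline demo: fake JPEG start/end markers instead of a real download
demo_path = os.path.join(tempfile.mkdtemp(), 'images', 'a.jpg')
written = save_chunks([b'\xff\xd8', b'', b'\xff\xd9'], demo_path)
print(written)  # 4
```

With a real download this would look like `save_chunks(requests.get(url, stream=True).iter_content(chunk_size=8192), './a.jpg')`, ideally after checking the status code.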
6. Setting cookies
import requests
# url = 'https://www.zhihu.com/signin?next=%2F'
# response = requests.get(url)
# print(response)
#<Response [403]>
url = 'https://zhuanlan.zhihu.com/p/371419303'
headers = {
'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Mobile Safari/537.36',
# The request header must be named 'Cookie'; paste the value copied from the browser's developer tools (Network tab) after logging in
'Cookie': '<paste your own Cookie header value here>'
}
response = requests.get(url,headers=headers)
print(response.text)
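Instead of sending the raw `Cookie` header, requests also accepts a dict through the `cookies=` parameter of `requests.get`. A minimal sketch of turning a copied `name1=value1; name2=value2` string into such a dict; `cookie_header_to_dict` is a hypothetical helper and the sample values are made up. Quotes around a value are kept as-is.

```python
def cookie_header_to_dict(cookie_header: str) -> dict:
    """Hypothetical helper: split 'name1=value1; name2=value2' into {name: value}."""
    cookies = {}
    for part in cookie_header.split(';'):
        part = part.strip()
        if not part:
            continue  # ignore empty pieces, e.g. from a trailing ';'
        # partition() splits on the first '=' only, so values ending in '==' stay intact
        name, _, value = part.partition('=')
        cookies[name.strip()] = value.strip()
    return cookies


# Made-up sample in the same shape as a cookie string copied from the browser
sample = '_zap=abc; tst=r; token="2|1:0|x=="'
print(cookie_header_to_dict(sample))
```

The result could then be passed as `requests.get(url, headers={'User-Agent': ...}, cookies=cookie_header_to_dict(copied_string))`, where `copied_string` stands for your own cookie text.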
7. Multi-page data
# If the target site serves its data across multiple pages, first work out the pattern in the URLs of the different pages
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}
def get_top250(url):
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print('Request failed:', response)
        return
    print(response.text)
    print('=============================================================')


if __name__ == '__main__':
    # The start parameter grows by 25 per page: 0, 25, ..., 225 covers all 250 movies
    for start in range(0, 226, 25):
        url = f'https://movie.douban.com/top250?start={start}&filter='
        get_top250(url)
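The pattern found for Douban is that only the `start` offset changes, by 25 per page. A small hypothetical helper that builds all page URLs up front, so they can be checked before any request is sent:

```python
def top250_page_urls(total: int = 250, page_size: int = 25) -> list:
    """Hypothetical helper: one URL per page, stepping the start offset by page_size."""
    template = 'https://movie.douban.com/top250?start={start}&filter='
    return [template.format(start=start) for start in range(0, total, page_size)]


urls = top250_page_urls()
print(len(urls))  # 10
print(urls[0])    # https://movie.douban.com/top250?start=0&filter=
print(urls[-1])   # https://movie.douban.com/top250?start=225&filter=
```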
Exercise
import os
from re import findall

import requests

url = 'https://www.imdb.cn/IMDB250'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
    # 'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Mobile Safari/537.36'
}
response = requests.get(url, headers=headers)
# print(response.text)

# Group 1 captures the poster URL, group 2 the movie title in the alt attribute
re_str = r'''target="_blank"><img src="(.*?)" alt='(.*?)\''''
result = findall(re_str, response.text)
# print(result)

# open() fails if the folder does not exist, so create it first
os.makedirs('./pictures', exist_ok=True)
for img_url, name in result:
    img_response = requests.get(img_url)
    with open(f'./pictures/{name}.jpg', 'wb') as f:
        f.write(img_response.content)
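Two parts of the exercise are easy to check offline: what the regex captures, and whether a movie title is safe as a file name (characters such as `:` are not allowed in Windows file names, and `/` is treated as a folder separator everywhere). A sketch with a made-up HTML snippet shaped like what the regex expects; the real imdb.cn markup may differ, and `safe_filename` is a hypothetical helper.

```python
import re

# Made-up snippet shaped like the markup the exercise's regex expects (assumption)
html = ('<a href="/movie/1" target="_blank"><img src="https://example.com/p1.jpg" alt=\'Movie: One\'></a>'
        '<a href="/movie/2" target="_blank"><img src="https://example.com/p2.jpg" alt=\'Two/Three\'></a>')

# Same pattern as the exercise: group 1 = poster URL, group 2 = title from alt='...'
pattern = r'''target="_blank"><img src="(.*?)" alt='(.*?)\''''


def safe_filename(name: str) -> str:
    """Hypothetical helper: replace characters Windows forbids in file names with '_'."""
    return re.sub(r'[\\/:*?"<>|]', '_', name)


for img_url, name in re.findall(pattern, html):
    print(img_url, '->', f'./pictures/{safe_filename(name)}.jpg')
```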