
Day02-20210525-CSS Selectors and requests

程序员文章站 2022-07-15 15:39:07
...

Summary

1. CSS Selectors (1)

<!-- CSS is responsible for the styling and layout of a web page's content -->
<!-- 
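 1. CSS syntax
 Syntax:
 selector{property-name-1: value-1; property-name-2: value-2; ...}
 CSS: Cascading Style Sheets (usually just called style sheets)
 
 Notes:
 selector	- selects the tags to be styled
 {} 	- fixed part of the syntax
 property name	- decides which style is being set
 property value	- a numeric size needs a unit, usually px
 
 Common properties: color 	- text color (value: an English color name, rgb(r, g, b), or a # hex value such as #ff0000)
		 font-size 	- font size
		 background-color 	- background color
 2. Where CSS code goes
 1) Inline style	- CSS goes in the tag's style attribute; no selector{} wrapper is needed
 2) Internal style sheet	- CSS goes in a style tag (normally inside head; browsers also apply a style tag placed in body)
 3) External style sheet	- CSS goes in a .css file, which the HTML imports with a link tag, e.g. <link rel="stylesheet" href="style.css">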
 -->
<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8">
		<title></title>
			<style type="text/css">
				p{
					color: aquamarine ;
				}
				
			</style>
	</head>
	<body>
		<h1>1. Inline style example:</h1>
		<p style="color: brown;">I am paragraph 1</p>
		<p>I am paragraph 2</p>
		<a href="https://www.baidu.com">I am link 1</a>
	</body>
</html>

2. CSS Selectors (2)
<!-- 
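1. Element selector (tag selector)
Use a tag name directly as the selector; it selects every tag of that type on the page.
e.g. p{} - selects all p tags; a{} - selects all a tags

2. id selector
Put # before an id value to form a selector; it selects the tag whose id attribute has that value (an id must be unique within the page).
e.g. #p2{} - selects the tag whose id is p2; #a{} - selects the tag whose id is a

3. Class selector
Put . before a class value to form a selector; it selects every tag whose class attribute contains that value.
e.g. .c1{} - selects all tags with class c1; .a{} - selects all tags with class a
	 .c1.c2{} - selects tags whose class attribute contains both c1 and c2

4. Group selector
Separate several independent selectors with commas; it selects every tag matched by any of them.
e.g. p,a{} - selects all p tags and all a tags
	.c1,p{} - selects all tags with class c1 and all p tags
	.c1,#p1,#p2{} - selects all tags with class c1, plus the tags whose id is p1 or p2

5. Descendant selector
Separate several independent selectors with spaces.
e.g. p a - selects every a tag nested anywhere inside a p tag (a is a descendant of p)
	div #p1 .c1 - selects tags with class c1 inside the tag with id p1, which is itself inside a div

6. Child selector
Separate several independent selectors with >.
e.g. p>a - selects every a tag that is a direct child of a p tag

Compound selectors and pseudo-classes:
div.info{} - selects div tags whose class contains info
#p1.c1{} - selects the tag whose id is p1 and whose class contains c1
p:nth-child(N) - selects a p tag that is the Nth child of its parent (all sibling tags are counted, not only p; p:nth-of-type(N) counts only p tags)
div p:nth-child(N) - the same, limited to p tags inside a div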
 -->
<!DOCTYPE html>
<html>
	<head>
		<meta charset="utf-8">
		<title></title>
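		<style type="text/css">
			/* 1. Element selector: every p tag on the page */
			p{
				color: blue;
			}
			/* 2. id selector: the tag whose id is p2 */
			#p2{
				font-size: 24px;
			}
			/* 5. Descendant selector: every a tag anywhere inside a div */
			div a{
				color: green;
			}
			/* 6. Child selector: h2 tags that are direct children of a div */
			div>h2{
				background-color: yellow;
			}
		</style>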
	</head>
	<body>
		<p>I am paragraph 1</p>
		<a href="">I am link 1</a>
		<div id="">
			<p id="p2">I am paragraph 2</p>
			<font>I am font 1</font>
			<input type="" name="" id="" value="" />
			<a href="">I am link 2</a>
			
			<div id="">
				<h2>I am heading 1</h2>
				<p>I am paragraph 3</p>
				<a href="">I am link 3</a>
				<p>I am paragraph 4</p>
			</div>
			
		</div>
	</body>
</html>
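To make the matching rules above concrete, here is a toy selector matcher written with Python's standard library (Python being the language of the rest of this article). It is a simplified sketch under clear assumptions, not how a browser works internally: it understands only tag, #id and .class parts, compounds such as div.info, commas (group), spaces (descendant) and > (child), and, as browser engines typically do, it checks a selector starting from its rightmost part. The function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Tags with no closing tag; they are never pushed on the open-element stack
VOID_TAGS = {'input', 'br', 'img', 'meta', 'link', 'hr'}


class TreeBuilder(HTMLParser):
    """Record each element's tag, id, classes, parent and own text."""

    def __init__(self):
        super().__init__()
        self.elements = []  # every element, in document order
        self.stack = []     # elements that are currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        el = {'tag': tag, 'id': attrs.get('id'),
              'classes': set((attrs.get('class') or '').split()),
              'parent': self.stack[-1] if self.stack else None, 'text': ''}
        self.elements.append(el)
        if tag not in VOID_TAGS:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the most recently opened element with this tag name
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]['tag'] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack:
            self.stack[-1]['text'] += data.strip()


def parse_compound(text):
    """Split 'div.info', '#p1.c1', '.c1.c2' or 'p' into (tag, id, classes)."""
    tag, elem_id, classes = None, None, set()
    token, kind = '', 'tag'
    for ch in text + '\0':  # '\0' flushes the last token
        if ch in '#.\0':
            if token and kind == 'tag':
                tag = token
            elif token and kind == 'id':
                elem_id = token
            elif token:
                classes.add(token)
            token, kind = '', ('id' if ch == '#' else 'class')
        else:
            token += ch
    return tag, elem_id, classes


def matches_compound(el, text):
    tag, elem_id, classes = parse_compound(text)
    return ((tag is None or el['tag'] == tag)
            and (elem_id is None or el['id'] == elem_id)
            and classes <= el['classes'])


def match_from(el, tokens, i):
    """Match tokens[i] against el, then walk leftwards over the combinators."""
    if not matches_compound(el, tokens[i]):
        return False
    if i == 0:
        return True
    if tokens[i - 1] == '>':  # child combinator: the direct parent must match
        return el['parent'] is not None and match_from(el['parent'], tokens, i - 2)
    ancestor = el['parent']   # descendant combinator: any ancestor may match
    while ancestor is not None:
        if match_from(ancestor, tokens, i - 1):
            return True
        ancestor = ancestor['parent']
    return False


def select(html, selector):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # 'a, b' -> two groups; 'div > p' -> ['div', '>', 'p']
    groups = [g.replace('>', ' > ').split() for g in selector.split(',')]
    return [el for el in builder.elements
            if any(match_from(el, t, len(t) - 1) for t in groups)]


PAGE = '''
<p>para 1</p>
<a href="">link 1</a>
<div>
  <p id="p2" class="c1 c2">para 2</p>
  <a href="">link 2</a>
  <div class="info">
    <h2>title 1</h2>
    <p class="c1">para 3 <a href="">link in p</a></p>
    <a href="">link 3</a>
  </div>
</div>
'''


def texts(selector):
    return [el['text'] for el in select(PAGE, selector)]
```

For example, texts('div a') returns ['link 2', 'link in p', 'link 3'], while texts('div>a') leaves out 'link in p', because that link's parent is a p tag rather than a div.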

3. Using requests

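import requests

# 1. Send the request and get the response
# Request URL
url = 'https://movie.douban.com/top250'
# Request headers: a browser User-Agent so the request looks like it comes from a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
}
response = requests.get(url, headers=headers)

# 2. Response content
# 1) Set the encoding (only needed when the text comes out garbled)
# response.encoding = 'utf-8'  # use whatever encoding the page actually uses

# 2) Status code
print(response.status_code)

# 3) Body as text (when the URL is a web page)
print(response.text)

# 4) Body parsed as JSON (only when the server returns JSON; this URL returns
#    an HTML page, so calling response.json() here would raise an error)
# print(response.json())

# 5) Raw body as bytes (when the URL points to a binary file, e.g. an image)
print(response.content)

# 3. Response headers
print(response.headers)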
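The encoding step above can be reproduced without any network access. This stdlib-only sketch imitates, in simplified form, what happens between response.content (bytes), response.text (a decoded string) and response.json() (parsed data); the sample bytes and the helper decode_body are hypothetical. requests takes the encoding from the Content-Type header and falls back to ISO-8859-1 for text/* responses that declare no charset, which is one common way Chinese pages end up garbled.

```python
import json

# Hypothetical response body: the UTF-8 bytes of a small JSON document
content = '{"title": "肖申克的救赎", "rank": 1}'.encode('utf-8')


def decode_body(raw: bytes, encoding: str) -> str:
    """Simplified stand-in for response.text: decode the raw bytes."""
    return raw.decode(encoding, errors='replace')


# A wrong guess (ISO-8859-1) turns the Chinese title into mojibake
garbled = decode_body(content, 'iso-8859-1')
# The right encoding, as if response.encoding = 'utf-8' had been set
text = decode_body(content, 'utf-8')
# Roughly what response.json() gives you: the text parsed into Python objects
data = json.loads(text)
```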

4. Requesting a data API with requests

import requests
url = 'http://api.tianapi.com/auto/index?key=c9d408fefd8ed4081a9079d0d6165d43&num=10'
response = requests.get(url)
# Raw body as a string
print(response.text)
# The body is JSON, so .json() parses it into a Python dict
result = response.json()
print(result['msg'])
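The key and num parts after the ? in the URL above are query parameters. requests can build that part for you, as in requests.get(base_url, params={...}); the stdlib sketch below shows the same encoding with urllib.parse so the parameters can be inspected without sending a request. 'YOUR_KEY' is a placeholder, not a real key.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint and parameter names from the URL above; 'YOUR_KEY' is a placeholder
BASE_URL = 'http://api.tianapi.com/auto/index'
params = {'key': 'YOUR_KEY', 'num': 10}

# Build the query string, e.g. 'key=YOUR_KEY&num=10'
url = f'{BASE_URL}?{urlencode(params)}'

# Split a URL back into its parameters (values come back as lists of strings)
parsed = parse_qs(urlsplit(url).query)
```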

5. Downloading an image

import requests

# Network address (URL) of the image
url = 'http://a1.qpic.cn/psc?/V51XXeBM4CRV1i38PJGe1d3fuo1MyosD/bqQfVz5yrrGYSXMvKr.cqaUArM45aGVnIk*nA8qKe9ZjwsLstGKumQ1ok9Chu4fDU2GdbIE1n1CayCYH7O2shnsTAWkBTl0oqUCY.ZYKD7k!/b&ek=1&kp=1&pt=0&bo=OAQ4BDgEOAQRECc!&tl=3&vuin=1033616262&tm=1621933200&sce=60-2-2&rf=viewer_311'
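# Request the image data
response = requests.get(url)

# Save the image data (only if the request succeeded)
if response.status_code == 200:
    # 'wb' = write in binary mode, because response.content is bytes
    with open('./a.jpg', 'wb') as f:
        f.write(response.content)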
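A 200 status code does not guarantee that the body really is an image; some servers answer with an HTML page instead. One possible safeguard, sketched here as an addition rather than part of the original code, is to look at the first bytes of response.content before saving. The helper names are hypothetical, and only the JPEG and PNG signatures are checked.

```python
import os
import tempfile

# Leading "magic" bytes that identify two common image formats
SIGNATURES = {
    b'\xff\xd8\xff': 'jpg',
    b'\x89PNG\r\n\x1a\n': 'png',
}


def sniff_image_type(data: bytes):
    """Return 'jpg' or 'png' if data starts with a known signature, else None."""
    for magic, ext in SIGNATURES.items():
        if data.startswith(magic):
            return ext
    return None


def save_image(data: bytes, stem: str):
    """Save data as '<stem>.<ext>' if it looks like an image; return the path or None."""
    ext = sniff_image_type(data)
    if ext is None:
        return None  # most likely an HTML error page rather than an image
    path = f'{stem}.{ext}'
    with open(path, 'wb') as f:
        f.write(data)
    return path


# Usage demo with fake bytes in a temporary folder (no network needed)
with tempfile.TemporaryDirectory() as folder:
    saved = save_image(b'\xff\xd8\xff\xe0 fake jpeg body', os.path.join(folder, 'a'))
    saved_ok = saved is not None and os.path.exists(saved)
    rejected = save_image(b'<html>error</html>', os.path.join(folder, 'b'))
```

With the download above, the call would be save_image(response.content, './a').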

6. Setting cookies

import requests
# url = 'https://www.zhihu.com/signin?next=%2F'
# response = requests.get(url)
# print(response)
# <Response [403]>  (the bare request without browser headers is refused)

url = 'https://zhuanlan.zhihu.com/p/371419303'
headers = {
    'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Mobile Safari/537.36',
    # The request header is named 'Cookie'; the value is copied from a logged-in browser
    # session (it is tied to one account and expires, so substitute your own)
    'Cookie': ('_zap=34fc6188-6374-48dc-b7b7-49991824ceef; d_c0="AHCdy9QEBxOPTlk_llX9KV4-dq3Tj8yHUAE=|1619668543"; capsion_ticket="2|1:0|10:1620639213|14:capsion_ticket|44:MTI0MDFkNzBjZDZiNDEzODliNjllZDAzMmY2YjBjM2M=|f8903f2612b2cf31fe41f546e1c78b55849814ec720bdb7bf277721307b6cc58"; _9755xjdesxxd_=32; YD00517437729195%3AWM_TID=nYAAr63irEZAAQVBEUJ%2F0ycmg%2Bu9oet4; _xsrf=mZlJirABKaukg75GesAHOsUI2yWxZRiQ; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1621914252,1621924747,1621929005,1621935996; captcha_session_v2="2|1:0|10:1621935997|18:captcha_session_v2|88:bis4azZGQUhHRVpScEdOOGN1NWRqck44NVFtSEZQT2Zvd1M5VTRSRURqWnhPSFdvbFVLangybzc1Q0JCSW5iKw==|8bed2d41568936bf9dc3fa134a339e43c17c4bc98f9dab95de9d037d6e8d3161"; SESSIONID=BTf6yceqYtim40nWt9yahj3WXFSyssCXS2dEbZwRscj; JOID=UV0VB0lWLueulFRkXFFpuS1OV2RMEn-FnuwxBCkOXIvHpmUMDivoScmTV2VeqFUcJ-OoePku8QdgvI4oMzfBPKQ=; osd=V1ERBkhQIuOvlVJoWFBovyFKVmVKHnuEn-o9ACgPWofDp2QKAi_pSM-fU2RfrlkYJuKudP0v8AFsuI8pNTvFPaU=; __snaker__id=lfOQYtSrbGTvNgb2; gdxidpyhxdE=oI%2FdBh1pMOQ%5CVJUveEjlxLizTnuffA6uw2dX9%5Czi%2FX2TjubQMwkuZ3G%2Bjtqy7IlRGbV2pqYKlG9VbT6mabviSwYiHkdwVJe9QXerenADWBtGp00bpzEDZRS2OtZ87SVKQnGHtrTiWaVGJoOLc%2Bfwv5QH6J%2BXcfZaacYzW6xLIi7T5eQS%3A1621936897942; YD00517437729195%3AWM_NI=UbrjhhOGlX8aoG%2BL5z1ocE2HXfebnvtHBekck1JkGeYXeYjJdHQMd6QjgOTqgqlXMPvjjPWdJU%2BOn5AmR6sUr4s3%2BoD%2FgF%2Ftg0EqGm%2BML9PGgl6t5n2y2V1m6h61atVBYjU%3D; YD00517437729195%3AWM_NIKE=9ca17ae2e6ffcda170e2e6eeb2e54787bdfd8db87f898e8aa7c44f978a9ebaf46db8b497d7d07cb5efbcd0b32af0fea7c3b92a97bebe8beb648d99a4b9bb6bad99e5adb872f3b1ada6b1449cbaab83ed7eb8988e86ea69a9ef868cc66f9b88abbae942a8989faee73e93b6a9b0d77ba5afa4b9d05cb6b6a6b1d6708a8be5b0ed3f83b39ed7f25fb2ecbdd4aa6588f1a9a7ae80a5f08b92c667e99384dad34da5b4fa95b866f2a7feb7b467f6b8fad2e480969e978bd837e2a3; '
               'captcha_ticket_v2="2|1:0|10:1621936333|17:captcha_ticket_v2|704:eyJ2YWxpZGF0ZSI6IkNOMzFfbldON3U4cmZ5QzAtWnZwaWR3c3p4bWdmZ1N1bGNKNFBzSzRNWUg5VnRrZ2Zjc2ZrTDl2VEpsRkhvSl85WjVDSXFwdjZqMGVYU0MuYi42bDhWVkNRNVdFSTZMd3pwLmNVN1NNNDZyYjFjUkFpUUY5Zmp2UGtLdnB5ZVpOejl2VTFqVGs0TndhZ3JxWUp6UlVMVnBQanRPS3Z5NVk5ZjdIRFVhUUNsdEFLNlRrbnhwN3RlRklKcm8ycDlBZ1o5YW5hQkdSSEFXdURpZGJlQlFtajBKNTRDT256ZVA1eDh6aUZnZUJyeVcuR21IOUx0TFdnR0JOa3Q5SXlhZi5yeFV5anNwNkdZNXNHUG82U1dMVFVRbnF0Y0xuUGZUQW91ZHhvRVhSTW02blc5NjhNZlVHMmFwNDlPZUQyclNyc0k2YzhEYmNadWlyai1OZGs2RTdHSzBjWFk5UHBkRk5iSUd5a2tQVXFUSnAyVGFlMXpZa0xUWTk2d1dfemF6N2ZWaVJ0aWlMZ0ZMNERaRGI4MDh0ZHNLdFlMOFJwdFBIdDJaci1NT2JtR0h6Mjdtczl1V3NQeXRXTTZtb1pFYWVXYURaWlI3UTRsY2I2d3d6aUpzd2M2ZGxXNXJNbVlvUnIwZ3JVZy5NUVVXUzI4dGxuQVlQc29CTE1nU3IxdnJjMyJ9|30b9295d8d37f71b2b7c1b8889fac796d458a8a5edd5f4804ee232be528482f4"; z_c0="2|1:0|10:1621936360|4:z_c0|92:Mi4xeHMxZUJBQUFBQUFBY0ozTDFBUUhFeVlBQUFCZ0FsVk41eGFhWVFBY2hLcTlHNFFKLUxBSDhDREdaODA4d1ZVQUJ3|bcb21077d9db9722bd3397bdb687ec6e62f201284531ff4cf404af2937465124"; unlock_ticket="AIBCllgAcwsmAAAAYAJVTe_PrGBk2BtEUuw1NWf_KvEqKqGQmmrEsw=="; tst=r; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1621936369; KLBRSID=f48cb29c5180c5b0d91ded2e70103232|1621936633|1621935996'),
}
response = requests.get(url,headers=headers)
print(response.text)
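Instead of a raw Cookie header, requests also accepts cookies as a dict through the cookies= argument of requests.get. The hypothetical helper below converts a cookie string copied from the browser into such a dict. It is a plain split that keeps every value (including any quotes) exactly as it appears, and it does not try to handle every corner of the cookie syntax.

```python
def cookie_header_to_dict(header: str) -> dict:
    """Split 'name1=value1; name2=value2' into {'name1': 'value1', ...}."""
    cookies = {}
    for pair in header.split(';'):
        # partition at the FIRST '=' so values that contain '=' stay intact
        name, sep, value = pair.strip().partition('=')
        if sep:  # skip empty pieces, e.g. from a trailing '; '
            cookies[name] = value
    return cookies


# Hypothetical, shortened cookie string in the same format as the one above
raw = 'tst=r; d_c0="abc=|123"; _xsrf=xyz'
cookies = cookie_header_to_dict(raw)
```

The result could then be passed as requests.get(url, headers={'User-Agent': ...}, cookies=cookies).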

7. Multi-page data

# If the target site serves its data across several pages, first work out the pattern the page URLs follow, then crawl them in a loop
import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}

def get_top250(url):
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print('Request failed:', response)
        return
    print(response.text)
    print('=============================================================')


if __name__ == '__main__':
    # start = 0, 25, ..., 225: ten pages of 25 movies each
    for start in range(0, 226, 25):
        url = f'https://movie.douban.com/top250?start={start}&filter='
        get_top250(url)
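The loop above hard-codes range(0, 226, 25); writing the same pattern in terms of a page size and a total makes the rule explicit. The sketch also adds a pause between requests, an addition that is not in the original code (a common courtesy to the server). fetch stands for any function that downloads one page, such as get_top250 above; the other names are hypothetical.

```python
import time

PAGE_SIZE = 25  # the start values above step by 25
TOTAL = 250


def page_urls(total=TOTAL, page_size=PAGE_SIZE):
    """Build the list-page URLs: start = 0, 25, 50, ..., 225."""
    return [f'https://movie.douban.com/top250?start={start}&filter='
            for start in range(0, total, page_size)]


def crawl(fetch, urls, delay=1.0):
    """Call fetch(url) for each URL, pausing `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            time.sleep(delay)
        results.append(fetch(url))
    return results
```

Used with the function above, this would be crawl(get_top250, page_urls()).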

Exercise

import os
import re

import requests

url = 'https://www.imdb.cn/IMDB250'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
    # 'User-Agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Mobile Safari/537.36'
}
response = requests.get(url, headers=headers)
# print(response.text)

# Group 1 = poster image URL, group 2 = movie title taken from the alt text
re_str = r'''target="_blank"><img src="(.*?)" alt='(.*?)\''''
result = re.findall(re_str, response.text)
# print(result)

# Create the output folder if it does not exist yet
os.makedirs('./pictures', exist_ok=True)
for img_url, name in result:
    img_response = requests.get(img_url)
    # Replace characters that are not allowed in file names (e.g. '/', ':')
    safe_name = re.sub(r'[\\/:*?"<>|]', '_', name)
    with open(f'./pictures/{safe_name}.jpg', 'wb') as f:
        f.write(img_response.content)
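Because the exercise depends on a live page, it helps to check the regular expression offline first. This sketch runs the same pattern against a made-up snippet (hypothetical markup and URLs, not copied from imdb.cn). It also exposes a limitation of the pattern: the alt text is captured only up to the first single quote, so a title that contains an apostrophe gets cut short.

```python
import re

# Same pattern as in the exercise: group 1 = image URL, group 2 = alt text
RE_POSTER = r'''target="_blank"><img src="(.*?)" alt='(.*?)\''''

# Hypothetical snippet shaped like the markup the pattern expects
SAMPLE = (
    '<a href="/m/1" target="_blank"><img src="https://example.com/1.jpg" alt=\'Movie A\'></a>'
    '<a href="/m/2" target="_blank"><img src="https://example.com/2.jpg" alt=\'Schindler\'s List\'></a>'
)

matches = re.findall(RE_POSTER, SAMPLE)
# The second title comes back as 'Schindler': the lazy group stops at the apostrophe
```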