[Crawler Learning Log] That Time I "Learned Web Crawling from Scratch and Became a Crawler Slacker"
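As a complete crawler newbie, I was swept up by some mysterious power on 2020-11-21... When I opened my eyes, a panel that looked like a game stats screen popped up: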
[My Starting Loadout]
① ⌈Python3 fundamentals⌋, as basic as basic gets, picked up from the book 《Python入门与实践》
② Stale and plain ⌈algorithms and data structures⌋
③ Basic and fragmentary ⌈web knowledge⌋
④ A ⌈brain capacity⌋ as ordinary as ordinary gets
⑤ A fairly smooth ⌈network connection⌋
⑥ A pair of still-functioning ⌈hands⌋
⑦ A barely usable ⌈laptop⌋
⑧ An insanely awesome ⌈PyCharm 2019.3.3⌋ (though the awesomeness is not mine)
⑨ A seriously badass ⌈Anaconda 3.7⌋ (though the badassery is not mine either)
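There was also an e-mail sitting in the top-right corner of the panel:
Hello, stranger. Welcome to the ⌈Crawler World⌋.
By the time you read this letter, I may already be GG.
I don't know who you are, but there is no doubt that before long you will be dragged into a huge ⌈conspiracy⌋, so you must prepare thoroughly if you want any chance of surviving this disaster. As for exactly what you should do, I can only stay silent... But at least I found some ⌈clues⌋ for defusing the crisis and left them in ⌈a certain place⌋. Of course, as you are now, you will have a hard time finding and retrieving them: you will need certain techniques to ⌈crawl⌋ them.
Time is short! Start your career by ⌈crawling the Zen of Python⌋~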
GOOD LUCK!
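- Chapter One: HelloSpider -
2020-11-21, light rain
The weather is nothing great, but to get back to the ⌈real world⌋ I had no choice but to get on with it and crawl the Zen of Python from the official site: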
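from lxml import html
import requests as req

# PEP 20 page on the official site, and the XPath of the <pre> that holds the Zen
url = 'https://www.python.org/dev/peps/pep-0020/'
xpath = '//*[@id="the-zen-of-python"]/pre/text()'

res = req.get(url)              # fetch the page
ht = html.fromstring(res.text)  # parse the HTML into an element tree
text = ht.xpath(xpath)          # list of text nodes inside the <pre>
print("Hello,\n" + ''.join(text))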
After running it:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
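To see what that XPath actually does without touching the network, here is a minimal offline sketch. It is not the script above: it swaps lxml for the standard library's `xml.etree.ElementTree`, which only understands a subset of XPath (no `text()` step, so the text is read from the element's `.text` attribute instead). The `SAMPLE_PAGE` markup and the `extract_zen` helper are made up for illustration and are only a rough stand-in for the real PEP 20 page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the PEP 20 page (NOT the real markup).
SAMPLE_PAGE = """
<html>
  <body>
    <div id="the-zen-of-python">
      <h1>The Zen of Python</h1>
      <pre>Beautiful is better than ugly.
Explicit is better than implicit.</pre>
    </div>
  </body>
</html>
"""


def extract_zen(page: str) -> str:
    """Return the text of the <pre> directly under the element with id="the-zen-of-python"."""
    root = ET.fromstring(page.strip())
    # Same idea as //*[@id="the-zen-of-python"]/pre in the lxml version:
    # any descendant element with that id, then its direct <pre> child.
    pre = root.find(".//*[@id='the-zen-of-python']/pre")
    if pre is None:
        raise ValueError("no <pre> found under #the-zen-of-python")
    return pre.text or ""


zen = extract_zen(SAMPLE_PAGE)
print("Hello,\n" + zen)
```

If the page layout ever changes so that the `<pre>` is no longer a direct child of the element with that id, the lookup finds nothing; the sketch raises a `ValueError` in that case, whereas the lxml version would simply return an empty list and print only "Hello,".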
[Quest] ⌈Crawl the Zen of Python⌋ complete ~
…
[Notice] You have received a new e-mail!!
TO BE CONTINUED…
Original post: https://blog.csdn.net/weixin_42430021/article/details/109908991