怎样用Python设计一个爬虫模拟登陆知乎？

发布网友发布时间：2022-04-21 13:20

共1个回答

热心网友时间：2023-10-23 10:23

知乎现在登录貌似每次都会有密码了，修改如下：
import requests
from xtls.util import BeautifulSoup
INDEX_URL = 'xxx
LOGIN_URL = 'xxx'
CAPTCHA_URL = 'xxx'
def gen_time_stamp():
return str(int(time.time())) + '%03d' % random.randint(0, 999)
def login(username, password, oncaptcha):
session = requests.session()
_xsrf = BeautifulSoup(session.get(INDEX_URL).content).find('input', attrs={'name': '_xsrf'})['value']
data = {
'_xsrf': _xsrf,
'email': username,
'password': password,
'remember_me': 'true',
'captcha': oncaptcha(session.get(CAPTCHA_URL + gen_time_stamp()).content)
}
resp = session.post(LOGIN_URL, data)
if 2 != resp.status_code / 100 or u"登陆成功" not in resp.content:
raise Exception('captcha error.')
return session
其中，oncaptcha为一个回调函数（需要自己实现的），接受的参数为验证码的二进制内容，返回的为验证码内容。
P.S.你可以自己做识别验证码，或者手动输入，其中最简单的oncaptcha为：
def oncaptcha(data):
with open('captcha file save path', 'wb') as fp:
fp.write(data)
return raw_input('captcha : ')