Scraping Lagou job listings with Python's requests library

Webmaster Resources 2026/4/21 Anonymous


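Press F12 to open the browser's developer tools and capture the network traffic; this lets you locate the endpoint that serves the job listings.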

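The captured request shows the endpoint's URL and its form data. In the form, pn is the page number being requested and kd is the job keyword to search for.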
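As a rough illustration of the two form fields, the sketch below builds the payload for a given page and keyword and prints it as a form-encoded body. The helper name `build_form_data` is hypothetical, and `first` is left at 'true' only because that is the value seen in the captured request.

```python
from urllib.parse import urlencode


def build_form_data(page, keyword):
    """Build the form fields seen in the captured request (hypothetical helper)."""
    return {
        'first': 'true',    # copied as-is from the captured request
        'pn': str(page),    # page number to request
        'kd': keyword,      # job keyword to search for
    }


# Form-encoded body, as it would be sent in a POST request
print(urlencode(build_form_data(2, 'python')))  # first=true&pn=2&kd=python
```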

Build the POST request in Python:

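import requests

data = {
  'first': 'true',
  'pn': '1',
  'kd': 'python'
}

headers = {
  'referer': 'https://www.lagou.com/jobs/list_python/p-city_0',
  'user-agent': '<your browser user-agent string>'  # placeholder; use your own browser's string
}

res = requests.post("https://www.lagou.com/jobs/positionAjax.json", data=data, headers=headers)
print(res.text)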

This request does not return any job data from the endpoint.
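One way to tell in code whether a response actually carries listings is to look for the key path that the complete crawler in this article reads: content, then positionResult, then result. The sketch below is only an assumption-laden helper: `extract_results` is a hypothetical name, and the shape of the failing payload is a stand-in, not something taken from the site.

```python
def extract_results(payload):
    """Return the list of job nodes, or an empty list if the key path is missing."""
    content = payload.get("content") or {}
    position_result = content.get("positionResult") or {}
    return position_result.get("result") or []


ok = {"content": {"positionResult": {"result": [{"positionName": "demo"}]}}}
blocked = {"msg": "placeholder error payload"}  # shape assumed, not taken from the site

print(len(extract_results(ok)))       # 1
print(len(extract_results(blocked)))  # 0
```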

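Even after switching to a different network, the endpoint still returned an "operation too frequent" error. A closer look showed that the endpoint requires dynamic cookies; without them it keeps returning the same "too frequent" error.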

data = {
  'first': 'true',
  'pn': '1',
  'kd': 'python'
}

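# The headers must include both user-agent and referer, otherwise no cookies are returned
headers = {
  'referer': 'https://www.lagou.com/jobs/list_python/p-city_0',
  'user-agent': '<your browser user-agent string>'  # placeholder; use your own browser's string
}

# First request the listing page to obtain the cookies
r1 = requests.get("https://www.lagou.com/jobs/list_python/p-city_0", headers=headers)

# Then pass those cookies to the POST request
r2 = requests.post("https://www.lagou.com/jobs/positionAjax.json", data=data, headers=headers, cookies=r1.cookies)
print(r2.text)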
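The same two-step flow (load the listing page, then call the JSON endpoint with the cookies it set) could also be written around a session object, which carries cookies from one request to the next. This is a sketch rather than the article's method: `fetch_page` is a hypothetical helper, and it only assumes that `session` offers `get` and `post` the way `requests.Session` does.

```python
LIST_URL = "https://www.lagou.com/jobs/list_python/p-city_0"
AJAX_URL = "https://www.lagou.com/jobs/positionAjax.json"


def fetch_page(session, headers, data):
    """Visit the listing page first, then POST to the JSON endpoint.

    `session` is expected to behave like requests.Session: cookies set by
    the GET are kept on the session and sent with the POST automatically.
    """
    session.get(LIST_URL, headers=headers)
    response = session.post(AJAX_URL, data=data, headers=headers)
    return response.text
```

With requests installed, this would be called as `fetch_page(requests.Session(), headers, data)`.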

Note: the cookies are also refreshed after every ten requests to the endpoint. The complete crawler code follows.
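The refresh rule can be isolated into a small helper that fetches cookies on the first call and again after every ten calls. The class below is a hypothetical sketch, not part of the article's crawler; `fetch` stands for any zero-argument callable that returns cookies, and the demo uses a stand-in that only counts how often it is called.

```python
class CookieRefresher:
    """Hand out cookies, fetching a fresh set every `every` calls (hypothetical helper)."""

    def __init__(self, fetch, every=10):
        self._fetch = fetch      # zero-argument callable returning cookies
        self._every = every
        self._count = 0
        self._cookies = None

    def get(self):
        # Refresh on the first call and again after every `every` calls
        if self._count % self._every == 0:
            self._cookies = self._fetch()
        self._count += 1
        return self._cookies


# Demo with a stand-in fetcher that just counts how often it is called
fetches = []
refresher = CookieRefresher(lambda: fetches.append(1) or len(fetches), every=10)
used = [refresher.get() for _ in range(25)]
print(len(fetches))  # 3
```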

import json
import logging

import requests

# Fetch the cookies
def getCookie():
  res = requests.get("https://www.lagou.com/jobs/list_python/p-city_0",
        headers=headers)
  return res.cookies

# Fetch the JSON data
def getPage(i, cookies, kw):
  data = {
    'first': 'true',
    'pn': i,
    'kd': kw
  }
  res = requests.post("https://www.lagou.com/jobs/positionAjax.json", data=data,
             headers=headers, cookies=cookies)
  return json.loads(res.text)

# Join a list into one space-separated string
def reduceList(l):
  text = ""
  for i in l:
    text += i + " "
  return text.strip()

# Extract the fields and save them to the file
def saveInCsv(f, data):
  js = data["content"]["positionResult"]["result"]
  for node in js:

    # Handle empty values
    district = node["district"]
    if district is not None:
      district = "-" + district
    else:
      district = ""

    f.write(
      node["positionName"] + "·" + node["city"] + district + "·" + node[
        "salary"] + "·" +
      node["workYear"] + "·" + node["education"] + "·" + reduceList(node["skillLables"]) + "·" +
      node["companyShortName"] + "·" + node["companySize"] + "·" + node["positionAdvantage"] + "\n")

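if __name__ == '__main__':
  # Define the headers
  headers = {
    'referer': 'https://www.lagou.com/jobs/list_python/p-city_0',
    'user-agent': '<your browser user-agent string>'  # placeholder; use your own browser's string
  }

  # Fetch cookies once up front so the first request already has them
  cookies = getCookie()

  with open("file.csv", "w", encoding="utf-8") as f:
    for i in range(1, 31):
      # Fetch fresh cookies every ten requests
      if (i % 10 == 0):
        cookies = getCookie()

      # Parse the fields and save them
      data = getPage(i, cookies, "python")
      saveInCsv(f, data)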
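The crawler above joins fields with "·" and assumes that every field except district is present. As an alternative sketch, the standard csv module can handle quoting, and `dict.get` can tolerate missing values. This swaps in comma-separated output, so it is not a drop-in replacement for `saveInCsv`; `node_to_row` is a hypothetical helper and the sample node is made up for the demo.

```python
import csv
import io


def node_to_row(node):
    """Turn one job node into a list of strings, tolerating missing values."""
    def text(key):
        return node.get(key) or ""

    district = node.get("district")
    location = text("city") + ("-" + district if district else "")
    skills = " ".join(node.get("skillLables") or [])
    # Same field order as the crawler above
    return [text("positionName"), location, text("salary"), text("workYear"),
            text("education"), skills, text("companyShortName"),
            text("companySize"), text("positionAdvantage")]


# Demo: write one made-up node to an in-memory buffer instead of a file
sample = {"positionName": "Python Dev", "city": "Beijing", "district": None,
          "salary": "15k-25k", "skillLables": ["Python", "Django"]}
buffer = io.StringIO()
csv.writer(buffer).writerow(node_to_row(sample))
print(buffer.getvalue())
```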
