-
利用VBA爬取豆瓣Top250到Excel文件,并给出源代码
-
-
Jackll2302说:
Jackll2302 说:
利用VBA爬取豆瓣Top250到Excel文件,并给出源代码
Sub Get_Douban_Top250() Dim http As Object Dim html As Object Dim i As Long Dim tr As Object Dim td As Object Dim a As Object Dim txt As String Dim ws As Worksheet Set ws = Sheets("Sheet1") Set http = CreateObject("MSXML2.XMLHTTP") http.Open "GET", "https://movie.douban.com/top250", False http.send Set html = CreateObject("htmlfile") html.body.innerHTML = http.responsetext For Each tr In html.getelementsbytagname("tr") i = i + 1 If i > 1 Then For Each td In tr.getelementsbytagname("td") For Each a In td.getelementsbytagname("a") If a.classname = "title" Then txt = txt & "| " & a.innertext & " |" ws.Range("A" & i).Value = a.innertext End If If a.classname = "rating_num" Then txt = txt & " 评分:" & a.innertext ws.Range("B" & i).Value = a.innertext End If Next Next ws.Range("C" & i).Value = txt txt = "" End If Next MsgBox "豆瓣Top250已经成功爬取到Excel文件中!" End Sub
-
Jackll2302说:
利用VBA爬取豆瓣Top250到Excel文件中,注意使用模拟浏览器语句,并给出源代码
Sub DoubanTop250()
Dim http As Object, html As Object, i As Integer
Set http = CreateObject("Microsoft.XMLHTTP")
i = 1
Do Until i > 250
http.Open "GET", "https://movie.douban.com/top250?start=" & 25 * (i - 1) & "&filter=", False
http.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = http.responseText
Dim tb As Object
Set tb = html.getelementsbytagname("table")(0)
Dim tr As Object
Set tr = tb.getelementsbytagname("tr")
Dim td As Object
Set td = tr(0).getelementsbytagname("td")
Cells(i, 1).Value = td(1).getelementsbytagname("a")(0).innerText
Cells(i, 2).Value = td(2).getelementsbytagname("div")(0).innerText
i = i + 1
Loop
End Sub -
Jackll2302说:
刚才的VBA爬取豆瓣代码不能正常运行,请再修改。
Sub Get_DouBan_Data()
Dim xhr As New MSXML2.XMLHTTP60
Dim html As New HTMLDocument
Dim post As ObjectDim Rows As Long
Rows = 1With xhr
.Open "GET", "https://movie.douban.com/top250", False
.send
html.body.innerHTML = .responseText
End WithFor Each post In html.getElementsByClassName("item")
Cells(Rows, 1) = post.getElementsByTagName("em")(0).innerText
Cells(Rows, 2) = post.getElementsByTagName("span")(0).innerText
Cells(Rows, 3) = post.getElementsByClassName("rating_num")(0).innerText
Cells(Rows, 4) = post.getElementsByTagName("p")(0).innerText
Rows = Rows + 1
Next postEnd Sub
-
刚才第三个VBA爬取豆瓣TOP250程序仍然不能工作,有可能引用有问题,请再解答@ChatGPT
-
Jackll2302说:
利用Python爬取香港交易所沪股通及深股通持股纪录按日查询的数据,并写入Excel文件,请给出源码
import requests import json import pandas as pd # 定义字典用来存放请求参数 params = { 'sort': '', 'order': 'desc', 'limit': '20', 'start': '0', '_': '1597602926818', 'date': '2020-08-14' } # 发起请求,获取数据 url = 'http://www.hkex.com.hk/sc/ccass-data-search/getCCASSDayData' res = requests.get(url, params).text data = json.loads(res) # 将数据写入Excel文件 df = pd.DataFrame(data['data']) df.to_excel('hkex.xlsx', index=False)
-
Jackll2302说:
请求参数都不对
这种情况下,服务器会返回一个HTTP 400状态码,表示请求无效。