-
Jackll2302 said:
Use VBA to scrape the Douban Top 250 into an Excel file, and provide the source code
Sub Get_Douban_Top250()
    ' Fetches the first page (25 films) of the Douban Top 250 list into Sheet1.
    Dim http As Object, html As Object
    Dim div As Object, span As Object
    Dim i As Long
    Dim title As String, rating As String
    Dim ws As Worksheet

    Set ws = ThisWorkbook.Sheets("Sheet1")
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://movie.douban.com/top250", False
    http.send
    If http.Status <> 200 Then
        MsgBox "Request failed with HTTP status " & http.Status
        Exit Sub
    End If

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = http.responseText

    ' The page appears to wrap each film in a <div class="item"> rather than a table row.
    i = 1
    For Each div In html.getElementsByTagName("div")
        If div.className = "item" Then
            title = "": rating = ""
            For Each span In div.getElementsByTagName("span")
                ' Keep only the first "title" span; later ones hold alternate titles.
                If span.className = "title" And title = "" Then title = span.innerText
                If span.className = "rating_num" Then rating = span.innerText
            Next span
            i = i + 1
            ws.Range("A" & i).Value = title
            ws.Range("B" & i).Value = rating
            ws.Range("C" & i).Value = "| " & title & " | Rating: " & rating
        End If
    Next div
    MsgBox "Wrote " & (i - 1) & " films from the Douban Top 250 to the Excel file."
End Sub
-
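Jackll2302 said:
Use VBA to scrape the Douban Top 250 into an Excel file, taking care to use statements that simulate a browser, and provide the source code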
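Sub DoubanTop250()
    Dim http As Object, html As Object
    Dim div As Object, span As Object
    Dim page As Long, r As Long
    Dim title As String, rating As String

    Set http = CreateObject("Microsoft.XMLHTTP")
    r = 1
    ' The list spans 10 pages of 25 films each: start=0, 25, ..., 225.
    For page = 0 To 9
        http.Open "GET", "https://movie.douban.com/top250?start=" & 25 * page & "&filter=", False
        ' Simulate a browser with a User-Agent header; this string is only an example,
        ' and the site may still refuse automated requests.
        http.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        http.send
        If http.Status <> 200 Then Exit For

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = http.responseText
        ' Each film appears to sit in a <div class="item">.
        For Each div In html.getElementsByTagName("div")
            If div.className = "item" Then
                title = "": rating = ""
                For Each span In div.getElementsByTagName("span")
                    If span.className = "title" And title = "" Then title = span.innerText
                    If span.className = "rating_num" Then rating = span.innerText
                Next span
                Cells(r, 1).Value = title
                Cells(r, 2).Value = rating
                r = r + 1
            End If
        Next div
    Next page
End Sub
-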
Jackll2302 said:
The VBA code for scraping Douban just now does not run properly; please revise it.
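Sub Get_DouBan_Data()
    ' Early binding: in the VBA editor, under Tools > References, tick
    ' "Microsoft XML, v6.0" and "Microsoft HTML Object Library" first.
    Dim xhr As New MSXML2.XMLHTTP60
    Dim html As New HTMLDocument
    Dim post As Object
    Dim rowNum As Long   ' renamed from Rows to avoid shadowing Excel's Rows property

    rowNum = 1
    With xhr
        .Open "GET", "https://movie.douban.com/top250", False
        ' Example browser-like User-Agent; the site may reject requests without one.
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        .send
        html.body.innerHTML = .responseText
    End With

    For Each post In html.getElementsByClassName("item")
        Cells(rowNum, 1) = post.getElementsByTagName("em")(0).innerText               ' rank
        Cells(rowNum, 2) = post.getElementsByTagName("span")(0).innerText             ' title
        Cells(rowNum, 3) = post.getElementsByClassName("rating_num")(0).innerText     ' rating
        Cells(rowNum, 4) = post.getElementsByTagName("p")(0).innerText                ' credits line
        rowNum = rowNum + 1
    Next post
End Sub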
-
The third VBA program for scraping the Douban TOP250 just now still does not work; the references may be the problem. Please answer again @ChatGPT
-
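Jackll2302 said:
Use Python to scrape the Hong Kong Stock Exchange's daily-search data for Shanghai Connect and Shenzhen Connect shareholding records, write it to an Excel file, and provide the source code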
import requests
import pandas as pd

# Dictionary holding the request parameters (not verified against the live site)
params = {
    'sort': '',
    'order': 'desc',
    'limit': '20',
    'start': '0',
    '_': '1597602926818',
    'date': '2020-08-14',
}

# Send the request and fetch the data
url = 'http://www.hkex.com.hk/sc/ccass-data-search/getCCASSDayData'
res = requests.get(url, params=params, timeout=30)
res.raise_for_status()
data = res.json()

# Write the data to an Excel file
df = pd.DataFrame(data['data'])
df.to_excel('hkex.xlsx', index=False)
-
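Jackll2302 said:
The request parameters are all wrong
In that case, the server will typically return an HTTP 400 (Bad Request) status code, indicating that the request is invalid.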
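To see which status code a request actually produced, a minimal sketch using only Python's standard library might look like this. The helper names `describe_status` and `fetch_status` are made up for illustration, and the thread does not confirm which code the HKEX endpoint really returns.

```python
from http import HTTPStatus
import urllib.error
import urllib.request


def describe_status(code):
    """Return a short label such as '400 Bad Request' for a standard status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a GET request, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


print(describe_status(400))  # 400 Bad Request
```

Calling `describe_status(fetch_status(url))` on the URL being scraped shows whether the server rejected the request or accepted it and simply returned unexpected content.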
-
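Use Python to scrape the Hong Kong Stock Exchange's daily-search data for Shanghai Connect and Shenzhen Connect shareholding records, write it to an Excel file, and provide runnable source code. For example, the Shanghai Connect URL is: https://sc.hkexnews.hk/TuniS/www3.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh @ChatGPT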
-
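Jackll2302 said:
Use Python to scrape the Hong Kong Stock Exchange's daily-search data for Shanghai Connect and Shenzhen Connect shareholding records, write it to an Excel file, and provide runnable source code. For example, the Shanghai Connect URL is: https://sc.hkexnews.hk/TuniS/www3.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh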
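import requests
import pandas as pd
from bs4 import BeautifulSoup

# Fetch the Shanghai Connect shareholding page
url = 'https://sc.hkexnews.hk/TuniS/www3.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh'
res = requests.get(url, timeout=30)
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')
# Assumes the results table is the second <table> on the page; adjust the index if the layout differs
table = soup.find_all('table')[1]

# Table header
head_list = [head.text.strip() for head in table.find_all('th')]

# Table rows (rows without <td> cells are dropped)
data_list = [[td.text.strip() for td in tr.find_all('td')] for tr in table.find_all('tr')[1:]]
data_list = [row for row in data_list if row]

# Write the data to Excel
df = pd.DataFrame(data_list, columns=head_list)
df.to_excel('沪股通持股.xlsx', index=False)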
-
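Jackll2302 said:
No parameters are passed to the website. Can you give code that fetches the data for any given date?
No. If no parameters are passed, the website cannot return data for an arbitrary date. To get data for a specific date, you must pass a date parameter to the website so that it can return that date's data.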
-
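Jackll2302 said:
How do I pass parameters to the website?
Usually, parameters can be passed to a website through the query string in the URL (for example, ?param1=value1&param2=value2). Alternatively, a POST request can be used, which sends the parameters to the server in the message body.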
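The two approaches described above can be sketched with Python's standard library. The base URL `https://example.com/search` is a placeholder, and `param1`/`param2` are the illustrative names from the reply, not real parameters of any site. Nothing is sent over the network here; the requests are only constructed.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.com/search"  # hypothetical endpoint
params = {"param1": "value1", "param2": "value2"}

# GET: the parameters are appended to the URL as a query string.
get_url = BASE_URL + "?" + urlencode(params)
get_request = Request(get_url)

# POST: the same parameters travel in the request body as form-encoded bytes.
post_request = Request(BASE_URL, data=urlencode(params).encode("utf-8"))

print(get_url)                    # https://example.com/search?param1=value1&param2=value2
print(get_request.get_method())   # GET
print(post_request.get_method())  # POST
```

Passing either request object to `urllib.request.urlopen` would actually send it.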
-
Jackll2302 said:
Please give the parameter-passing code for this case
package main

import "fmt"

// Parameter-passing example
func foo(a int, b string) {
	fmt.Println(a, b)
}

// Function-call example
func main() {
	foo(1, "Hello")
}
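For the HKEX page asked about in this thread, a sketch closer to the question might look like the following. It rests on several assumptions that the thread does not confirm: that the page is an ASP.NET-style form whose search is a POST back to the same URL, that its hidden inputs (such as `__VIEWSTATE`) must be echoed back, and that the date field is called `txtShareholdingDate` (format `YYYY/MM/DD`) with a submit button named `btnSearch`. Check the form in the browser's developer tools before relying on any of these names.

```python
from html.parser import HTMLParser
import urllib.parse
import urllib.request

URL = ("https://sc.hkexnews.hk/TuniS/www3.hkexnews.hk"
       "/sdw/search/mutualmarket_c.aspx?t=sh")


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form(page_html, date_str):
    """Build the POST payload: hidden fields from the page plus the chosen date."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    form = dict(parser.fields)
    form["txtShareholdingDate"] = date_str  # assumed field name
    form["btnSearch"] = "Search"            # assumed button name and value
    return form


def fetch_for_date(date_str, url=URL):
    """GET the page to obtain its hidden fields, then POST them back with the date."""
    headers = {"User-Agent": "Mozilla/5.0"}
    with urllib.request.urlopen(
            urllib.request.Request(url, headers=headers), timeout=30) as resp:
        first_page = resp.read().decode("utf-8", errors="replace")
    body = urllib.parse.urlencode(build_form(first_page, date_str)).encode("utf-8")
    post = urllib.request.Request(url, data=body, headers=headers)
    with urllib.request.urlopen(post, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the assumptions hold, `fetch_for_date("2020/08/14")` would return the HTML for that date, which could then be parsed with the BeautifulSoup and pandas code from the earlier reply in place of `res.text`.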