Selenium is unable to extract the page source and returns an empty body for the HTML page
Here is my Python code:
import pandas as pd
import pandas_datareader.data as web
import bs4 as bs
import urllib.request as ul
from matplotlib import style
from selenium import webdriver

style.use('ggplot')

# Raw string, so that "\b" in the Windows path is not read as a backspace escape.
driver = webdriver.PhantomJS(executable_path=r'C:\Phantomjs\bin\phantomjs.exe')

def getBondRate():
    # driver.delete_all_cookies()
    url = "https://www.marketwatch.com/investing/index/tnx?countrycode=xx"
    driver.get(url)
    driver.implicitly_wait(10)
    html = driver.page_source
    return html

bondRate = getBondRate()
print(bondRate)
A few days ago this was reading the page from MarketWatch perfectly fine. Now it returns nothing inside the body tag. Is Selenium not loading the page?
2 Answers
Do you need the HTML tags as well? If not, you can retrieve just the text of the body element. Here's how I would do it in Java:
String src=driver.findElement(By.tagName("body")).getText();
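Since the question uses the Python bindings, a rough Python equivalent of the Java line above might look like the following sketch. `get_body_text` is a hypothetical helper name, and it assumes `driver` is an already-created WebDriver instance. The locator string "tag name" is the value that `By.TAG_NAME` stands for, written out here so the snippet has no imports of its own.

```python
def get_body_text(driver):
    """Return the visible text of the <body> element, without HTML tags.

    `driver` is assumed to be an existing selenium WebDriver instance
    (hypothetical helper, not part of selenium itself).
    """
    # "tag name" is the locator strategy string behind selenium's By.TAG_NAME.
    body = driver.find_element("tag name", "body")
    return body.text
```

Note that if the page comes back with an empty body, this returns an empty string as well, so it helps with extracting text but probably not with the empty-page problem itself.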
For the URL https://www.marketwatch.com/investing/index/tnx?countrycode=xx, the behavior you are observing is expected.
I took your code and, with a simple tweak, tried to extract the page_source with PhantomJS as well as ChromeDriver. It appears that when you use any WebDriver variant, the WebDriver fingerprint gets detected and a Fingerprinting error is raised in the browser console, as follows:
Error details:
Failed to load resource: the server responded with a status of 404 (Not Found)
kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=058cbc6a-f8b8-f175-ca68-8c2e0fd6a4e3:1 Fingerprinting error
name: Error
message: Error issuing AJAX request (status code: 404)
stack: Error: Error issuing AJAX request (status code: 404)
at XMLHttpRequest.N.a.onreadystatechange (https://www.marketwatch.com/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=058cbc6a-f8b8-f175-ca68-8c2e0fd6a4e3:1:1884)
DevTools failed to parse SourceMap: https://www.marketwatch.com/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/fingerprint.js.map
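To notice this situation from the script itself, rather than by reading the browser console, one option is to check whether the returned page_source has any visible text inside the body. The sketch below uses only the standard library's html.parser. The names `_BodyTextCollector` and `body_is_empty` are hypothetical, and the check is only a heuristic for "the page came back blank", not a way to detect fingerprinting as such.

```python
from html.parser import HTMLParser


class _BodyTextCollector(HTMLParser):
    """Collects non-whitespace text found inside <body>, skipping script/style."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._in_body and not self._skip_depth and text:
            self.chunks.append(text)


def body_is_empty(html):
    """Return True if the markup has no visible text inside <body>."""
    collector = _BodyTextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    return not collector.chunks
```

With the code from the question, this could be called as `body_is_empty(driver.page_source)` right after `driver.get(url)`, for example to log a warning or retry instead of parsing an empty page.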
Browser Automation with Selenium: Fingerprints, recognizability and traceability?
Can a website detect when you are using selenium with chromedriver?
Selenium Webdriver is detectable
Thank you. How do I overcome this issue if I still need to access the data?
– JAGS8386
Aug 9 at 15:23
@JAGS8386 There are multiple ways. You can compile the WebDriver binary, i.e. the chromedriver binary, with a few tweaks, or use a proxy. I have updated the answer and added some more references.
– New contributor
Aug 9 at 15:25
@JAGS8386 Glad to help you out. If my answer has addressed your question, please accept it by clicking the hollow check mark beside the answer, just below the downvote arrow, so that the check mark turns green.
– New contributor
Aug 9 at 15:26