Selenium is unable to extract the page source and returns an empty HTML body

Here is my Python code:

    import pandas as pd
    import pandas_datareader.data as web
    import bs4 as bs
    import urllib.request as ul
    from matplotlib import style  # needed for style.use() below
    from selenium import webdriver

    style.use('ggplot')

    # Raw string so that "\b" in the Windows path is not read as a backspace escape
    driver = webdriver.PhantomJS(executable_path=r'C:\Phantomjs\bin\phantomjs.exe')

    def getBondRate():
        # driver.delete_all_cookies()
        url = "https://www.marketwatch.com/investing/index/tnx?countrycode=xx"
        driver.get(url)
        # Implicit waits only apply to element lookups (find_element...), not to page_source
        driver.implicitly_wait(10)
        html = driver.page_source
        return html

    bondRate = getBondRate()
    print(bondRate)

A few days ago it was reading from MarketWatch perfectly fine. Now it returns nothing inside the body tag. Is Selenium not loading the page?
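To confirm the symptom programmatically rather than by eye, a small check like the one below could be run on the string returned by getBondRate(). This is a sketch using only Python's standard html.parser; BodyTextCollector and body_is_empty are hypothetical helper names (not part of Selenium), and the check assumes that text inside script and style elements should not count as visible content.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collects non-blank text found inside <body>, ignoring <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is inside <body> and outside script/style blocks
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def body_is_empty(page_source: str) -> bool:
    """Return True if the <body> of page_source contains no visible text."""
    parser = BodyTextCollector()
    parser.feed(page_source)
    parser.close()
    return not parser.chunks
```

For example, calling body_is_empty(bondRate) after getBondRate() would return True for the blank page described above. It only detects the symptom; it says nothing about why the body is empty.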

2 Answers

Do you need the HTML tags as well? If not, you can retrieve the text of the body element instead. Here's how I would do it in Java:

String src=driver.findElement(By.tagName("body")).getText();
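A rough Python equivalent of the Java line above, assuming driver is the same WebDriver instance as in the question, might look like this. The helper name get_body_text is hypothetical. The string "tag name" is the value of Selenium's By.TAG_NAME constant, so passing it to find_element(by, value) should work in both Selenium 3 and Selenium 4 without importing By.

```python
def get_body_text(driver) -> str:
    """Return the rendered text of the <body> element, without HTML tags.

    "tag name" is the same string as selenium.webdriver.common.by.By.TAG_NAME.
    """
    body = driver.find_element("tag name", "body")
    return body.text
```

Note that if the server sends back an empty body, this returns an empty string as well, so it changes the output format but not the underlying problem.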

The behavior you are observing with https://www.marketwatch.com/investing/index/tnx?countrycode=xx is expected.


I took your code and, with a simple tweak, tried to extract the page_source with both PhantomJS and ChromeDriver. When any WebDriver variant is used, the site detects the WebDriver fingerprint and a fingerprinting error is raised, as follows:


Error details:

    Failed to load resource: the server responded with a status of 404 (Not Found)
    kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=058cbc6a-f8b8-f175-ca68-8c2e0fd6a4e3:1 Fingerprinting error
        name: Error
        message: Error issuing AJAX request (status code: 404)
        stack: Error: Error issuing AJAX request (status code: 404)
            at XMLHttpRequest.N.a.onreadystatechange (https://www.marketwatch.com/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/kpf.js?url=/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint&token=058cbc6a-f8b8-f175-ca68-8c2e0fd6a4e3:1:1884)
    DevTools failed to parse SourceMap: https://www.marketwatch.com/149e9513-01fa-4fb0-aad4-566afd725d1b/2d206a39-8ed7-437e-a3be-862e0f06eea3/fingerprint/script/fingerprint.js.map

DevTools snapshot: [image: fingerprinting error]
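One way to see what a site could be checking is to probe the browser for well-known automation markers. The sketch below assumes driver is an active WebDriver session; the probe list is an assumption and is not necessarily what MarketWatch's kpf.js inspects, since that script's logic is not shown here. Per the W3C WebDriver specification, navigator.webdriver reports true in a WebDriver-controlled browser, and window.callPhantom / window._phantom are globals that PhantomJS exposes. PROBE_JS and detected_automation_signals are hypothetical names.

```python
# JavaScript run in the page; execute_script returns the object as a Python dict.
PROBE_JS = """
return {
    "navigator.webdriver": navigator.webdriver === true,
    "window.callPhantom": typeof window.callPhantom !== "undefined",
    "window._phantom": typeof window._phantom !== "undefined"
};
"""


def detected_automation_signals(driver) -> list:
    """Return the names of the probed automation markers present in the page."""
    results = driver.execute_script(PROBE_JS)
    return sorted(name for name, present in results.items() if present)
```

A non-empty result only shows that these common markers are visible to page scripts; an empty result does not prove the browser is undetectable.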

References:
- Browser Automation with Selenium: Fingerprints, recognizability and traceability?
- Can a website detect when you are using selenium with chromedriver?
- Selenium Webdriver is detectable

Thank you. How do I get around this issue if I still need to access the data?
– JAGS8386
Aug 9 at 15:23

@JAGS8386 There are multiple ways. You can compile the WebDriver binary (i.e. the chromedriver binary) with a few tweaks, or use a proxy. I have updated the answer and added some more references.
– New contributor
Aug 9 at 15:25
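For the proxy route with ChromeDriver, a minimal sketch is to pass Chrome's --proxy-server switch through ChromeOptions.add_argument. The helper add_proxy_argument and the address in the usage comment are hypothetical. A proxy only changes the IP address the site sees; it does not hide markers such as navigator.webdriver, so it may or may not get past this particular check.

```python
from urllib.parse import urlsplit

# Proxy schemes accepted by Chrome's --proxy-server switch.
SUPPORTED_SCHEMES = ("http", "https", "socks4", "socks5")


def add_proxy_argument(options, proxy_url: str):
    """Add --proxy-server=<proxy_url> to a ChromeOptions-like object.

    proxy_url must look like scheme://host:port, e.g. "http://127.0.0.1:8080".
    """
    parts = urlsplit(proxy_url)
    if parts.scheme not in SUPPORTED_SCHEMES or not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    options.add_argument(f"--proxy-server={proxy_url}")
    return options


# Usage (requires selenium and a chromedriver binary on PATH):
# from selenium import webdriver
# options = add_proxy_argument(webdriver.ChromeOptions(), "http://127.0.0.1:8080")  # hypothetical proxy
# driver = webdriver.Chrome(options=options)
```

Validating the URL up front keeps a typo in the proxy address from silently launching Chrome with a switch it cannot use.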

@JAGS8386 Glad to help you out. If my answer has addressed your question, please accept it by clicking the hollow check mark beside the answer, just below the vote-down arrow, so that the check mark turns green.
– New contributor
Aug 9 at 15:26
