How to get JavaScript variables from a script tag using Python and Beautifulsoup
I want to return the "id" value from the variable meta using BeautifulSoup and Python. Is this possible? I also don't know how to find the particular 'script' tag that contains the meta variable, because it has no unique identifier and there are many other 'script' tags on the site. I'm using Selenium as well, so I can understand any answers that use it.
<script>
var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>
@FrankDiGiacomoKnarFTHUNDER Update the HTML with the parent node of the <script> tag.
– New contributor
Aug 10 at 2:12
2 Answers
If you are using Selenium, there's no need to parse the HTML to get the JS variable. Just use Selenium's webdriver.execute_script() to bring it into Python:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('https://whatever.com/')
meta = driver.execute_script('return meta')
That's it: meta now holds the JS variable, and it keeps its type (a JavaScript object arrives as a Python dict, an array as a list).
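If the page might not define the variable at all, `return meta` raises a JavaScript error. A minimal sketch of a guarded lookup follows; the helper name `get_variant_ids` is hypothetical, and it assumes the page's meta object has a "variants" list whose entries carry an "id", as in the question's snippet.

```python
def get_variant_ids(driver, var_name='meta'):
    """Return the ids under var_name['variants'], or [] if the variable is missing."""
    # typeof avoids a ReferenceError when the page does not define the variable
    script = 'return typeof {0} !== "undefined" ? {0} : null;'.format(var_name)
    value = driver.execute_script(script)
    if value is None:
        return []
    # Assumes the structure shown in the question: {"variants": [{"id": ...}, ...]}
    return [variant['id'] for variant in value.get('variants', [])]


# Usage with a real browser session (requires Selenium and a driver binary):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get('https://whatever.com/')
#   print(get_variant_ids(driver))
```

The function only calls execute_script() on whatever driver it is given, so it works the same with Firefox, Chrome, or any other WebDriver instance.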
Thanks, didn't know it was that simple for my case.
– Frank DiGiacomo KnarF THUNDER
Aug 10 at 12:39
You can use the built-in re and json modules to extract JavaScript variables:
from bs4 import BeautifulSoup
import re
import json
from pprint import pprint
data = '''
<html>
<body>
<script>
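var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>
</body>
'''
soup = BeautifulSoup(data, 'lxml')
# Capture everything after "meta =" up to the closing "}]" at the end of a line
json_string = re.search(r'meta\s*=\s*(.*?}])\s*\n', str(soup.find('script')), flags=re.DOTALL)
# The captured text lacks the outer braces, so add them to form valid JSON
json_data = json.loads('{' + json_string[1] + '}')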
pprint(json_data)
This prints:
{'variants': [{'id': 12443604615241, 'price': 14000},
              {'id': 12443604648009, 'price': 14000}]}
That seems like the right idea, but I got an error: "TypeError: 'NoneType' object is not subscriptable". Remember that there are sometimes about 50 other script tags without any unique identifier on the site, so I think I need to find the one that contains the variable meta. I don't know if that's the problem, thanks.
– Frank DiGiacomo KnarF THUNDER
Aug 10 at 12:33
@FrankDiGiacomoKnarFTHUNDER I don't know the structure of the HTML code you have, so helping you is hard without seeing it. All I can say is that it comes down to selecting the script you want and having the right regular expression to extract the variable.
– Andrej Kesely
Aug 10 at 13:00
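The "NoneType" error above is what happens when re.search() finds no match, for example because soup.find('script') returned a different script tag. One possible way to handle many anonymous script tags is to scan all of them and parse the first one that assigns the variable. This is only a sketch: the helper name extract_js_variable is hypothetical, and it assumes the real page assigns a valid JSON literal (with its outer braces) to meta. It uses json.JSONDecoder.raw_decode() instead of a lazy regex so that nested braces are handled by the JSON parser.

```python
import json
import re


def extract_js_variable(script_texts, var_name):
    """Return the JSON value assigned to var_name in the first script that has one, else None."""
    pattern = re.compile(r'\b' + re.escape(var_name) + r'\s*=\s*')
    decoder = json.JSONDecoder()
    for text in script_texts:
        match = pattern.search(text or '')
        if match is None:
            continue  # this script does not assign the variable
        try:
            # Parse one JSON value starting right after "var_name ="
            value, _end = decoder.raw_decode(text, match.end())
        except json.JSONDecodeError:
            continue  # assigned value is not valid JSON; keep looking
        return value
    return None


# Stand-in script contents; on a real page these would come from the parsed HTML
scripts = [
    'window.dataLayer = [];',
    'var meta = {"variants":[{"id":12443604615241,"price":14000},'
    '{"id":12443604648009,"price":14000}]};',
]
meta = extract_js_variable(scripts, 'meta')
ids = [variant['id'] for variant in meta['variants']]
print(ids)
```

With BeautifulSoup, the script contents could be supplied as (tag.string for tag in soup.find_all('script')). Checking the result for None before indexing it avoids the TypeError.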
What are you trying so far with python?
– Lex
Aug 10 at 1:37