How to get JavaScript variables from a script tag using Python and Beautifulsoup




I want to return the "id" value from the variable meta using BeautifulSoup and Python. Is this possible? Additionally, I don't know how to find the particular 'script' tag that contains the meta variable, because it has no unique identifier and there are many other 'script' tags on the site. I'm also using Selenium, so answers that use it are fine too.


<script>
var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>





What have you tried so far with Python?
– Lex
Aug 10 at 1:37





@FrankDiGiacomoKnarFTHUNDER Update the HTML with the parent node of the <script> tag
– New contributor
Aug 10 at 2:12






2 Answers



If you are using Selenium, there's no need to parse the HTML to get the JS variable; just use Selenium's webdriver.execute_script() to bring it into Python:


from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://whatever.com/')
meta = driver.execute_script('return meta')



And that's it: meta now holds the JS variable, and it keeps its type (a JavaScript object arrives as a Python dict, an array as a list).
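
As a minimal sketch of the remaining step, assuming the object returned by execute_script has the same shape as the snippet in the question (a "variants" list whose entries carry "id" and "price"), the ids can be pulled out with ordinary Python. The dict literal below is only a stand-in for the returned value so that the example runs without a browser:

```python
# Stand-in for the value returned by driver.execute_script('return meta').
# Its shape is an assumption based on the snippet in the question.
meta = {
    "variants": [
        {"id": 12443604615241, "price": 14000},
        {"id": 12443604648009, "price": 14000},
    ]
}

# Collect the "id" of every variant.
variant_ids = [variant["id"] for variant in meta["variants"]]

print(variant_ids)
```

If the real object nests "variants" differently, only the lookup path in the list comprehension would need to change.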





Thanks, didn't know it was that simple for my case.
– Frank DiGiacomo KnarF THUNDER
Aug 10 at 12:39



You can use the built-in re and json modules to extract JavaScript variables:


from bs4 import BeautifulSoup
import re
import json
from pprint import pprint

data = '''
<html>
<body>

<script>
var meta = "variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]
</script>

</body>
'''

soup = BeautifulSoup(data, 'lxml')

# Capture everything after "meta =" up to the closing "}]" of the variants list.
json_string = re.search(r'meta\s*=\s*(.*?}])\s*\n', str(soup.find('script')), flags=re.DOTALL)

# The captured text has no outer braces, so add them to form valid JSON.
json_data = json.loads('{' + json_string[1] + '}')

pprint(json_data)



This prints:


{'variants': [{'id': 12443604615241, 'price': 14000},
              {'id': 12443604648009, 'price': 14000}]}





That seems like the right idea, but I got an error: "TypeError: 'NoneType' object is not subscriptable". Remember that the site sometimes has about 50 other script tags without any unique identifier, so I think I need to find the one that contains the meta variable. I don't know if that's the problem. Thanks.
– Frank DiGiacomo KnarF THUNDER
Aug 10 at 12:33






@FrankDiGiacomoKnarFTHUNDER I don't know the structure of the HTML you have, so it's hard to help without seeing it. All I can say is that it comes down to selecting the script you want and having the right regular expression to extract the variable.
– Andrej Kesely
Aug 10 at 13:00
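
The "NoneType" error in the comments above is what re.search produces when soup.find('script') returns a script that does not define meta. One possible way to handle a page with many anonymous script tags is to loop over all of them and stop at the first one that assigns meta. The sketch below is not from the original answers: the sample HTML, the find_meta helper and the META_ASSIGN pattern are hypothetical, and it assumes the real page assigns meta a complete, JSON-compatible object literal (with outer braces), which the excerpt in the question does not show.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page: several anonymous <script> tags, only one defines `meta`.
html = '''
<html><body>
<script>var other = 1;</script>
<script>console.log("unrelated");</script>
<script>
var meta = {"variants":[{"id":12443604615241,"price":14000},
{"id":12443604648009,"price":14000}]};
</script>
</body></html>
'''

# Matches the assignment itself; the match ends where the object literal starts.
META_ASSIGN = re.compile(r'\bvar\s+meta\s*=\s*')


def find_meta(html_text):
    """Return the parsed `meta` object, or None if no script assigns it."""
    soup = BeautifulSoup(html_text, 'html.parser')
    decoder = json.JSONDecoder()
    for script in soup.find_all('script'):
        text = script.string or ''  # .string is None for empty or external scripts
        match = META_ASSIGN.search(text)
        if match is None:
            continue  # this script does not define `meta`; try the next one
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (here the trailing ";").
        obj, _end = decoder.raw_decode(text, match.end())
        return obj
    return None


meta = find_meta(html)
if meta is not None:
    print([variant["id"] for variant in meta["variants"]])
```

Returning None when nothing matches makes the "script not found" case explicit rather than failing with a subscript error. If the object literal on the real page is not valid JSON (single quotes, unquoted keys, trailing commas), raw_decode raises json.JSONDecodeError, and the Selenium approach is likely the easier route.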





