Find and delete HTML5 data-* attributes with bs4







HTML5 files may contain custom data-* attributes.



I want to find and delete all of these data-* attributes with bs4.



According to the bs4 documentation, it's possible to search for these attributes using the attrs argument of find_all().



For example:


import re
from bs4 import BeautifulSoup
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
data_soup.find_all(attrs={"data-foo": "value"})



However, the following line does not work:


data_soup.find_all(attrs={re.compile('data.*'): True})



What regular expression do I need to use to find all data-* attributes (regardless of their values)?



Once found, how do I delete them using del?




1 Answer



Yes, to delete an attribute, simply use del on the matching key of tag.attrs:




data = '''
<ul>
<li data-animal-type="bird" data-other="this is other data">Owl</li>
<li data-animal-type="fish">Salmon</li>
<li data-animal-type="spider">Tarantula</li>
</ul>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

print('Original soup:')
print(soup)
print('-' * 80)

for tag in soup.find_all(lambda t: any(i.startswith('data-') for i in t.attrs)):
    for attr in list(tag.attrs):
        if attr.startswith('data-'):
            del tag.attrs[attr]

print()
print('Soup without data-* tags:')
print(soup)
print('-' * 80)



This prints:


Original soup:
<html><body><ul>
<li data-animal-type="bird" data-other="this is other data">Owl</li>
<li data-animal-type="fish">Salmon</li>
<li data-animal-type="spider">Tarantula</li>
</ul></body></html>
--------------------------------------------------------------------------------

Soup without data-* tags:
<html><body><ul>
<li>Owl</li>
<li>Salmon</li>
<li>Tarantula</li>
</ul></body></html>
--------------------------------------------------------------------------------
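The question also asked about a regular expression. As far as I can tell, the keys of the attrs dictionary are treated as literal attribute names, so a compiled pattern used as a key is not matched against them. A regex can still be applied to the attribute names by hand. The following is a minimal sketch of that idea, not the only way to do it; strip_data_attrs and DATA_ATTR are hypothetical names introduced for illustration, and it uses the built-in html.parser so that no html/body wrapper is added.

```python
import re
from bs4 import BeautifulSoup

# Pattern applied to attribute *names* (not values); hypothetical name.
DATA_ATTR = re.compile(r'^data-')

def strip_data_attrs(html):
    """Return html with every data-* attribute removed (illustrative helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    for tag in soup.find_all(True):  # True matches every tag
        # Iterate over a copy of the keys, since the dict is modified in the loop.
        for name in list(tag.attrs):
            if DATA_ATTR.match(name):
                del tag[name]  # equivalent to del tag.attrs[name]
    return str(soup)

cleaned = strip_data_attrs('<li data-animal-type="bird" class="x">Owl</li>')
print(cleaned)
```

Attributes that do not start with data-, such as class here, are left untouched.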





