Storing Python RegEx multiple groups
I'm web scraping a site using Python. The returned results have the following format (https://regex101.com/r/irr14u/10), where everything works fine apart from the last case, where I get two matches for the dates (first match: Thur.-Sun., Tue., Wed.; second match: Mon.).
I'm using the following code to get the values I want. I use BeautifulSoup to get the movieDate string, but here I have hardcoded it.
import re
movieDate = "Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00"
weekDays = re.match(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)', movieDate).groupdict()['weekDays']
startTime = re.match(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)', movieDate).groupdict()['startTime']
I want to create a dictionary as follows (it has two keys because there are two startTime values):
The first key will be Thur.-Sun., Tue., Wed. with value 20.50/ 23.00,
and the second key will be Mon. with value 23.00.
There might be cases with one key or with more than two keys. For this example the dictionary will be:
dictionary = {'Thur.-Sun., Tue., Wed.': '20.50/ 23.00', 'Mon.': '23.00'}
Any suggestions on how to achieve that in a non-buggy way?
I don't know how to create the dictionary, and especially how to get the multiple regex matches.
– sotokan80
Aug 8 at 17:56
Can you provide a minimal complete verifiable example? Please read this
– Mike Tung
Aug 8 at 18:23
1 Answer
You can achieve the desired output using the finditer function, adding the captured groups of each match to a dict dynamically.
Python snippet:
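import re
movieDate = """
Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00
"""
d = {}
# Raw string so that \d and \n reach the regex engine unchanged
r = re.compile(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)')
# Each match contributes one weekDays -> startTime entry
for m in r.finditer(movieDate):
    d[m.group(1)] = m.group(2)
print(d)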
Prints:
{'Thur.-Sun., Tue., Wed.': '20.50/ 23.00', 'Mon. ': '23.00'}
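Note that the second key in the printed output carries a trailing space ('Mon. '), because the weekDays group consumes everything up to the first digit. A minimal sketch, assuming it is acceptable to normalize keys and values with str.strip(), might look like this; the helper name parse_schedule is hypothetical and not part of the original answer:

```python
import re

# Same pattern as in the answer, written as a raw string.
PATTERN = re.compile(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)')

def parse_schedule(movie_date):
    """Map each weekday group to its start times (hypothetical helper)."""
    return {
        # strip() drops the whitespace the character classes let through
        m.group('weekDays').strip(): m.group('startTime').strip()
        for m in PATTERN.finditer(movie_date)
    }

schedule = parse_schedule("Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00")
print(schedule)
```

With the keys stripped, a lookup such as schedule['Mon.'] works without having to remember the trailing space.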
Thank you very much, it worked fine. Another thing I want to ask: weekDays = re.match(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)', movieDate).groupdict()['weekDays'] prints only the first match. How do I have to modify it to print all the matches?
– sotokan80
Aug 8 at 19:09
You are welcome. I believe that is because re.match is anchored at the beginning of the string. An alternative is to use re.search instead. You may refer to this answer for more details. – UnbearableLightness
Aug 8 at 19:15
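To illustrate the point discussed in these comments: re.match and re.search each return at most one match object, so collecting every weekDays value needs an iterating call such as finditer. A small sketch reusing the question's hardcoded string (the variable names first, also_first, all_week_days and all_start_times are illustrative only):

```python
import re

movieDate = "Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00"
pattern = re.compile(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)')

# match() only tries at position 0 and returns a single match object.
first = pattern.match(movieDate)
# search() scans forward for a match, but still returns only the first one.
also_first = pattern.search(movieDate)

# finditer() yields every non-overlapping match, left to right.
all_week_days = [m.group('weekDays').strip() for m in pattern.finditer(movieDate)]
all_start_times = [m.group('startTime') for m in pattern.finditer(movieDate)]

print(all_week_days)
print(all_start_times)
```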
Can you provide details on where you are stuck and how things work?
– Mike Tung
Aug 8 at 17:37