How to find overlapping matches with a regexp?



>>> import re
>>> match = re.findall(r'\w\w', 'hello')
>>> print(match)
['he', 'll']



Since \w\w means two word characters, 'he' and 'll' are expected. But why do 'el' and 'lo' not match the regex?


>>> match1 = re.findall(r'el', 'hello')
>>> print(match1)
['el']





Lookahead
– Pavan Manjunath
Jul 11 '12 at 10:45




3 Answers



findall doesn't return overlapping matches by default. This expression, however, does:




>>> import re
>>> re.findall(r'(?=(\w\w))', 'hello')
['he', 'el', 'll', 'lo']



Here (?=...) is a lookahead assertion:





(?=...) matches if ... matches next, but doesn’t consume any of the
string. This is called a lookahead assertion. For example,
Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
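The documentation example can be checked directly. The point to notice is that the match ends right before 'Asimov': the lookahead tested those characters but did not consume them.

```python
import re

pattern = re.compile(r'Isaac (?=Asimov)')

m = pattern.search('Isaac Asimov')
print(repr(m.group(0)))  # 'Isaac '
print(m.end())           # 6: the match stops before 'Asimov'

# No 'Asimov' follows 'Isaac ' here, so the lookahead fails and nothing matches.
print(pattern.search('Isaac Newton'))  # None
```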





You can use the third-party regex module (pip install regex), which supports overlapping matches through the overlapped=True argument.


>>> import regex as re
>>> match = re.findall(r'\w\w', 'hello', overlapped=True)
>>> print(match)
['he', 'el', 'll', 'lo']
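If installing regex is not an option, overlapped matching can be approximated with the standard re module by restarting the search one character after where each match started, rather than where it ended. This is only a sketch: findall_overlapped is a hypothetical helper (not part of re), it always returns the whole match instead of mimicking findall's handling of groups, and it is not guaranteed to agree with regex's overlapped=True on every pattern.

```python
import re

def findall_overlapped(pattern, text):
    """Hypothetical helper: return every match of `pattern` in `text`, allowing overlaps.

    After each match, the search resumes one character past where that
    match *started*, so the next match may reuse the same characters.
    """
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        results.append(m.group(0))
        pos = m.start() + 1  # always advances, so the loop terminates
    return results

print(findall_overlapped(r'\w\w', 'hello'))  # ['he', 'el', 'll', 'lo']
```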



Except for zero-length assertions, every character in the input is consumed when it is matched. If you ever need to capture the same character of the input string more than once, you need a zero-length assertion in the regex.
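Consumption is visible in the match spans: each match of a plain \w\w starts exactly where the previous one ended, so 'el' can never be found because its 'e' was already used by 'he'.

```python
import re

# Each match consumes its two characters; the next attempt starts at the
# previous match's end, and the trailing 'o' alone cannot match \w\w.
spans = [m.span() for m in re.finditer(r'\w\w', 'hello')]
print(spans)  # [(0, 2), (2, 4)]
```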



There are several zero-length assertions (e.g. ^ (start of input/line), $ (end of input/line), \b (word boundary)), but look-arounds ((?<=...) positive look-behind and (?=...) positive look-ahead) are the only way to capture overlapping text from the input. Negative look-arounds ((?<!...) negative look-behind, (?!...) negative look-ahead) are not very useful here: if they assert true, the capture inside them failed; if they assert false, the match fails. These assertions are zero-length (as mentioned before), which means they assert without consuming any characters of the input string. When the assertion passes, they actually match the empty string.
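The three claims above (look-arounds match the empty string, a positive look-behind also captures overlapping text, and a negative look-ahead leaves its group empty) can be checked with a short sketch using Python's re module:

```python
import re

text = 'hello'

# A look-ahead match is zero-length: start == end and group(0) is ''.
first = re.search(r'(?=(\w\w))', text)
print(first.span(), repr(first.group(0)), first.group(1))  # (0, 0) '' he

# A positive look-behind captures the pair that *precedes* each position.
# Python requires the look-behind body to be fixed-width, which \w\w is.
behind = re.findall(r'(?<=(\w\w))', text)
print(behind)  # ['he', 'el', 'll', 'lo']

# A negative look-ahead only succeeds where \w\w does NOT match,
# so its capture group never participates and stays None.
groups = [m.group(1) for m in re.finditer(r'(?!(\w\w))', text)]
print(groups)  # [None, None]
```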





Applying the knowledge above, a regex that works for your case would be:


(?=(\w\w))
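If you also need to know where each overlapping pair occurs, the look-ahead pattern (?=(\w\w)) can be used with re.finditer instead of re.findall. The captured group's start position gives the offset of each pair in the input:

```python
import re

text = 'hello'

# group(1) is the captured pair; start(1) is its offset in `text`.
pairs = [(m.start(1), m.group(1)) for m in re.finditer(r'(?=(\w\w))', text)]
print(pairs)  # [(0, 'he'), (1, 'el'), (2, 'll'), (3, 'lo')]
```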





