What's the best way to find the intersection between two strings?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



What's the best way to find the intersection between two strings?



I need to find the intersection between two strings.
Assertions:


assert intersect("test", "tes") == list("tes"), "Assertion 1"
assert intersect("test", "ta") == list("t"), "Assertion 2"
assert intersect("foo", "fo") == list("fo"), "Assertion 3"
assert intersect("foobar", "foo") == list("foo"), "Assertion 4"



I tried different implementations for the intersect function. intersect would receive 2 str parameters, w and w2


intersect


intersect


str


w


w2



List comprehension. Iterate and look for occurrences in the second string.


return [l for l in w if l in w2]



Fail assertion 1 and 2 because multiple t in w match the one t in w2


w


w2



Sets intersections.


return list(set(w).intersection(w2)
return list(set(w) & set(w2))



Fails assertion 3 and 4 because a set is a collection of unique elements and duplicated letters will be discarded.


collection of unique elements



Iterate and count.


out = ""
for c in s1:
if c in s2 and not c in out:
out += c
return out



Fails because it also eliminates duplicates.



difflib (Python Documentation)


letters_diff = difflib.ndiff(word, non_wildcards_letters)
letters_intersection =

for l in letters_diff:
letter_code, letter = l[:2], l[2:]
if letter_code == " ":
letters_intersection.append(letter)

return letters_intersection



Passes



difflib works but can anybody think of a better, optimized approach?


difflib



EDIT:
The function would return a list of chars. The order doesn't really matter.





What's the expected output of intersection('aba', 'aca')? ['a'] or ['a', 'a']? What about intersection('ab', 'b')? or ['b']?
– Aran-Fey
Aug 8 at 14:26



intersection('aba', 'aca')


['a']


['a', 'a']


intersection('ab', 'b')



['b']





Are these just prefixes? If not, does the order actually matter or just the counts?
– Reut Sharabani
Aug 8 at 14:30






Possible duplicate of Find common substring between two strings
– Thierry Lathuille
Aug 8 at 14:32





@Aran-Fey intersection('aba', 'aca') should return ['a', 'a'] and intersection('ab', 'b') just ['b'] yes. @ReutSharabani , edited the question. No the order doesn't matter.
– maxrodrigo
Aug 8 at 14:56



intersection('aba', 'aca')


['a', 'a']


intersection('ab', 'b')


['b']





what about intersection('ab', 'abhab')? ['ab', 'ab']?
– xdze2
Aug 8 at 15:01


intersection('ab', 'abhab')


['ab', 'ab']




3 Answers
3



Try this:


def intersect(string1, string2):
common =
for char in set(string1):
common.extend(char * min(string1.count(char), string2.count(char)))

return common



Note: It doesn't preserve the order (if I remember set() correctly, the letters will be returned in alphabetical order). But, as you say in your comments, order doesn't matter


set()





@Aran-Fey Edited :)
– Adi219
Aug 8 at 15:27





Please can (potential) downvoters explain why?
– Adi219
Aug 8 at 15:27





list(set(list(str1))) can be just list(set(str1)). You have a typo in the else branch (string instead of string2). I'm pretty sure there's no reason to have the whole if ... else thing in the first place. And finally, repeatedly calling str.count on both strings needlessly increases your run time complexity. You can use collections.Counter to count all the characters just once. (And your code had syntax errors before the edit.)
– Aran-Fey
Aug 8 at 15:30



list(set(list(str1)))


list(set(str1))


else


string


string2


if ... else


str.count


collections.Counter





min(string1.count(char), str2.count(char)) <- that's a bug. You should be using str1 there.
– Aran-Fey
Aug 8 at 15:33



min(string1.count(char), str2.count(char))


str1





@Aran-Fey The if-else is needed to get the string of minimum length. I wouldn't need it if I knew for sure that the strings were of different length, but as they could be, I'm using the if-else.
– Adi219
Aug 8 at 15:33



This works pretty well for your test cases:


def intersect(haystack, needle):
while needle:
pos = haystack.find(needle)
if pos >= 0:
return list(needle)
needle = needle[:-1]
return



But, bear in mind that, all your test cases are longer then shorter, do not have an empty search term, an empty search space, or a non-match.





Unfortunately the test cases don't come anywhere close to covering all the requirements. If you read the comments below the question you should see that this doesn't do what the OP wants.
– Aran-Fey
Aug 8 at 15:21



Gives the number of co-occurrence for all n-grams in the two strings:


from collections import Counter

def all_ngrams(text):
ngrams = ( text[i:i+n] for n in range(1, len(text)+1)
for i in range(len(text)-n+1) )
return Counter(ngrams)

def intersection(string1, string2):
count_1 = all_ngrams(string1)
count_2 = all_ngrams(string2)
return count_1 & count_2 # intersection: min(c[x], d[x])

intersection('foo', 'f') # Counter('f': 1)
intersection('foo', 'o') # Counter('o': 1)
intersection('foobar', 'foo') # Counter('f': 1, 'fo': 1, 'foo': 1, 'o': 2, 'oo': 1)
intersection('abhab', 'abab') # Counter('a': 2, 'ab': 2, 'b': 2)
intersection('achab', 'abac') # Counter('a': 2, 'ab': 1, 'ac': 1, 'b': 1, 'c': 1)
intersection('test', 'ates') # Counter('e': 1, 'es': 1, 's': 1, 't': 1, 'te': 1, 'tes': 1)






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard