What's the best way to find the intersection between two strings?
Clash Royale CLAN TAG#URR8PPP
What's the best way to find the intersection between two strings?
I need to find the intersection between two strings.
Assertions:
assert intersect("test", "tes") == list("tes"), "Assertion 1"
assert intersect("test", "ta") == list("t"), "Assertion 2"
assert intersect("foo", "fo") == list("fo"), "Assertion 3"
assert intersect("foobar", "foo") == list("foo"), "Assertion 4"
I tried different implementations for the intersect
function. intersect
would receive 2 str
parameters, w
and w2
intersect
intersect
str
w
w2
List comprehension. Iterate and look for occurrences in the second string.
return [l for l in w if l in w2]
Fail assertion 1 and 2 because multiple t in w
match the one t in w2
w
w2
Sets intersections.
return list(set(w).intersection(w2)
return list(set(w) & set(w2))
Fails assertion 3 and 4 because a set is a collection of unique elements
and duplicated letters will be discarded.
collection of unique elements
Iterate and count.
out = ""
for c in s1:
if c in s2 and not c in out:
out += c
return out
Fails because it also eliminates duplicates.
difflib (Python Documentation)
letters_diff = difflib.ndiff(word, non_wildcards_letters)
letters_intersection =
for l in letters_diff:
letter_code, letter = l[:2], l[2:]
if letter_code == " ":
letters_intersection.append(letter)
return letters_intersection
Passes
difflib
works but can anybody think of a better, optimized approach?
difflib
EDIT:
The function would return a list of chars. The order doesn't really matter.
intersection('aba', 'aca')
['a']
['a', 'a']
intersection('ab', 'b')
['b']
Are these just prefixes? If not, does the order actually matter or just the counts?
– Reut Sharabani
Aug 8 at 14:30
Possible duplicate of Find common substring between two strings
– Thierry Lathuille
Aug 8 at 14:32
@Aran-Fey
intersection('aba', 'aca')
should return ['a', 'a']
and intersection('ab', 'b')
just ['b']
yes. @ReutSharabani , edited the question. No the order doesn't matter.– maxrodrigo
Aug 8 at 14:56
intersection('aba', 'aca')
['a', 'a']
intersection('ab', 'b')
['b']
what about
intersection('ab', 'abhab')
? ['ab', 'ab']
?– xdze2
Aug 8 at 15:01
intersection('ab', 'abhab')
['ab', 'ab']
3 Answers
3
Try this:
def intersect(string1, string2):
common =
for char in set(string1):
common.extend(char * min(string1.count(char), string2.count(char)))
return common
Note: It doesn't preserve the order (if I remember set()
correctly, the letters will be returned in alphabetical order). But, as you say in your comments, order doesn't matter
set()
@Aran-Fey Edited :)
– Adi219
Aug 8 at 15:27
Please can (potential) downvoters explain why?
– Adi219
Aug 8 at 15:27
list(set(list(str1)))
can be just list(set(str1))
. You have a typo in the else
branch (string
instead of string2
). I'm pretty sure there's no reason to have the whole if ... else
thing in the first place. And finally, repeatedly calling str.count
on both strings needlessly increases your run time complexity. You can use collections.Counter
to count all the characters just once. (And your code had syntax errors before the edit.)– Aran-Fey
Aug 8 at 15:30
list(set(list(str1)))
list(set(str1))
else
string
string2
if ... else
str.count
collections.Counter
min(string1.count(char), str2.count(char))
<- that's a bug. You should be using str1
there.– Aran-Fey
Aug 8 at 15:33
min(string1.count(char), str2.count(char))
str1
@Aran-Fey The if-else is needed to get the string of minimum length. I wouldn't need it if I knew for sure that the strings were of different length, but as they could be, I'm using the if-else.
– Adi219
Aug 8 at 15:33
This works pretty well for your test cases:
def intersect(haystack, needle):
while needle:
pos = haystack.find(needle)
if pos >= 0:
return list(needle)
needle = needle[:-1]
return
But, bear in mind that, all your test cases are longer then shorter, do not have an empty search term, an empty search space, or a non-match.
Unfortunately the test cases don't come anywhere close to covering all the requirements. If you read the comments below the question you should see that this doesn't do what the OP wants.
– Aran-Fey
Aug 8 at 15:21
Gives the number of co-occurrence for all n-grams in the two strings:
from collections import Counter
def all_ngrams(text):
ngrams = ( text[i:i+n] for n in range(1, len(text)+1)
for i in range(len(text)-n+1) )
return Counter(ngrams)
def intersection(string1, string2):
count_1 = all_ngrams(string1)
count_2 = all_ngrams(string2)
return count_1 & count_2 # intersection: min(c[x], d[x])
intersection('foo', 'f') # Counter('f': 1)
intersection('foo', 'o') # Counter('o': 1)
intersection('foobar', 'foo') # Counter('f': 1, 'fo': 1, 'foo': 1, 'o': 2, 'oo': 1)
intersection('abhab', 'abab') # Counter('a': 2, 'ab': 2, 'b': 2)
intersection('achab', 'abac') # Counter('a': 2, 'ab': 1, 'ac': 1, 'b': 1, 'c': 1)
intersection('test', 'ates') # Counter('e': 1, 'es': 1, 's': 1, 't': 1, 'te': 1, 'tes': 1)
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
What's the expected output of
intersection('aba', 'aca')
?['a']
or['a', 'a']
? What aboutintersection('ab', 'b')
?or
['b']
?– Aran-Fey
Aug 8 at 14:26