What's the best way to find the intersection between two strings?

I need to find the intersection between two strings.
Assertions:

assert intersect("test", "tes") == list("tes"), "Assertion 1" assert intersect("test", "ta") == list("t"), "Assertion 2" assert intersect("foo", "fo") == list("fo"), "Assertion 3" assert intersect("foobar", "foo") == list("foo"), "Assertion 4"

I tried different implementations for the intersect function. intersect would receive 2 str parameters, w and w2

intersect

str

w

w2

List comprehension. Iterate and look for occurrences in the second string.

return [l for l in w if l in w2]

Fail assertion 1 and 2 because multiple t in w match the one t in w2

w

w2

Sets intersections.

return list(set(w).intersection(w2) return list(set(w) & set(w2))

Fails assertion 3 and 4 because a set is a collection of unique elements and duplicated letters will be discarded.

collection of unique elements

Iterate and count.

out = "" for c in s1: if c in s2 and not c in out: out += c return out

Fails because it also eliminates duplicates.

difflib (Python Documentation)

letters_diff = difflib.ndiff(word, non_wildcards_letters) letters_intersection = for l in letters_diff: letter_code, letter = l[:2], l[2:] if letter_code == " ": letters_intersection.append(letter) return letters_intersection

Passes

difflib works but can anybody think of a better, optimized approach?

difflib

EDIT:
The function would return a list of chars. The order doesn't really matter.

What's the expected output of intersection('aba', 'aca')? ['a'] or ['a', 'a']? What about intersection('ab', 'b')? or ['b']?
– Aran-Fey
Aug 8 at 14:26

intersection('aba', 'aca')

['a']

['a', 'a']

intersection('ab', 'b')

['b']

Are these just prefixes? If not, does the order actually matter or just the counts?
– Reut Sharabani
Aug 8 at 14:30

Possible duplicate of Find common substring between two strings
– Thierry Lathuille
Aug 8 at 14:32

@Aran-Fey intersection('aba', 'aca') should return ['a', 'a'] and intersection('ab', 'b') just ['b'] yes. @ReutSharabani , edited the question. No the order doesn't matter.
– maxrodrigo
Aug 8 at 14:56

intersection('aba', 'aca')

['a', 'a']

intersection('ab', 'b')

['b']

what about intersection('ab', 'abhab')? ['ab', 'ab']?
– xdze2
Aug 8 at 15:01

intersection('ab', 'abhab')

['ab', 'ab']

3 Answers
3

Try this:

def intersect(string1, string2): common = for char in set(string1): common.extend(char * min(string1.count(char), string2.count(char))) return common

Note: It doesn't preserve the order (if I remember set() correctly, the letters will be returned in alphabetical order). But, as you say in your comments, order doesn't matter

set()

@Aran-Fey Edited :)
– Adi219
Aug 8 at 15:27

Please can (potential) downvoters explain why?
– Adi219
Aug 8 at 15:27

list(set(list(str1))) can be just list(set(str1)). You have a typo in the else branch (string instead of string2). I'm pretty sure there's no reason to have the whole if ... else thing in the first place. And finally, repeatedly calling str.count on both strings needlessly increases your run time complexity. You can use collections.Counter to count all the characters just once. (And your code had syntax errors before the edit.)
– Aran-Fey
Aug 8 at 15:30

list(set(list(str1)))

list(set(str1))

else

string

string2

if ... else

str.count

collections.Counter

min(string1.count(char), str2.count(char)) <- that's a bug. You should be using str1 there.
– Aran-Fey
Aug 8 at 15:33

min(string1.count(char), str2.count(char))

str1

@Aran-Fey The if-else is needed to get the string of minimum length. I wouldn't need it if I knew for sure that the strings were of different length, but as they could be, I'm using the if-else.
– Adi219
Aug 8 at 15:33

This works pretty well for your test cases:

def intersect(haystack, needle): while needle: pos = haystack.find(needle) if pos >= 0: return list(needle) needle = needle[:-1] return

But, bear in mind that, all your test cases are longer then shorter, do not have an empty search term, an empty search space, or a non-match.

Unfortunately the test cases don't come anywhere close to covering all the requirements. If you read the comments below the question you should see that this doesn't do what the OP wants.
– Aran-Fey
Aug 8 at 15:21

Gives the number of co-occurrence for all n-grams in the two strings:

from collections import Counter def all_ngrams(text): ngrams = ( text[i:i+n] for n in range(1, len(text)+1) for i in range(len(text)-n+1) ) return Counter(ngrams) def intersection(string1, string2): count_1 = all_ngrams(string1) count_2 = all_ngrams(string2) return count_1 & count_2 # intersection: min(c[x], d[x]) intersection('foo', 'f') # Counter('f': 1) intersection('foo', 'o') # Counter('o': 1) intersection('foobar', 'foo') # Counter('f': 1, 'fo': 1, 'foo': 1, 'o': 2, 'oo': 1) intersection('abhab', 'abab') # Counter('a': 2, 'ab': 2, 'b': 2) intersection('achab', 'abac') # Counter('a': 2, 'ab': 1, 'ac': 1, 'b': 1, 'c': 1) intersection('test', 'ates') # Counter('e': 1, 'es': 1, 's': 1, 't': 1, 'te': 1, 'tes': 1)

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Sfyjdyy

What's the best way to find the intersection between two strings?

What's the best way to find the intersection between two strings?

3 Answers
3

Popular posts from this blog

make 2 or more post in bootsrap

Store custom data using WC_Cart add_to_cart() method in Woocommerce 3

React Native Navigation and navigating to another Screen problem

What's the best way to find the intersection between two strings?

What's the best way to find the intersection between two strings?

3 Answers 3

Popular posts from this blog

make 2 or more post in bootsrap

Store custom data using WC_Cart add_to_cart() method in Woocommerce 3

React Native Navigation and navigating to another Screen problem

3 Answers
3