Match word (starting with plus symbol) in pandas data frames

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



Match word (starting with plus symbol) in pandas data frames



I have two pandas data frames. I would like to find matching strings in one specific column ("keyword") exist in both data frames.


keyword adGroup goal6Value adCost
[aaaa] (not set) 0 0.0
+bb +bb (not set) 0 0.0
+cc +cc (not set) 2072 0.0
[dddd] (not set) 0 0.0



The second data frame:


keyword status Max Min
[aaaa] (not set) 0.1 0.0
+bb +bb (not set) 0.2 0.0
+ff +ff (not set) 0.1 0.0
[gggg] (not set) 0.3 0.0



I would like the output to return all columns if the keyword is available in both data frames (keyword column). The output should look like this:


keyword status Max Min adGroup goal6Value adCost
[aaaa] (not set) 0.1 0.0 (not set) 0 0.0
+bb +bb (not set) 0.2 0.0 (not set) 0 0.0



I have changed the data type for keyword column into string for both data frames.
I have tried these options:


pd.merge(df1, df2, on='keyword')



and


df1.set_index('keyword').join(df2.set_index('keyword'))



However, both options only matched the keyword with brackets and did not return the keywords starting with a plus symbol even when they are available in both data frames.



Is there a way to match the keyword with the plus symbol as well in pandas?




3 Answers
3



I cannot recreate your issue, the below test works fine. I'd suggest casting your keyword column as dtype object in both dataframes (df1['keyword'] = df1['keyword'].astype(object) | df2['keyword'] = df2['keyword'].astype(object))


df1['keyword'] = df1['keyword'].astype(object)


df2['keyword'] = df2['keyword'].astype(object)



dtype object seems to work for me, as shown below:


data_1 = 'keyword': ['[aaaa]','+bb +bb','+cc +cc','[dddd]'],
'adGroup': ['(not set)','(not set)','(not set)','(not set)'],
'goal6Value': ['0','0','2072','0'],
'adCost': ['0.0','0.0','0.0','0.0']

data_2 = 'keyword': ['[aaaa]','+bb +bb','+ff +ff','[gggg]'],
'status': ['(not set)','(not set)','(not set)','(not set)'],
'Max': ['0.1','0.2','0.1','0.3'],
'Min': ['0.0','0.0','0.0','0.0']

df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)

test = pd.merge(df_1, df_2, on='keyword')
test.head()

keyword adGroup goal6Value adCost status Max Min
0 [aaaa] (not set) 0 0.0 (not set) 0.1 0.0
1 +bb +bb (not set) 0 0.0 (not set) 0.2 0.0

test.dtypes

keyword object
adGroup object
goal6Value object
adCost object
status object
Max object
Min object
dtype: object



Alternatively, perhaps there are some leading/lagging spaces on your keyword column that may not exist across dataframes. This can be remedied with Pandas.series.str.strip(). Pandas docs.





Thanks! The leading/lagging spaces are the issues. I've removed leading/lagging spaces in the data frames df1['keyword'] = df1['keyword'].str.strip() and pd.merge() matched the keywords with plus symbols.
– mahf_i
Aug 8 at 14:41


df1['keyword'] = df1['keyword'].str.strip()


pd.merge()



I could not recreate the issue as I could merge the two dfs beblow


df1=pd.DataFrame('keyword':['[aaaa]','+bbbb'],'adGroup':['something','something'])
df2=pd.DataFrame('keyword':['[aaaa]','+bbbb'],'adGroup':['something2','something2'])
df1.merge(df2,on='keyword')

adGroup_x keyword adGroup_y
0 something [aaaa] something2
1 something +bbbb something2



May be you need to change the type.





I have tried this solution and does not give the output I want. I have changed the data type too.
– mahf_i
Aug 8 at 13:33





@mahf_i you copy all the code above and not working?
– James Liu
Aug 8 at 13:41





@JamesLiu I have updated the question to better reflect my issue. The keyword can have multiple plus symbols.
– mahf_i
Aug 8 at 13:57





@mahf_i No of + does not matter if they are of the same type. merge is similar to sql join statement and works pretty much in the same fashion.
– mad_
Aug 8 at 14:00



+



EDITED



pd.merge work fine, I can't reproduce the problem, too


pd.merge(df1, df2, on='keyword')






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard