Match word (starting with plus symbol) in pandas data frames
Clash Royale CLAN TAG#URR8PPP
Match word (starting with plus symbol) in pandas data frames
I have two pandas data frames. I would like to find matching strings in one specific column ("keyword") exist in both data frames.
keyword adGroup goal6Value adCost
[aaaa] (not set) 0 0.0
+bb +bb (not set) 0 0.0
+cc +cc (not set) 2072 0.0
[dddd] (not set) 0 0.0
The second data frame:
keyword status Max Min
[aaaa] (not set) 0.1 0.0
+bb +bb (not set) 0.2 0.0
+ff +ff (not set) 0.1 0.0
[gggg] (not set) 0.3 0.0
I would like the output to return all columns if the keyword is available in both data frames (keyword column). The output should look like this:
keyword status Max Min adGroup goal6Value adCost
[aaaa] (not set) 0.1 0.0 (not set) 0 0.0
+bb +bb (not set) 0.2 0.0 (not set) 0 0.0
I have changed the data type for keyword column into string for both data frames.
I have tried these options:
pd.merge(df1, df2, on='keyword')
and
df1.set_index('keyword').join(df2.set_index('keyword'))
However, both options only matched the keyword with brackets and did not return the keywords starting with a plus symbol even when they are available in both data frames.
Is there a way to match the keyword with the plus symbol as well in pandas?
3 Answers
3
I cannot recreate your issue, the below test works fine. I'd suggest casting your keyword column as dtype object in both dataframes (df1['keyword'] = df1['keyword'].astype(object)
| df2['keyword'] = df2['keyword'].astype(object)
)
df1['keyword'] = df1['keyword'].astype(object)
df2['keyword'] = df2['keyword'].astype(object)
dtype object seems to work for me, as shown below:
data_1 = 'keyword': ['[aaaa]','+bb +bb','+cc +cc','[dddd]'],
'adGroup': ['(not set)','(not set)','(not set)','(not set)'],
'goal6Value': ['0','0','2072','0'],
'adCost': ['0.0','0.0','0.0','0.0']
data_2 = 'keyword': ['[aaaa]','+bb +bb','+ff +ff','[gggg]'],
'status': ['(not set)','(not set)','(not set)','(not set)'],
'Max': ['0.1','0.2','0.1','0.3'],
'Min': ['0.0','0.0','0.0','0.0']
df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)
test = pd.merge(df_1, df_2, on='keyword')
test.head()
keyword adGroup goal6Value adCost status Max Min
0 [aaaa] (not set) 0 0.0 (not set) 0.1 0.0
1 +bb +bb (not set) 0 0.0 (not set) 0.2 0.0
test.dtypes
keyword object
adGroup object
goal6Value object
adCost object
status object
Max object
Min object
dtype: object
Alternatively, perhaps there are some leading/lagging spaces on your keyword column that may not exist across dataframes. This can be remedied with Pandas.series.str.strip(). Pandas docs.
df1['keyword'] = df1['keyword'].str.strip()
pd.merge()
I could not recreate the issue as I could merge the two dfs beblow
df1=pd.DataFrame('keyword':['[aaaa]','+bbbb'],'adGroup':['something','something'])
df2=pd.DataFrame('keyword':['[aaaa]','+bbbb'],'adGroup':['something2','something2'])
df1.merge(df2,on='keyword')
adGroup_x keyword adGroup_y
0 something [aaaa] something2
1 something +bbbb something2
May be you need to change the type.
I have tried this solution and does not give the output I want. I have changed the data type too.
– mahf_i
Aug 8 at 13:33
@mahf_i you copy all the code above and not working?
– James Liu
Aug 8 at 13:41
@JamesLiu I have updated the question to better reflect my issue. The keyword can have multiple plus symbols.
– mahf_i
Aug 8 at 13:57
@mahf_i No of
+
does not matter if they are of the same type. merge is similar to sql join statement and works pretty much in the same fashion.– mad_
Aug 8 at 14:00
+
EDITED
pd.merge work fine, I can't reproduce the problem, too
pd.merge(df1, df2, on='keyword')
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Thanks! The leading/lagging spaces are the issues. I've removed leading/lagging spaces in the data frames
df1['keyword'] = df1['keyword'].str.strip()
andpd.merge()
matched the keywords with plus symbols.– mahf_i
Aug 8 at 14:41