Pandas merge or join in smaller dataframe

I have an issue whereby I have one long dataframe and one short dataframe, and I want to merge so that the shorter dataframe repeats itself to fill the length of the longer (left) df.
df1:
| Index | Wafer | Chip | Value |
---------------------------------
| 0 | 1 | 32 | 0.99 |
| 1 | 1 | 33 | 0.89 |
| 2 | 1 | 39 | 0.96 |
| 3 | 2 | 32 | 0.81 |
| 4 | 2 | 33 | 0.87 |
df2:
| Index | x | y |
-------------------------
| 0 | 1 | 3 |
| 1 | 2 | 2 |
| 2 | 1 | 6 |
df_combined:
| Index | Wafer | Chip | Value | x | y |
-------------------------------------------------
| 0 | 1 | 32 | 0.99 | 1 | 3 |
| 1 | 1 | 33 | 0.89 | 2 | 2 |
| 2 | 1 | 39 | 0.96 | 1 | 6 |
| 3 | 2 | 32 | 0.81 | 1 | 3 | <--- auto-repeats...
| 4 | 2 | 33 | 0.87 | 2 | 2 |
Is this a built-in join/merge type, or does it require a loop of some sort?
This is just dummy data; the real dataframes are over 1000 rows.
My current code is a simple outer merge on the index, but it doesn't repeat the shorter dataframe to fill the remaining rows and just gives NaNs:
df = main.merge(df_coords, left_index=True, right_index=True, how='outer')
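For reference, here is a minimal sketch that reproduces the problem with the dummy data from the tables above. It assumes the Index column shown in the tables is the dataframe index rather than a real column, and it reuses the names main and df_coords from my snippet.

```python
import pandas as pd

# Dummy data from the tables above (Index is assumed to be the dataframe index)
main = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df_coords = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# Index-on-index outer merge: rows 3 and 4 have no partner in df_coords,
# so their x and y values come back as NaN instead of repeating
df = main.merge(df_coords, left_index=True, right_index=True, how="outer")
print(df)
```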
I've checked around:
Merge two python pandas data frames of different length but keep all rows in output data frame
pandas: duplicate rows from small dataframe to large based on cell value
and it feels like this could be an argument somewhere in a merge function... but I can't find it.
Any help gratefully received.
Thanks
2 Answers
You can repeat df2 until it's as long as df1, then reset_index and merge:
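import math
import pandas as pd

# Round up so the repeated frame is never shorter than df1
# (round() can round down, e.g. 4 rows / 3 rows -> 1 repeat)
n_repeats = math.ceil(len(df1) / len(df2))
repeated = (pd.concat([df2] * n_repeats)
            .reset_index(drop=True)  # fresh 0..n-1 index, old index discarded
            .iloc[:len(df1)])        # trim to df1's length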
repeated
x y
0 1 3
1 2 2
2 1 6
3 1 3
4 2 2
df1.merge(repeated, how="outer", left_index=True, right_index=True)
Wafer Chip Value x y
0 1 32 0.99 1 3
1 1 33 0.89 2 2
2 1 39 0.96 1 6
3 2 32 0.81 1 3
4 2 33 0.87 2 2
A little hacky, but it should work.
Note: I'm assuming your Index column is not actually a column, but is in fact intended to represent the data frame index. I'm making this assumption because you refer to left_index/right_index args in your merge() code. If Index is actually its own column, this code will basically work, you'll just need to drop Index as well if you don't want it in the final df.
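If you would rather skip the concat step, one possible variation is to pick rows from df2 by position, cycling with a modulo. This is only a sketch under the same assumption as above (Index is the data frame index); the positions and df_combined names are just for illustration, and it needs NumPy in addition to pandas.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# Row positions 0, 1, 2, 0, 1, ... cycling through df2 for every row of df1
positions = np.arange(len(df1)) % len(df2)

# Take those rows from df2, then give them df1's index so the frames line up
repeated = df2.iloc[positions]
repeated.index = df1.index

# Side-by-side concatenation on the now-identical index
df_combined = pd.concat([df1, repeated], axis=1)
print(df_combined)
```

Because the rows are selected by position, this does not depend on what df2's own index labels are.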
Regarding your edit: correct. In my actual code I've used left_index=True and right_index=True. – BAC83
Aug 6 at 11:29
Great, glad I could help.
– andrew_reece
Aug 6 at 11:31
You can achieve this with a left join on the value of df1["Index"] mod the length of df2["Index"]:
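# Creating modular index values on df1, kept as integers so they match an
# integer df2["Index"] (cast df2["Index"] with .astype(int) first if it was read in as strings)
n = df2.shape[0]
df1["Modular Index"] = df1["Index"].astype(int) % n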
# Merging dataframes
df_combined = df1.merge(df2, how="left", left_on="Modular Index", right_on="Index")
# Dropping unnecessary columns
df_combined = df_combined.drop(["Modular Index", "Index_y"], axis=1)
print(df_combined)
  Index_x Wafer Chip Value x y
0 0 1 32 0.99 1 3
1 1 1 33 0.89 2 2
2 2 1 39 0.96 1 6
3 3 2 32 0.81 1 3
4 4 2 33 0.87 2 2
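If Index is really the dataframe index rather than a column, the same modular idea might be sketched with DataFrame.join, which matches a column in the caller against the index of the other frame. The _key column name here is a made-up helper for illustration, and the sketch assumes df2's rows should be cycled in their existing order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

n = len(df2)
df_combined = (
    df1.assign(_key=np.arange(len(df1)) % n)      # temporary key: 0, 1, 2, 0, 1, ...
    .join(df2.reset_index(drop=True), on="_key")  # left join of _key against df2's 0..n-1 index
    .drop(columns="_key")                         # remove the helper column again
)
print(df_combined)
```

This avoids the Index_x / Index_y suffixes because neither frame carries an Index column into the join.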
That works, thanks! I would love to know if there is a more built in way, but that's great for now, cheers.
– BAC83
Aug 6 at 11:02