Pandas merge or join in smaller dataframe




I have an issue whereby I have one long dataframe and one short dataframe, and I want to merge so that the shorter dataframe repeats itself to fill the length of the longer (left) df.


df1:

| Index | Wafer | Chip | Value |
---------------------------------
| 0 | 1 | 32 | 0.99 |
| 1 | 1 | 33 | 0.89 |
| 2 | 1 | 39 | 0.96 |
| 3 | 2 | 32 | 0.81 |
| 4 | 2 | 33 | 0.87 |

df2:

| Index | x | y |
-------------------------
| 0 | 1 | 3 |
| 1 | 2 | 2 |
| 2 | 1 | 6 |


df_combined:

| Index | Wafer | Chip | Value | x | y |
-------------------------------------------------
| 0 | 1 | 32 | 0.99 | 1 | 3 |
| 1 | 1 | 33 | 0.89 | 2 | 2 |
| 2 | 1 | 39 | 0.96 | 1 | 6 |
| 3 | 2 | 32 | 0.81 | 1 | 3 | <--- auto-repeats...
| 4 | 2 | 33 | 0.87 | 2 | 2 |



Is this a built-in join/merge type, or does it require a loop of some sort?



This is just dummy data; the real dfs are over 1000 rows.



My current code is a simple outer merge, but it doesn't repeat the shorter df to fill the remaining rows; it just gives NaNs:

df = main.merge(df_coords, left_index=True, right_index=True, how='outer')
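To see where the NaNs come from, here is a minimal reproduction using the sample data above. It assumes df1 and df2 stand in for main and df_coords, and that "Index" is the DataFrame index rather than a column; both are assumptions, not details from the real code.

```python
import pandas as pd

# Sample data from the question; "Index" is treated as the DataFrame index (assumption)
df1 = pd.DataFrame({"Wafer": [1, 1, 1, 2, 2],
                    "Chip": [32, 33, 39, 32, 33],
                    "Value": [0.99, 0.89, 0.96, 0.81, 0.87]})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# An index-on-index outer merge only pairs rows whose labels match (0, 1, 2).
# Labels 3 and 4 exist only in df1, so x and y are NaN for those rows.
df = df1.merge(df2, left_index=True, right_index=True, how="outer")
print(df)
```

No merge option cycles the shorter frame, so the repetition has to be built before (or instead of) the merge.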



I've checked around:
Merge two python pandas data frames of different length but keep all rows in output data frame
pandas: duplicate rows from small dataframe to large based on cell value



It feels like this could be an argument somewhere in a merge function, but I can't find it.
Any help gratefully received.



Thanks




2 Answers



You can repeat df2 until it's as long as df1, then reset_index and merge:




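import pandas as pd

# Copies of df2 needed to cover df1 (ceiling division, so it never falls short)
new_len = -(-len(df1) // len(df2))
repeated = (pd.concat([df2] * new_len)
            .reset_index(drop=True)   # renumber 0..n-1 so labels line up with df1
            .iloc[:len(df1)])         # trim the surplus rows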

repeated
   x  y
0  1  3
1  2  2
2  1  6
3  1  3
4  2  2

df1.merge(repeated, how="outer", left_index=True, right_index=True)
   Wafer  Chip  Value  x  y
0      1    32   0.99  1  3
1      1    33   0.89  2  2
2      1    39   0.96  1  6
3      2    32   0.81  1  3
4      2    33   0.87  2  2



A little hacky, but it should work.
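A possibly less hacky variant (not part of this answer, just a sketch assuming df1 and df2 should be paired by row position): build the repeating row pattern directly with NumPy's modulo, so there is no copy count to compute and nothing to trim.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with "Index" treated as the DataFrame index
df1 = pd.DataFrame({"Wafer": [1, 1, 1, 2, 2],
                    "Chip": [32, 33, 39, 32, 33],
                    "Value": [0.99, 0.89, 0.96, 0.81, 0.87]})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# Row i of df1 takes row i % len(df2) of df2: positions 0, 1, 2, 0, 1
positions = np.arange(len(df1)) % len(df2)

# Pick those rows and relabel them with df1's index so they align row by row
repeated = df2.iloc[positions].set_axis(df1.index)

# Place the columns side by side; every label matches, so no NaNs appear
df_combined = pd.concat([df1, repeated], axis=1)
print(df_combined)
```

Relabelling with df1's index means this does not depend on df1 having a default 0..n-1 index.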



Note: I'm assuming your Index column is not actually a column, but is in fact intended to represent the data frame index. I'm making this assumption because you refer to left_index/right_index args in your merge() code. If Index is actually its own column, this code will basically work, you'll just need to drop Index as well if you don't want it in the final df.







That works, thanks! I would love to know if there is a more built-in way, but that's great for now, cheers.
– BAC83
Aug 6 at 11:02





Regarding your edit: correct. In my actual code I've used left_index=True and right_index=True
– BAC83
Aug 6 at 11:29







Great, glad I could help.
– andrew_reece
Aug 6 at 11:31



You can achieve this with a left join on df1["Index"] modulo the number of rows in df2, matched against df2["Index"]:


# Creating modular index values on df1: Index i pairs with df2's Index i % n
# (assumes df2["Index"] is an integer column holding 0 .. n-1)
n = df2.shape[0]
df1["Modular Index"] = df1["Index"].astype(int) % n

# Merging dataframes (integer key on both sides, so the dtypes match)
df_combined = df1.merge(df2, how="left", left_on="Modular Index", right_on="Index")

# Dropping unnecessary columns
df_combined = df_combined.drop(["Modular Index", "Index_y"], axis=1)




print(df_combined)

   Index_x  Wafer  Chip  Value  x  y
0        0      1    32   0.99  1  3
1        1      1    33   0.89  2  2
2        2      1    39   0.96  1  6
3        3      2    32   0.81  1  3
4        4      2    33   0.87  2  2
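The same modulo idea can be keyed on row position instead of on the Index values, which may matter if df1's Index does not start at 0 or has gaps (with Index 10, 11, ... the answer's key would start at df2's row 1). A minimal sketch; the shifted Index values and the temporary key column are hypothetical, chosen only to illustrate the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1's Index column starts at 10 instead of 0
df1 = pd.DataFrame({"Index": [10, 11, 12, 13, 14],
                    "Wafer": [1, 1, 1, 2, 2],
                    "Chip": [32, 33, 39, 32, 33],
                    "Value": [0.99, 0.89, 0.96, 0.81, 0.87]})
df2 = pd.DataFrame({"Index": [0, 1, 2], "x": [1, 2, 1], "y": [3, 2, 6]})

n = len(df2)
# Temporary join key built from row positions, not from the "Index" values
left = df1.assign(key=np.arange(len(df1)) % n)
right = df2.drop(columns="Index").assign(key=np.arange(n))

# Left join keeps every df1 row in order, then the helper key is dropped
df_combined = left.merge(right, how="left", on="key").drop(columns="key")
print(df_combined)
```

Dropping df2's Index before the merge also avoids the Index_x/Index_y suffixes.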





