add 1 value to new column and perform operation on it with other columns
I have a DataFrame:
x y z new_col
NaN NaN NaN 1
1 2 3 4
2 3 4 20
2 1 3 80
So basically the formula is: new_col starts at 1 in the first row, and each later value is the previous new_col value times (1 + z) for that row: 1 * (1 + 3), then 4 * (1 + 4), then 20 * (1 + 3), and so on.
How should I first create the new column (new_col) with 1 in the first row, and then perform this calculation?
The expected output is the new column (new_col): create the column new_col and set its first value to 1, then use that 1 to compute the second value of new_col, then use the previous value to compute the third value, and so on.
– Barsmansvaps Friends
Aug 10 at 16:14
I think I see it: the expected output is included. The formula for column "new_col", row n, is new_col[n] = new_col[n-1] * (1 + z[n])
– Prune
Aug 10 at 16:14
@Prune I thought of the same thing.
– Haris
Aug 10 at 16:15
@BarsmansvapsFriends: what have you tried? There should be examples for cumulative sum that you could adapt, and perhaps a handful for cumulative product.
– Prune
Aug 10 at 16:15
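The recurrence from the comments, new_col[n] = new_col[n-1] * (1 + z[n]), can be checked without pandas. The following is only a minimal sketch using the standard library, with the z values from the question typed in by hand; it shows that the recurrence is a running product of (1 + z) seeded with 1.

```python
from itertools import accumulate
from operator import mul

# z values from rows 1..3 of the question (row 0 has no z value)
z = [3, 4, 3]

# Seed the running product with 1, then multiply by (1 + z) row by row
new_col = list(accumulate((1 + v for v in z), mul, initial=1))

print(new_col)
```

This needs Python 3.8 or later because of the `initial` argument to `accumulate`.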
2 Answers
cumprod
df.assign(new_col=df.z.fillna(0).add(1).cumprod())
x y z new_col
0 NaN NaN NaN 1.0
1 1.0 2.0 3.0 4.0
2 2.0 3.0 4.0 20.0
3 2.0 1.0 3.0 80.0
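To see why the one-liner works, it may help to look at the intermediate steps. This is a sketch that rebuilds the question's sample data by hand (the variable names `factors` and `result` are illustrative, not from the answer): `fillna(0)` turns the missing first z into 0, `add(1)` turns every z into the factor (1 + z), and `cumprod()` multiplies those factors cumulatively.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "x": [np.nan, 1, 2, 2],
    "y": [np.nan, 2, 3, 1],
    "z": [np.nan, 3, 4, 3],
})

# NaN -> 0, then z -> (1 + z); the first row becomes the factor 1
factors = df["z"].fillna(0).add(1)

# Running product of the factors gives new_col
result = df.assign(new_col=factors.cumprod())
print(result)
```

Because the first factor is 1, the first value of new_col is 1 without any special handling.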
Attempt to preserve dtype
df.assign(new_col=df.z.fillna(0, downcast='infer').add(1).cumprod())
x y z new_col
0 NaN NaN NaN 1
1 1.0 2.0 3.0 4
2 2.0 3.0 4.0 20
3 2.0 1.0 3.0 80
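If the `downcast` argument is unavailable or emits a warning in your pandas version (an assumption worth checking against the release notes for your install), an explicit cast is one alternative. This sketch assumes every z value is a whole number once the NaN is filled, so casting to int loses nothing.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "x": [np.nan, 1, 2, 2],
    "y": [np.nan, 2, 3, 1],
    "z": [np.nan, 3, 4, 3],
})

# Fill the missing value, cast explicitly to int, then take the running product
z_int = df["z"].fillna(0).astype(int)
result = df.assign(new_col=z_int.add(1).cumprod())
print(result.dtypes)
```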
The point of this is to show how to perform a linear, path-dependent calculation. Numba is very fast, and if the calculation has a time complexity of O(N), you don't have to be afraid of using this loop in Numba.
If you don't have numba installed and don't want to install it, just remove the @njit decorator.
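import numpy as np
from numba import njit

@njit
def f(a):
    # out[0] is the seed value 1; a[0] (NaN here) is never read
    out = np.zeros_like(a)
    out[0] = 1
    # each value is the previous value times (1 + current z)
    for i, x in enumerate(a[1:], 1):
        out[i] = out[i-1] * (1 + x)
    return out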
df.assign(new_col=f(df.z.values))
x y z new_col
0 NaN NaN NaN 1.0
1 1.0 2.0 3.0 4.0
2 2.0 3.0 4.0 20.0
3 2.0 1.0 3.0 80.0
With int
df.assign(new_col=f(df.z.fillna(0).astype(int).values))
x y z new_col
0 NaN NaN NaN 1
1 1.0 2.0 3.0 4
2 2.0 3.0 4.0 20
3 2.0 1.0 3.0 80
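Instead of deleting the decorator by hand when numba is missing, the import can be made optional. This is a sketch under that assumption; the fallback `njit` and the function name `running_product` are hypothetical helpers for illustration, and the fallback only covers the bare `@njit` form (no arguments).

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        """Fallback: leave the function as plain Python."""
        return func

@njit
def running_product(z):
    # The first value is the seed 1; z[0] is never read
    out = np.empty(len(z), dtype=np.float64)
    out[0] = 1.0
    for i in range(1, len(z)):
        out[i] = out[i - 1] * (1.0 + z[i])
    return out

values = running_product(np.array([np.nan, 3.0, 4.0, 3.0]))
print(values)
```

The same function body runs either compiled or as ordinary Python, so the result should not depend on whether numba is installed.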
If you are looking for a much simpler solution, this will be helpful.
You can just create a new column named "new_column" and initialize all the values to 1 (since the first value should be 1).
df['new_column'] = 1
Then, you can use a for loop to iterate through the rows and update the new column values according to your formula.
for i in range(1, len(df)):
    df.loc[i, 'new_column'] = df['new_column'][i-1] * (1 + df['z'][i])
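A self-contained sketch of the loop approach, assuming the question's sample data, a default integer index (0, 1, 2, ...), and the lowercase column name z from the question. The column is initialized with 1.0 rather than 1 here so that it is a float column from the start, which is my own choice and not part of the answer. The result is compared against the cumulative-product formulation as a sanity check.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "x": [np.nan, 1, 2, 2],
    "y": [np.nan, 2, 3, 1],
    "z": [np.nan, 3, 4, 3],
})

# Float seed so later float results fit the column dtype
df["new_column"] = 1.0

# Each row uses the value computed for the previous row
for i in range(1, len(df)):
    df.loc[i, "new_column"] = df.loc[i - 1, "new_column"] * (1 + df.loc[i, "z"])

# Cross-check against the vectorized running product
expected = df["z"].fillna(0).add(1).cumprod()
print(df["new_column"].tolist() == expected.tolist())
```

The loop is easy to read but touches one cell per iteration, so for long frames the vectorized form is likely to be faster.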
Sorry, I can't understand the formula. Please include the expected output and clarify what you're trying to do. How does the formula relate to x, y, z given in the DF?
– roganjosh
Aug 10 at 16:12