Replace missing values from list in Pandas

I have a dataframe like this

                            A         B
2018-02-01 00:00:00  5.592860  2.789900
2018-02-01 00:15:00  5.288981  2.054017
2018-02-01 00:30:00  5.319665  2.232686
2018-02-01 00:45:00  5.198657  2.236154
2018-02-01 01:00:00  5.018134  2.064312

Column A will never have missing values, but column B may.

I also have a list of length 12, named forecasts, like this:

[0.09545173 0.09946214 0.10596157 0.12075519 0.14446978 0.16848372 0.20479251 0.23742175 0.26723814 0.29389328 0.30628437 0.3140854 ]

I want to loop through each row in the dataframe and check whether any of the next 12 rows has a NaN in column B. If a NaN is present, replace it with the value at the corresponding index in the list.

For example, if the 2nd row from the current index is NaN, replace that NaN with forecasts[2].

To make things clearer:

I will have a dataframe with data like the one shown above. There may be missing values in column B but not in A. For each timestamp I will have a list of 12 values. The list, named forecasts, holds the forecasted value for the present timestamp in forecasts[0] and for the 11th timestamp from now in forecasts[11]. I want to loop through each timestamp in the dataset and check whether any NaN is present in the next 12 rows of column B.

If there are NaNs, replace them with the corresponding forecast.

How can I do this easily with pandas?
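One way to sketch the loop described above, assuming the forecasts for a given timestamp are already available as a list or array. The helper fill_window is hypothetical (it is not a pandas function): it fills NaNs in column B for the rows starting at a given position, so that forecasts[k] lands k rows after the start. The sample data, the NaN positions and the three forecast values are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_window(df, start, forecasts, col="B"):
    """Fill NaNs in df[col] for rows start .. start+len(forecasts)-1 (in place).

    forecasts[k] is used for the row k positions after `start`.
    Hypothetical helper, not part of pandas.
    """
    stop = min(start + len(forecasts), len(df))  # do not run past the last row
    col_idx = df.columns.get_loc(col)
    window = df.iloc[start:stop, col_idx]
    # Align the forecasts with the window's index so fillna matches row by row.
    fc = pd.Series(list(forecasts[: stop - start]), index=window.index)
    df.iloc[start:stop, col_idx] = window.fillna(fc).to_numpy()
    return df


# Illustrative data: 15-minute timestamps, NaNs at positions 1 and 5 of B.
idx = pd.date_range("2018-02-01", periods=6, freq="15min")
demo = pd.DataFrame(
    {"A": [5.59, 5.29, 5.32, 5.20, 5.02, 5.02],
     "B": [2.79, np.nan, 2.23, 2.24, 2.06, np.nan]},
    index=idx,
)
forecasts = [0.10, 0.20, 0.30]  # made-up values; the question uses 12

fill_window(demo, 0, forecasts)  # covers positions 0..2, fills position 1
fill_window(demo, 3, forecasts)  # covers positions 3..5, fills position 5
print(demo)
```

In the setting of the question, the same call would presumably sit inside a loop over row positions, with a fresh forecast list passed in at each step; existing (non-NaN) values are never overwritten.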

In your example, the one to fill in the 2nd row should probably be forecasts[1]? Python's indexing starts at 0.
– Tai
Aug 10 at 4:57

@Tai in the list, forecasts[0] will correspond to the value at the current timestamp
– Sreeram TP
Aug 10 at 5:07

I think you should define forecasts. It seems to pop up from nowhere? I think that is the list but...I don't know now.
– Tai
Aug 10 at 5:08

Okay, that list is named forecasts
– Sreeram TP
Aug 10 at 5:10

Do you want to do this many times for different rows of the df, or will it always start at the first row?
– Tai
Aug 10 at 5:14

3 Answers

You can use:

import io

import numpy as np
import pandas as pd

temp = u"""A;B
2018-02-01 00:00:00;5.592860;2.789900
2018-02-01 00:15:00;5.288981;NaN
2018-02-01 00:30:00;5.319665;2.232686
2018-02-01 00:45:00;5.198657;2.236154
2018-02-01 01:00:00;5.018134;2.064312
2018-02-01 01:15:00;5.018134;NaN
"""
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=True)
print(df)
#                             A         B
# 2018-02-01 00:00:00  5.592860  2.789900
# 2018-02-01 00:15:00  5.288981       NaN
# 2018-02-01 00:30:00  5.319665  2.232686
# 2018-02-01 00:45:00  5.198657  2.236154
# 2018-02-01 01:00:00  5.018134  2.064312
# 2018-02-01 01:15:00  5.018134       NaN

L = [0.09545173, 0.09946214, 0.10596157]
r = int(len(df) / len(L))
print(r)
# 2

s = pd.Series(np.tile(np.array(L), r), index=df.index)
print(s)
# 2018-02-01 00:00:00    0.095452
# 2018-02-01 00:15:00    0.099462
# 2018-02-01 00:30:00    0.105962
# 2018-02-01 00:45:00    0.095452
# 2018-02-01 01:00:00    0.099462
# 2018-02-01 01:15:00    0.105962
# dtype: float64

df['B'] = df['B'].fillna(s)
print(df)
#                             A         B
# 2018-02-01 00:00:00  5.592860  2.789900
# 2018-02-01 00:15:00  5.288981  0.099462
# 2018-02-01 00:30:00  5.319665  2.232686
# 2018-02-01 00:45:00  5.198657  2.236154
# 2018-02-01 01:00:00  5.018134  2.064312
# 2018-02-01 01:15:00  5.018134  0.105962
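The np.tile approach assumes that len(df) is an exact multiple of len(L). A minimal sketch for other lengths, assuming the list should simply repeat cyclically down the rows: np.resize repeats the values until the requested length is reached. The 7-row frame df7 and the names s7 and filled are made up for illustration.

```python
import numpy as np
import pandas as pd

L = [0.09545173, 0.09946214, 0.10596157]

# Made-up example: 7 rows, which is not a multiple of len(L) == 3.
idx = pd.date_range("2018-02-01", periods=7, freq="15min")
df7 = pd.DataFrame(
    {"B": [2.7899, np.nan, 2.232686, np.nan, 2.064312, 2.1, np.nan]},
    index=idx,
)

# np.resize cycles through L until it has len(df7) values.
s7 = pd.Series(np.resize(L, len(df7)), index=df7.index)

# fillna with a Series aligns on the index, so row k gets L[k % len(L)].
filled = df7["B"].fillna(s7)
print(filled)
```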

Let me make things clearer. I will have a dataframe with data like I described in the question. There may be missing values in column B but not in A. For each timestamp I will have a list of 12 values. The list, named forecasts, will hold the forecasted value of the present timestamp in forecasts[0]. I want to loop through each timestamp in the dataset and check whether any NaN is present in the next 12 rows of column B.
– Sreeram TP
Aug 10 at 5:28

@SreeramTP - So there are e.g. 24 rows, and you need to replace the first 12 and then the next 12 rows?
– jezrael
Aug 10 at 5:33

If there are 24 rows in the actual dataframe, I will loop through it. First I am at timestamp t0 and have a forecast list. I have to check whether there are NaNs in the next 11 locations, i.e. t1 to t11. Suppose there is a NaN at t5. Then I have to replace it with forecasts[5] in the actual dataframe. Then I will move to timestamp t2 and do the same
– Sreeram TP
Aug 10 at 5:36

@SreeramTP - I edited the answer for a list of length 3
– jezrael
Aug 10 at 5:39

It works as expected.. Cool..!!
– Sreeram TP
Aug 10 at 9:51

Or a one-liner list comprehension (l is the list of replacement values, which must be at least as long as df):

df['B'] = [l[i] if pd.isna(v) else v for i, v in enumerate(df['B'].tolist())]

Handling missing or unknown values is a basic issue when working with data.
You can use the fillna() function to fill missing values with a default value of your choice.

E.g., if df1 is your dataframe containing missing values in multiple columns:

for column in df1:
    print("column ", column)
    df1[column] = df1[column].fillna(0.12345)

For fillna syntax and examples, see https://kite.com/python/docs/pandas.core.frame.DataFrame.fillna

You can also use the pandas isna() function to check where values are missing.
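As a short illustration of isna(), the sketch below flags missing values and checks whether a window of rows contains any. The series values and the window size of 3 are made up; the question would use a window of 12.

```python
import numpy as np
import pandas as pd

b = pd.Series([2.79, np.nan, 2.23, 2.24, 2.06, np.nan], name="B")

missing = b.isna()              # boolean mask, True where the value is NaN
n_missing = int(missing.sum())  # number of missing values

# Is there any NaN in rows 0..2 (a window of 3 starting at position 0)?
has_nan_ahead = bool(b.iloc[0:3].isna().any())

# Integer positions of the missing values.
nan_positions = missing.to_numpy().nonzero()[0].tolist()

print(n_missing, has_nan_ahead, nan_positions)
```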
