Most efficient way to loop through and update rows in a large pandas dataframe
This is my piece of code to update the rows of a dataframe:
from datetime import datetime

import pandas as pd


def arrangeData(df):
    hour_from_timestamp_list = []
    date_from_timestamp_list = []
    for row in df.itertuples():
        # timestamp is in epoch milliseconds, so convert to seconds first
        timestamp = row.timestamp
        hour_from_timestamp = datetime.fromtimestamp(
            int(timestamp) / 1000).strftime('%H:%M:%S')
        date_from_timestamp = datetime.fromtimestamp(
            int(timestamp) / 1000).strftime('%d-%m-%Y')
        hour_from_timestamp_list.append(hour_from_timestamp)
        date_from_timestamp_list.append(date_from_timestamp)
    # assign the collected values as new columns once the loop is done
    df['Time'] = hour_from_timestamp_list
    df['Hour'] = pd.to_datetime(df['Time']).dt.hour
    df['ChatDate'] = date_from_timestamp_list
    return df
I'm trying to extract the time, hour and chat date from the timestamp. The code works fine, but on a large dataset (around 300,000 rows) the function is extremely slow. Can anyone suggest a faster way to do this?
For looping I also tried iterrows(), which was even slower.
This is the document that I'm processing:
{
    "_id" : ObjectId("5b9feadc32214d2b504ea6e1"),
    "id" : 34176,
    "timestamp" : NumberLong(1535019434998),
    "platform" : "Email",
    "sessionId" : LUUID("08a5caac-baa3-11e8-a508-106530216ef0"),
    "intentStatus" : "NotHandled",
    "botId" : "tony"
}
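For comparison, here is a minimal sketch of how the same three columns could be built without a Python-level loop, assuming the timestamp column always holds epoch milliseconds as in the sample above. The function name arrange_data_vectorized and its tz parameter are hypothetical, not part of the original code. One caveat: datetime.fromtimestamp in the loop version uses the machine's local time zone, while this sketch works in UTC unless a zone name is passed, so the results only match if tz is set to the local zone.

```python
import pandas as pd


def arrange_data_vectorized(df, tz=None):
    """Hypothetical vectorized variant of arrangeData (a sketch, not a drop-in)."""
    # Parse every epoch-millisecond value in a single call; the result is tz-aware UTC.
    ts = pd.to_datetime(df['timestamp'].astype('int64'), unit='ms', utc=True)
    if tz is not None:
        # Optionally shift to a named zone, e.g. 'Asia/Kolkata' (assumed example).
        ts = ts.dt.tz_convert(tz)
    # Derive all three columns from the parsed Series via the .dt accessor.
    df['Time'] = ts.dt.strftime('%H:%M:%S')
    df['Hour'] = ts.dt.hour
    df['ChatDate'] = ts.dt.strftime('%d-%m-%Y')
    return df


# Usage with the timestamp from the sample document
sample = pd.DataFrame({'timestamp': [1535019434998]})
result = arrange_data_vectorized(sample)
```

Here the hour is read directly from the parsed timestamps rather than by re-parsing the formatted 'Time' strings, which avoids a second string-to-datetime pass over the column.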
@jezrael edited the question with the data sample
– Tony Mathew
19 secs ago
Can you add some data sample?
– jezrael
2 mins ago