Timeseries with overlapping timeframes, using just the most recent in Excel Power Query
Clash Royale CLAN TAG#URR8PPP
Timeseries with overlapping timeframes, using just the most recent in Excel Power Query
i have the following problem:
In Power Query i can bring together multiple csv files with power query, with the function to load from a folder. Unfortenatuly, the timeseries data we get is overlapping, with changed values for the entire timeseries.
So for example we get a file for the last 30 days every monday. The values in the last 30 days do change, so the values from my report last week might not be up to date anymore. My goal is to overwrite the old data from last week with the new data from this week.
Example:
File A
Date Item Hits Solved
01.01.2018 A 100 50
01.01.2018 B 138 65
02.01.2018 A 124 70
...
07.01.2018 A 143 50
07.01.2018 B 147 57
...
30.01.2018 A 250 124
File B
Date Item Hits Solved
07.07.2018 A 143 80
07.07.2018 B 147 95
...
06.02.2018 A 341 148
The data from file A should be updated with the new information from File B, keeping only the values from (in this case) the 01.01.2018 to the 06.01.2018 from file A. The rest should be from file B.
Until now i tried to give my files numbers (the newer the higher, so file A -> 1, file B -> 2), making a new (key) column with Date+Item, sorting (descendant) by file number and then removing duplicates in the key column. This sometimes works (only keeping the most recent entries), but sometimes it does not. Removing duplicates from the Date Column does not work for me. Here it just remove the newest entries, keeping the old.
Is there a better way to solve this problem?
Thanks alot!
2 Answers
2
Go the step where you are sorting the data based on your file order and put it inside Table.Buffer(), so it would look like
= Table.Buffer(Table.Sort(BLAH BLAH TABLE SORTING CODE))
I don't remember the technical reasons, but for purposes here this resets the "internal" sort order PQ uses when running remove duplicates to follow the visual sort order of the table.
Also it's not necessary to create a key column, you can just select your Item + Date Columns and remove duplicates and it will accomplish the same thing.
I believe when you use remove duplicates the function removes them based on the original order the data was loaded in. Table.Buffer() reloads data from whatever you referenced, so the "load" order remove duplicates uses is reset as well. I'm assuming there is an internal key for data you don't see that PQ is using on the backend.
– Wedge
Aug 13 at 16:14
I would suggest starting with FileA
and doing a left anti merge with FileB
on the Date
and Item
columns. This will remove any rows from FileA
that have matching Date
and Item
rows in FileB
.
FileA
FileB
Date
Item
FileA
Date
Item
FileB
Then delete the new column that this merge created and append File B
to the end and you should have the result you want.
File B
Thank you for your comment. This seems to work, but from my limited knowledge you can not merge files from a folder, can you? I thought you can just append files!?
– Arthur Pennt
Aug 13 at 9:08
There's no group merge, as far as I know, but you should still be able to use this approach by doing two folder loads. One where you filter out the most recent file (Query A) and one where you just pick the most recent file (Query B) and then treat Query A and Query B like
FileA
and FileB
in my answer.– Alexis Olson
Aug 13 at 13:49
FileA
FileB
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
This works great. Thank you! Do i understand it correctly that PQ internally does not change/recalculate the order if you do not use Table.Buffer()?
– Arthur Pennt
Aug 13 at 9:06