Iteration for time series data, using purrr
Clash Royale CLAN TAG#URR8PPP
Iteration for time series data, using purrr
I have a bunch of time series data stacked on top of one another in a data frame; one series for each region in a country. I'd like to apply the seas()
function (from the seasonal
package) to each series, iteratively, to make the series seasonally adjusted. To do this, I first have to convert the series to a ts
class. I'm struggling to do all this using purrr
.
seas()
seasonal
ts
purrr
Here's a minimum worked example:
library(seasonal)
library(tidyverse)
set.seed(1234)
df <- data.frame(region = rep(1:10, each = 20),
quarter = rep(1:20, 10),
var = sample(5:200, 200, replace = T))
For each region (indexed by a number) I'd like to perform the following operations. Here's the first region as an example:
tem1 <- df %>% filter(region==1)
tem2 <- ts(data = tem1$var, frequency = 4, start=c(1990,1))
tem3 <- seas(tem2)
tem4 <- as.data.frame(tem3$data)
I'd then like to stack the output (ie. the multiple tem4 data frames, one for each region), along with the region and quarter identifiers.
So, the start of the output for region 1 would be this:
final seasonaladj trend irregular region quarter
1 27 27 96.95 -67.97279 1 1
2 126 126 96.95 27.87381 1 2
3 124 124 96.95 27.10823 1 3
4 127 127 96.95 30.55075 1 4
5 173 173 96.95 75.01355 1 5
6 130 130 96.95 32.10672 1 6
The data for region 2 would be below this etc.
I started with the following but without luck so far. Basically, I'm struggling to get the time series into the tibble:
seas.adjusted <- df %>%
group_by(region) %>%
mutate(data.ts = map(.x = data$var,
.f = as.ts,
start = 1990,
freq = 4))
edits in original.
– lethalSinger
Aug 12 at 21:07
2 Answers
2
I don't know much about the seasonal adjustment part, so there may be things I missed, but I can help with moving your calculations into a map
-friendly function.
map
After grouping by region, you can nest the data so there's a nested data frame for each region. Then you can run essentially the same code as you had, but inside a function in map
. Unnesting the resulting column gives you a long-shaped data frame of adjustments.
map
Like I said, I don't have the expertise to know whether those last two columns having NA
s is expected or not.
NA
Edit: Based on @wibeasley's question about retaining the quarter
column, I'm adding a mutate
that adds a column of the quarters listed in the nested data frame.
quarter
mutate
library(seasonal)
library(tidyverse)
set.seed(1234)
df <- data.frame(region = rep(1:10, each = 20),
quarter = rep(1:20, 10),
var = sample(5:200, 200, replace = T))
df %>%
group_by(region) %>%
nest() %>%
mutate(data.ts = map(data, function(x)
tem2 <- ts(x$var, frequency = 4, start = c(1990, 1))
tem3 <- seas(tem2)
as.data.frame(tem3$data) %>%
mutate(quarter = x$quarter)
)) %>%
unnest(data.ts)
#> # A tibble: 200 x 8
#> region final seasonaladj trend irregular quarter seasonal adjustfac
#> <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 1 27 27 97.0 -68.0 1 NA NA
#> 2 1 126 126 97.0 27.9 2 NA NA
#> 3 1 124 124 97.0 27.1 3 NA NA
#> 4 1 127 127 97.0 30.6 4 NA NA
#> 5 1 173 173 97.0 75.0 5 NA NA
#> 6 1 130 130 97.0 32.1 6 NA NA
#> 7 1 6 6 97.0 -89.0 7 NA NA
#> 8 1 50 50 97.0 -46.5 8 NA NA
#> 9 1 135 135 97.0 36.7 9 NA NA
#> 10 1 105 105 97.0 8.81 10 NA NA
#> # ... with 190 more rows
I also gave a bit more thought to doing this without nesting, and instead tried doing it with a split
. Passing that list of data frames into imap_dfr
let me take each split piece of the data frame and its name (in this case, the value of region
), then return everything rbind
ed back together into one data frame. I sometimes shy away from nested data just because I have trouble seeing what's going on, so this is an alternative that is maybe more transparent.
split
imap_dfr
region
rbind
df %>%
split(.$region) %>%
imap_dfr(function(x, reg)
tem2 <- ts(x$var, frequency = 4, start = c(1990, 1))
tem3 <- seas(tem2)
as.data.frame(tem3$data) %>%
mutate(region = reg, quarter = x$quarter)
) %>%
select(region, quarter, everything()) %>%
head()
#> region quarter final seasonaladj trend irregular seasonal adjustfac
#> 1 1 1 27 27 96.95 -67.97274 NA NA
#> 2 1 2 126 126 96.95 27.87378 NA NA
#> 3 1 3 124 124 96.95 27.10823 NA NA
#> 4 1 4 127 127 96.95 30.55077 NA NA
#> 5 1 5 173 173 96.95 75.01353 NA NA
#> 6 1 6 130 130 96.95 32.10669 NA NA
Created on 2018-08-12 by the reprex package (v0.2.0).
@camile, I love your use of
nest()
. Do you have ideas how to include a quarter
column? Experimenting on my own, I used nest()
with a function similar to my f()
below. But it's still using my hack of reintroducing quarter
.– wibeasley
Aug 12 at 21:55
nest()
quarter
nest()
f()
quarter
@wibeasley Good question, I'm revisiting that now
– camille
Aug 12 at 22:39
I put all the action inside of f()
, and then called it with purrr::map_df()
. The re-inclusion of quarter
is a hack.
f()
purrr::map_df()
quarter
f <- function( .region )
d <- df %>%
dplyr::filter(region == .region)
y <- d %>%
dplyr::pull(var) %>%
ts(frequency = 4, start=c(1990,1)) %>%
seas()
y$data %>%
as.data.frame() %>%
# dplyr::select(-seasonal, -adjustfac) %>%
dplyr::mutate(
quarter = d$quarter
)
purrr::map_df(1:10, f, .id = "region")
results:
region final seasonaladj trend irregular quarter seasonal adjustfac
1 1 27.00000 27.00000 96.95000 -6.797279e+01 1 NA NA
2 1 126.00000 126.00000 96.95000 2.787381e+01 2 NA NA
3 1 124.00000 124.00000 96.95000 2.710823e+01 3 NA NA
4 1 127.00000 127.00000 96.95000 3.055075e+01 4 NA NA
5 1 173.00000 173.00000 96.95000 7.501355e+01 5 NA NA
6 1 130.00000 130.00000 96.95000 3.210672e+01 6 NA NA
7 1 6.00000 6.00000 96.95000 -8.899356e+01 7 NA NA
8 1 50.00000 50.00000 96.95000 -4.647254e+01 8 NA NA
9 1 135.00000 135.00000 96.95000 3.671077e+01 9 NA NA
10 1 105.00000 105.00000 96.95000 8.806955e+00 10 NA NA
...
96 5 55.01724 55.01724 60.25848 9.130207e-01 16 1.9084928 1.9084928
97 5 60.21549 60.21549 59.43828 1.013076e+00 17 1.0462424 1.0462424
98 5 58.30626 58.30626 58.87065 9.904130e-01 18 0.1715082 0.1715082
99 5 61.68175 61.68175 58.07827 1.062045e+00 19 1.0537962 1.0537962
100 5 59.30138 59.30138 56.70798 1.045733e+00 20 2.5294523 2.5294523
...
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
thanks for including your code so far. Could you also include the desired dataframe/output? Truncate it if necessary; just enough to demonstrate your intentions.
– wibeasley
Aug 12 at 20:57