Iteration for time series data, using purrr

I have a bunch of time series data stacked on top of one another in a data frame; one series for each region in a country. I'd like to apply the seas() function (from the seasonal package) to each series, iteratively, to make the series seasonally adjusted. To do this, I first have to convert the series to a ts class. I'm struggling to do all this using purrr.

Here's a minimal working example:

    library(seasonal)
    library(tidyverse)

    set.seed(1234)
    df <- data.frame(region = rep(1:10, each = 20),
                     quarter = rep(1:20, 10),
                     var = sample(5:200, 200, replace = TRUE))

For each region (indexed by a number) I'd like to perform the following operations. Here's the first region as an example:

    tem1 <- df %>% filter(region == 1)
    tem2 <- ts(data = tem1$var, frequency = 4, start = c(1990, 1))
    tem3 <- seas(tem2)
    tem4 <- as.data.frame(tem3$data)

I'd then like to stack the output (i.e. the multiple tem4 data frames, one for each region), along with the region and quarter identifiers.

So, the start of the output for region 1 would be this:

      final seasonaladj trend irregular region quarter
    1    27          27 96.95 -67.97279      1       1
    2   126         126 96.95  27.87381      1       2
    3   124         124 96.95  27.10823      1       3
    4   127         127 96.95  30.55075      1       4
    5   173         173 96.95  75.01355      1       5
    6   130         130 96.95  32.10672      1       6

The data for region 2 would be below this, and so on.

I started with the following but without luck so far. Basically, I'm struggling to get the time series into the tibble:

    seas.adjusted <- df %>%
      group_by(region) %>%
      mutate(data.ts = map(.x = data$var, .f = as.ts, start = 1990, freq = 4))

Thanks for including your code so far. Could you also include the desired dataframe/output? Truncate it if necessary; just enough to demonstrate your intentions.
– wibeasley
Aug 12 at 20:57

edits in original.
– lethalSinger
Aug 12 at 21:07

2 Answers

I don't know much about the seasonal adjustment part, so there may be things I missed, but I can help with moving your calculations into a map-friendly function.

After grouping by region, you can nest the data so there's a nested data frame for each region. Then you can run essentially the same code as you had, but inside a function in map. Unnesting the resulting column gives you a long-shaped data frame of adjustments.
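
As a minimal sketch of just the nesting step, leaving seas() out so it runs without the seasonal package: after nest() there is one row per region, and map() can store a ts object for each region in a list-column. The object name nested is only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1234)
df <- data.frame(region  = rep(1:10, each = 20),
                 quarter = rep(1:20, 10),
                 var     = sample(5:200, 200, replace = TRUE))

nested <- df %>%
  group_by(region) %>%
  nest() %>%
  # one quarterly ts object per region, kept in a list-column
  mutate(data.ts = map(data, ~ ts(.x$var, frequency = 4, start = c(1990, 1))))

nrow(nested)                    # one row per region
class(nested$data.ts[[1]])      # "ts"
frequency(nested$data.ts[[1]])  # 4
```

Each element of data.ts can then be passed to whatever function does the adjustment, and the result unnested.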

Like I said, I don't have the expertise to know whether those last two columns having NAs is expected or not.
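
One thing that may account for at least part of it, independent of the seasonal adjustment itself: when data frames with different columns are row-bound, dplyr fills the missing columns with NA. The region 1 output in the question has only four columns, so if seas() returns seasonal and adjustfac for some regions but not others, the stacked result would show NA for the regions that lack them. I have not confirmed this against the seasonal documentation, and the two toy data frames below are made up purely to show the binding behaviour.

```r
library(dplyr)

# Hypothetical stand-ins for the per-region output; the values are invented.
without_seasonal <- data.frame(final = c(27, 126))
with_seasonal    <- data.frame(final = c(55.0, 60.2), seasonal = c(1.9, 1.0))

# bind_rows() matches columns by name and pads missing ones with NA
combined <- bind_rows(without_seasonal, with_seasonal)

is.na(combined$seasonal)
#> [1]  TRUE  TRUE FALSE FALSE
```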

Edit: Based on @wibeasley's question about retaining the quarter column, I'm adding a mutate that adds a column of the quarters listed in the nested data frame.

    library(seasonal)
    library(tidyverse)

    set.seed(1234)
    df <- data.frame(region = rep(1:10, each = 20),
                     quarter = rep(1:20, 10),
                     var = sample(5:200, 200, replace = T))

    df %>%
      group_by(region) %>%
      nest() %>%
      mutate(data.ts = map(data, function(x) {
        # convert this region's values to a quarterly series, then adjust it
        tem2 <- ts(x$var, frequency = 4, start = c(1990, 1))
        tem3 <- seas(tem2)
        as.data.frame(tem3$data) %>%
          mutate(quarter = x$quarter)
      })) %>%
      unnest(data.ts)
    #> # A tibble: 200 x 8
    #>    region final seasonaladj trend irregular quarter seasonal adjustfac
    #>     <int> <dbl>       <dbl> <dbl>     <dbl>   <int>    <dbl>     <dbl>
    #>  1      1    27          27  97.0    -68.0        1       NA        NA
    #>  2      1   126         126  97.0     27.9        2       NA        NA
    #>  3      1   124         124  97.0     27.1        3       NA        NA
    #>  4      1   127         127  97.0     30.6        4       NA        NA
    #>  5      1   173         173  97.0     75.0        5       NA        NA
    #>  6      1   130         130  97.0     32.1        6       NA        NA
    #>  7      1     6           6  97.0    -89.0        7       NA        NA
    #>  8      1    50          50  97.0    -46.5        8       NA        NA
    #>  9      1   135         135  97.0     36.7        9       NA        NA
    #> 10      1   105         105  97.0      8.81      10       NA        NA
    #> # ... with 190 more rows

I also gave a bit more thought to doing this without nesting, and instead tried doing it with a split. Passing that list of data frames into imap_dfr let me take each split piece of the data frame and its name (in this case, the value of region), then return everything rbinded back together into one data frame. I sometimes shy away from nested data just because I have trouble seeing what's going on, so this is an alternative that is maybe more transparent.

    df %>%
      split(.$region) %>%
      imap_dfr(function(x, reg) {
        # x is one region's data frame, reg is its name in the split list
        tem2 <- ts(x$var, frequency = 4, start = c(1990, 1))
        tem3 <- seas(tem2)
        as.data.frame(tem3$data) %>%
          mutate(region = reg, quarter = x$quarter)
      }) %>%
      select(region, quarter, everything()) %>%
      head()
    #>   region quarter final seasonaladj trend irregular seasonal adjustfac
    #> 1      1       1    27          27 96.95 -67.97274       NA        NA
    #> 2      1       2   126         126 96.95  27.87378       NA        NA
    #> 3      1       3   124         124 96.95  27.10823       NA        NA
    #> 4      1       4   127         127 96.95  30.55077       NA        NA
    #> 5      1       5   173         173 96.95  75.01353       NA        NA
    #> 6      1       6   130         130 96.95  32.10669       NA        NA

Created on 2018-08-12 by the reprex package (v0.2.0).

@camile, I love your use of nest(). Do you have ideas how to include a quarter column? Experimenting on my own, I used nest() with a function similar to my f() below. But it's still using my hack of reintroducing quarter.
– wibeasley
Aug 12 at 21:55

@wibeasley Good question, I'm revisiting that now
– camille
Aug 12 at 22:39

I put all the action inside of f(), and then called it with purrr::map_df(). The re-inclusion of quarter is a hack.
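
One caveat about the .id argument, sketched here with a stand-in function rather than seas(): .id records the names of the input if it has any, and otherwise its position. The labels match the real regions in this question only because the regions happen to be 1:10. Naming the input first makes the label explicit. Both toy_f and the region codes below are hypothetical.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for f(): returns a one-row data frame per region.
toy_f <- function(.region) data.frame(value = nchar(.region))

regions <- c("north", "south", "east")

# Unnamed input: .id holds the position, not the region.
by_position <- map_df(regions, toy_f, .id = "region")

# Named input: .id holds the names, so the label is the region itself.
by_name <- map_df(set_names(regions), toy_f, .id = "region")

by_position$region
#> [1] "1" "2" "3"
by_name$region
#> [1] "north" "south" "east"
```

Either way the region column comes back as character, so it may need converting if the original identifiers were numeric.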

    f <- function( .region ) {
      # subset to one region
      d <- df %>%
        dplyr::filter(region == .region)

      # build the quarterly series and seasonally adjust it
      y <- d %>%
        dplyr::pull(var) %>%
        ts(frequency = 4, start = c(1990, 1)) %>%
        seas()

      y$data %>%
        as.data.frame() %>%
        # dplyr::select(-seasonal, -adjustfac) %>%
        dplyr::mutate(
          quarter = d$quarter
        )
    }

    purrr::map_df(1:10, f, .id = "region")

Results:

        region     final seasonaladj    trend     irregular quarter  seasonal adjustfac
    1        1  27.00000    27.00000 96.95000 -6.797279e+01       1        NA        NA
    2        1 126.00000   126.00000 96.95000  2.787381e+01       2        NA        NA
    3        1 124.00000   124.00000 96.95000  2.710823e+01       3        NA        NA
    4        1 127.00000   127.00000 96.95000  3.055075e+01       4        NA        NA
    5        1 173.00000   173.00000 96.95000  7.501355e+01       5        NA        NA
    6        1 130.00000   130.00000 96.95000  3.210672e+01       6        NA        NA
    7        1   6.00000     6.00000 96.95000 -8.899356e+01       7        NA        NA
    8        1  50.00000    50.00000 96.95000 -4.647254e+01       8        NA        NA
    9        1 135.00000   135.00000 96.95000  3.671077e+01       9        NA        NA
    10       1 105.00000   105.00000 96.95000  8.806955e+00      10        NA        NA
    ...
    96       5  55.01724    55.01724 60.25848  9.130207e-01      16 1.9084928 1.9084928
    97       5  60.21549    60.21549 59.43828  1.013076e+00      17 1.0462424 1.0462424
    98       5  58.30626    58.30626 58.87065  9.904130e-01      18 0.1715082 0.1715082
    99       5  61.68175    61.68175 58.07827  1.062045e+00      19 1.0537962 1.0537962
    100      5  59.30138    59.30138 56.70798  1.045733e+00      20 2.5294523 2.5294523
    ...
