library(tidyverse)
How to pivot datasets
A guide to using pivot_longer() and pivot_wider() in {tidyr}
Why, oh why?
I’ve spent quite a few years working with data in R but was reminded of the brain fog that overcomes me each time I want to pivot any data when I saw Jörn’s post on Bluesky:
Will ever I be able to convert data between wide and long format in #rstats without googling?
— Jörn Alexander Quent (@jaquent.bsky.social) Jun 6, 2024 at 7:40
I regularly use Excel and Power BI at work and pivoting can be pretty simple in Power Query (here’s the answer in case you’re wondering) but this site is a Microsoft-free zone so I’ll show you how to do in RStudio.
Enter the Tidyverse
For this guide we’re using the {tidyr} package from {tidyverse} so let’s get them loaded up and set up a dataset for use in this guide:
<- seq(1:10)
index <- runif(10,1,50)
p1 <- runif(10,5,70)
p2 <- runif(10,7,90)
p3 <- tibble(index,p1,p2,p3)
df print(df)
# A tibble: 10 × 4
index p1 p2 p3
<int> <dbl> <dbl> <dbl>
1 1 44.4 66.3 60.5
2 2 25.0 29.9 73.2
3 3 4.50 69.8 34.7
4 4 12.3 46.0 13.0
5 5 32.1 20.2 78.4
6 6 41.9 62.4 22.2
7 7 13.1 41.9 52.8
8 8 18.3 33.7 33.7
9 9 16.5 52.8 37.5
10 10 26.5 63.3 28.3
Now, let’s say we want to everything apart from the index
column to be reshaped into a longer tibble where the p
labels are in a column called class
and the actual observations for each row are in a column called values
. Using the pivot_longer()
function we could do something like this were we select everything apart from index
and column names are assigned using the names_to
argument and values using values_to
:
|>
df pivot_longer(!index,
names_to = "class",
values_to = "value")
# A tibble: 30 × 3
index class value
<int> <chr> <dbl>
1 1 p1 44.4
2 1 p2 66.3
3 1 p3 60.5
4 2 p1 25.0
5 2 p2 29.9
6 2 p3 73.2
7 3 p1 4.50
8 3 p2 69.8
9 3 p3 34.7
10 4 p1 12.3
# ℹ 20 more rows
Other ways of achieving this same result are by selecting specific columns from their column index; {tidyselect}’s selection helpers; or a regexp using the names_pattern
argument:
# Use column index to select columns
|>
df pivot_longer(cols = c(2:4),
names_to = "class",
values_to = "values")
# Use starts_with to select columns
|>
df pivot_longer(cols = starts_with("p"),
names_to = "class",
values_to = "values")
# Use a regexp to select columns
|>
df pivot_longer(cols = c(2:4),
names_to = "class",
names_pattern = "^p(.*)",
values_to = "values")
Now let’s say that we have the same data in a long format, using df2,
but want to use it in a wide format.
<- df |>
df2 pivot_longer(cols = starts_with("p"),
names_to = "class",
values_to = "values")
head(df2,5)
# A tibble: 5 × 3
index class values
<int> <chr> <dbl>
1 1 p1 44.4
2 1 p2 66.3
3 1 p3 60.5
4 2 p1 25.0
5 2 p2 29.9
In this case, we use pivot_wider()
and the names_from
argument to specify which column will be used for the column names, and values_from
for the values.
|>
df2 pivot_wider(names_from = class,
values_from = values)
# A tibble: 10 × 4
index p1 p2 p3
<int> <dbl> <dbl> <dbl>
1 1 44.4 66.3 60.5
2 2 25.0 29.9 73.2
3 3 4.50 69.8 34.7
4 4 12.3 46.0 13.0
5 5 32.1 20.2 78.4
6 6 41.9 62.4 22.2
7 7 13.1 41.9 52.8
8 8 18.3 33.7 33.7
9 9 16.5 52.8 37.5
10 10 26.5 63.3 28.3
Next steps
The code above returns the simple tibble we started with but reshaping more complicated datasets might require the use of names_repair
, names_sep
, or names_expand
arguments in pivot_wider
. To learn about these and other arguments for both of the pivoting functions mentioned above then take a look at the detailed guide here. Happy pivoting!