Penguins and nakedpipes

R

Exploring the new {palmerpenguins} dataset with {nakedpipe} - An alternative to using {magrittr}’s %>%.

Harry Fisher https://hfshr.netlify.app/
06-21-2020

Recently I came across nakedpipe (Fabri 2020) on GitHub, an R package with an intriguing name to say the least… After checking out the repo, I was very impressed and thought I’d share my notes here!

So what does nakedpipe do?

I’m sure most R users are familiar with magrittr’s pipe (%>%) operator. The %>% allows you to chain together multiple commands by piping an expression forward into a function, writing x %>% f rather than f(x) (Bache and Wickham 2014). The resulting code is usually very readable and easy to debug. nakedpipe adopts a similar style, but removes the need to use %>% after every function, adds some logging/debugging features, and is slightly faster than the magrittr implementation.
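
As a quick refresher, here is a minimal sketch of that equivalence. The vector x and the choice of sqrt() and sum() are just for illustration and are not part of the penguins examples below.

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested call: has to be read from the inside out
nested <- sum(sqrt(x))

# Piped call: reads left to right, one step at a time
piped <- x %>% sqrt() %>% sum()

# Both forms evaluate the same expression
identical(nested, piped)
```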

In order to try out the features in nakedpipe I used the new palmerpenguins dataset (Gorman, Williams, and Fraser 2014), which provides a great test dataset and is a refreshing alternative to the usual suspects, mtcars et al…

Getting started

#remotes::install_github("allisonhorst/palmerpenguins")
#remotes::install_github("moodymudskipper/nakedpipe")

library(tidyverse)
library(palmerpenguins) 
library(nakedpipe)
library(magrittr)

penguins <- penguins

A basic example looks like this:

penguins %.% {
  filter(sex == "female") 
  select(1:5) 
  head(5)
}   
# A tibble: 5 x 5
  species island    bill_length_mm bill_depth_mm flipper_length_mm
  <fct>   <fct>              <dbl>         <dbl>             <int>
1 Adelie  Torgersen           39.5          17.4               186
2 Adelie  Torgersen           40.3          18                 195
3 Adelie  Torgersen           36.7          19.3               193
4 Adelie  Torgersen           38.9          17.8               181
5 Adelie  Torgersen           41.1          17.6               182

You use the %.% operator to “pipe” into a sequence of functions, which are written inside {}. You can reduce the code further by dropping the explicit subset / filter call and writing only the condition, like:

penguins %.% {
  sex == "female"
  select(1:5) 
  head(5)
}  
# A tibble: 5 x 5
  species island    bill_length_mm bill_depth_mm flipper_length_mm
  <fct>   <fct>              <dbl>         <dbl>             <int>
1 Adelie  Torgersen           39.5          17.4               186
2 Adelie  Torgersen           40.3          18                 195
3 Adelie  Torgersen           36.7          19.3               193
4 Adelie  Torgersen           38.9          17.8               181
5 Adelie  Torgersen           41.1          17.6               182

Similarly, this works with transform / mutate calls. The use of the = sign implies a new column is being created, for example:

penguins %.% {
  sex == "female"
  body_mass_kg = body_mass_g/1000
  select(body_mass_g, body_mass_kg)
  head(5)
}  
  body_mass_g body_mass_kg
1        3800        3.800
2        3250        3.250
3        3450        3.450
4        3625        3.625
5        3200        3.200
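
To make the two shorthands concrete, here is a small sketch on a made-up data frame (toy is hypothetical, standing in for penguins). The explicit version is written with base subset() and transform() as an assumption about what the shorthands correspond to; dplyr's filter() and mutate() would give the same values here.

```r
library(nakedpipe)

# Hypothetical toy data, standing in for the penguins tibble
toy <- data.frame(
  sex = c("female", "male", "female"),
  body_mass_g = c(3800, 3750, 3250)
)

# Shorthand: a bare condition filters rows, `=` adds a column
shorthand <- toy %.% {
  sex == "female"
  body_mass_kg = body_mass_g / 1000
}

# The same two steps spelled out with explicit base R calls
explicit <- transform(
  subset(toy, sex == "female"),
  body_mass_kg = body_mass_g / 1000
)

# Both should hold the same new column
shorthand$body_mass_kg
explicit$body_mass_kg
```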

Assigning the result of a sequence of functions can be done by using -> at the end of the sequence (outside of the {}) or by using the %<.% operator at the start of the sequence, which overwrites the object on the left-hand side.

penguins %.% {
  sex == "female"
  arrange(body_mass_g)
  select(1,2,6) 
  head(5)
} -> small_penguins

# Or

small_penguins <- penguins

small_penguins %<.% {
  sex == "female"
  arrange(body_mass_g)
  select(1,2,6) 
  head(5)
} 
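
A tiny sketch to check that the two forms end up with the same object. The data frame a and the names b1 and b2 are made up for this illustration.

```r
library(nakedpipe)

# Hypothetical toy data
a <- data.frame(x = c(3, 1, 2))

# Form 1: assign with -> after the closing brace
a %.% {
  head(2)
} -> b1

# Form 2: copy first, then overwrite the copy in place with %<.%
b2 <- a
b2 %<.% {
  head(2)
}

identical(b1, b2)
```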

Additionally, you can create outputs midway through the sequence of functions by using ~~:

penguins %.% {
  sex == "female"
  ~~ . -> female_penguins
  select(1:5) 
  head(5) 
} -> x

head(female_penguins)
# A tibble: 6 x 8
  species island bill_length_mm bill_depth_mm flipper_length_…
  <fct>   <fct>           <dbl>         <dbl>            <int>
1 Adelie  Torge…           39.5          17.4              186
2 Adelie  Torge…           40.3          18                195
3 Adelie  Torge…           36.7          19.3              193
4 Adelie  Torge…           38.9          17.8              181
5 Adelie  Torge…           41.1          17.6              182
6 Adelie  Torge…           36.6          17.8              185
# … with 3 more variables: body_mass_g <int>, sex <fct>, year <int>
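
The same pattern on a made-up data frame makes it easier to see what each object holds (toy, only_a and first_two are hypothetical names used only here): the ~~ line saves the intermediate result, and the remaining steps carry on as usual.

```r
library(nakedpipe)

# Hypothetical toy data: six rows in two groups
toy <- data.frame(id = 1:6, grp = rep(c("a", "b"), 3))

toy %.% {
  grp == "a"
  ~~ . -> only_a   # side effect: keep a copy of the filtered rows
  head(2)
} -> first_two

nrow(only_a)     # rows kept after the filter step
nrow(first_two)  # rows left at the end of the sequence
```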

Additional operators

You can use %P.% to print the output of each step:

penguins %P.% {
  sex == "female"
  select(1:5) 
  head(5)
} -> y
penguins %P.% {
  sex == "female"
# A tibble: 165 x 8
   species island bill_length_mm bill_depth_mm flipper_length_…
   <fct>   <fct>           <dbl>         <dbl>            <int>
 1 Adelie  Torge…           39.5          17.4              186
 2 Adelie  Torge…           40.3          18                195
 3 Adelie  Torge…           36.7          19.3              193
 4 Adelie  Torge…           38.9          17.8              181
 5 Adelie  Torge…           41.1          17.6              182
 6 Adelie  Torge…           36.6          17.8              185
 7 Adelie  Torge…           38.7          19                195
 8 Adelie  Torge…           34.4          18.4              184
 9 Adelie  Biscoe           37.8          18.3              174
10 Adelie  Biscoe           35.9          19.2              189
# … with 155 more rows, and 3 more variables: body_mass_g <int>,
#   sex <fct>, year <int>
  select(1:5)
# A tibble: 165 x 5
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.5          17.4               186
 2 Adelie  Torgersen           40.3          18                 195
 3 Adelie  Torgersen           36.7          19.3               193
 4 Adelie  Torgersen           38.9          17.8               181
 5 Adelie  Torgersen           41.1          17.6               182
 6 Adelie  Torgersen           36.6          17.8               185
 7 Adelie  Torgersen           38.7          19                 195
 8 Adelie  Torgersen           34.4          18.4               184
 9 Adelie  Biscoe              37.8          18.3               174
10 Adelie  Biscoe              35.9          19.2               189
# … with 155 more rows
  head(5)
# A tibble: 5 x 5
  species island    bill_length_mm bill_depth_mm flipper_length_mm
  <fct>   <fct>              <dbl>         <dbl>             <int>
1 Adelie  Torgersen           39.5          17.4               186
2 Adelie  Torgersen           40.3          18                 195
3 Adelie  Torgersen           36.7          19.3               193
4 Adelie  Torgersen           38.9          17.8               181
5 Adelie  Torgersen           41.1          17.6               182
}

%L.% prints the timing of each step in the sequence (not particularly useful for this toy example, but handy for longer-running code):

penguins %L.% {
  sex == "female"
  select(1:5) 
  head(5)
} -> z
penguins %L.% {
  sex == "female"
   user  system elapsed 
  0.001   0.000   0.002 
  select(1:5)
   user  system elapsed 
  0.003   0.000   0.003 
  head(5)
   user  system elapsed 
  0.001   0.000   0.000 
}
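
For comparison, this is roughly what timing each step by hand looks like in base R. It is only a sketch of the idea, not how nakedpipe implements %L.%; time_steps() and the toy data frame are hypothetical.

```r
# Hypothetical helper: apply each step to the previous result and time it
time_steps <- function(data, steps) {
  timings <- vector("list", length(steps))
  names(timings) <- names(steps)
  for (i in seq_along(steps)) {
    # system.time() evaluates the assignment in this function's frame,
    # so `data` is updated after every step
    timings[[i]] <- system.time(data <- steps[[i]](data))
  }
  list(result = data, timings = timings)
}

# Hypothetical toy data: 100 rows in two groups
toy <- data.frame(id = 1:100, grp = rep(c("a", "b"), 50), value = (1:100)^2)

out <- time_steps(
  toy,
  list(
    filter_rows = function(d) d[d$grp == "a", ],
    pick_cols   = function(d) d[, c("id", "value")],
    first_rows  = function(d) head(d, 5)
  )
)

out$timings$pick_cols  # user / system / elapsed for one step
out$result
```

Writing this out for every pipeline gets tedious quickly, which is the appeal of swapping %.% for %L.%.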

%F.% lets you assign a function using the nakedpipe syntax:

penguin_func <- . %F.% {
  group_by(species)
  summarise(across(where(is.numeric), ~mean(., na.rm = TRUE)))
  mutate(across(where(is.numeric), round, 2))
}

penguin_func(penguins)
# A tibble: 3 x 6
  species bill_length_mm bill_depth_mm flipper_length_… body_mass_g
  <fct>            <dbl>         <dbl>            <dbl>       <dbl>
1 Adelie            38.8          18.4             190.       3701.
2 Chinst…           48.8          18.4             196.       3733.
3 Gentoo            47.5          15.0             217.       5076.
# … with 1 more variable: year <dbl>
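
If you have used magrittr's functional sequences (a pipeline that starts with the . placeholder), this will feel familiar. A minimal side-by-side sketch, where top_two_magrittr and top_two_naked are made-up names and the steps are plain base R functions:

```r
library(magrittr)
library(nakedpipe)

# magrittr: a pipeline starting with `.` defines a function
top_two_magrittr <- . %>% sort(decreasing = TRUE) %>% head(2)

# nakedpipe: the same function written with %F.%
top_two_naked <- . %F.% {
  sort(decreasing = TRUE)
  head(2)
}

x <- c(3, 1, 2)
top_two_magrittr(x)
top_two_naked(x)
```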

ggplot

nakedpipe also plays well with ggplot:

penguins %.% {
  body_mass_kg = body_mass_g / 1000
  ggplot(aes(body_mass_kg, flipper_length_mm, shape = species, colour = species))
  + geom_point() 
  + labs(y = "Flipper length", x = "Body mass (kg)")
  + theme_minimal()
}

There are some other useful things that nakedpipe can do that I haven’t included here, such as the %D.% operator for debugging, %..% for even faster performance, and even an RStudio addin to convert your magrittr-style code to the nakedpipe syntax. Pretty cool!

If you’ve found any of these examples interesting, I’d highly recommend you check out the package (https://github.com/moodymudskipper/nakedpipe) and give it a try!

References

Bache, Stefan Milton, and Hadley Wickham. 2014. Magrittr: A Forward-Pipe Operator for R. https://CRAN.R-project.org/package=magrittr.

Fabri, Antoine. 2020. Nakedpipe: Pipe into a Sequence of Calls Without Repeating the Pipe Symbol. https://github.com/moodymudskipper/nakedpipe.

Gorman, K. B., T. D. Williams, and W. R. Fraser. 2014. “Ecological Sexual Dimorphism and Environmental Variability Within a Community of Antarctic Penguins (Genus Pygoscelis).” PLoS ONE 9 (3): e90081. https://doi.org/10.1371/journal.pone.0090081.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hfshr/distill_blog, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Fisher (2020, June 21). Harry Fisher: Penguins and nakedpipes. Retrieved from https://hfshr.netlify.app/posts/2020-06-21-penguins-and-nakedpipes/

BibTeX citation

@misc{fisher2020penguins,
  author = {Fisher, Harry},
  title = {Harry Fisher: Penguins and nakedpipes},
  url = {https://hfshr.netlify.app/posts/2020-06-21-penguins-and-nakedpipes/},
  year = {2020}
}