Potentially Useful

Writing boring posts is okay

Checking Equality in an R Pipeline

in: Programming Data Science tagged: R

I was wondering how to incorporate a test for equality into a pipeline using R’s new(ish) native forward pipe operator[1], so I asked the Fediverse and got great advice[2].

Why would I do this?

I approach scripts differently than notebooks (whether Jupyter or RMarkdown) since scripts are usually meant to be run non-interactively. I try to keep the list of packages (or modules) fairly minimal; when balancing portability against code readability, I lean further toward portability than I would for an interactive data analysis.

Nevertheless, a good script should include data quality checks and raise errors when problems are found. An approach I’d previously taken with the pipe operator introduced by R’s magrittr package[3] might look like this:

my_data_frame %>%
  # ( a long sequence of data munging operations would go here )
  filter(some_condition_that_should_never_be_true) %>%
  nrow() %>%
  `==`(0) %>%
  stopifnot()

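This code would cause the R script to exit and alert the calling process that something went wrong, but it doesn’t work as written with the native pipe: simply swapping %>% for |> produces an error, because the native pipe does not accept a bare operator call such as `==`(0) on its right-hand side.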
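One way to see the failure without running a whole script is to hand the pipeline to the parser as text. This is only a sketch, assuming R 4.1.0 or later; `pipeline_text` and `parsed` are illustrative names rather than anything from the original scripts.

```r
# Illustrative only: `pipeline_text` and `parsed` are hypothetical names.
# The magrittr-style idiom, rewritten with the native pipe, held as text.
pipeline_text <- "0L |> `==`(0) |> stopifnot()"

# The native pipe rejects `==` as the function on its right-hand side
# while the code is being parsed, so parse() itself signals the error.
parsed <- try(parse(text = pipeline_text), silent = TRUE)

# TRUE when the parser refused the pipeline
inherits(parsed, "try-error")
```

Because the error is raised at parse time, a script containing this pipeline should fail before any of its data munging runs.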

Avoid using ==

I was convinced by a couple of responses[4] that the best approach is to forgo the == operator entirely, and this advice is echoed in R’s own documentation.

The first solution is what the scripts I wrote this week now use.
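As a sketch of what forgoing == can look like, assume the alternatives in question are base R’s identical() and all.equal(), the functions that R’s help page on the comparison operators recommends. The data frame and its `value` column are made up for illustration, and base R’s subset() stands in for filter() so the example runs without extra packages (R 4.1.0 or later).

```r
# Hypothetical data: `value` should never be negative.
my_data_frame <- data.frame(id = 1:3, value = c(0.5, 1.2, 3.4))

# identical(): exact comparison. nrow() returns an integer, so compare
# against 0L; identical(nrow(df), 0) would be FALSE (integer vs. double).
my_data_frame |>
  subset(value < 0) |>
  nrow() |>
  identical(0L) |>
  stopifnot()

# all.equal(): comparison with a numeric tolerance. It returns either TRUE
# or a character description of the difference, so wrap it in isTRUE().
my_data_frame |>
  subset(value < 0) |>
  nrow() |>
  all.equal(0) |>
  isTRUE() |>
  stopifnot()
```

Both pipelines pass silently here because no row violates the condition; with an offending row, stopifnot() would raise an error and a non-interactive script would stop.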

You can still use ==

I often bristle when I see somebody ask “how do I do this?” on the internet only to be told “don’t do that”. It only “works” when the respondent really knows what they’re talking about, and internet commenters seem to overestimate their expertise[5]. Fortunately, that wasn’t the case this time, and I think this says great things about the Fediverse’s R community.

That said, it is still possible to use the equality operator by naming the argument[6]:

my_data_frame |>
  # ( a long sequence of data munging operations would go here )
  filter(some_condition_that_should_never_be_true) |>
  nrow() |>
  `==`(x = _, 0) |>
  stopifnot()

I believe this approach requires R 4.2.0 or greater, but I haven’t tested it in the 4.1.x family. It’s nice to know about this, because it applies in other contexts (e.g., when using other operators as functions).
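As a small illustration of those other contexts, the same named-placeholder trick appears to carry over to arithmetic and other comparison operators. This sketch assumes R 4.2.0 or later; the argument name `x` is simply borrowed from the example above, and `difference` and `is_positive` are illustrative names.

```r
# Assumes R >= 4.2.0 (the `_` placeholder). Names are illustrative.

# Subtraction as a function call: the piped value is the first operand.
difference <- 10 |> `-`(x = _, 3)

# A comparison other than `==`.
is_positive <- 5 |> `>`(x = _, 0)

print(difference)
print(is_positive)
```

The placeholder has to be passed as a named argument, which is why the otherwise unnecessary `x =` is there.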

It’s also worth mentioning the first approach I had planned to use, which calls an “anonymous function”[7]:

my_data_frame |>
  # ( a long sequence of data munging operations would go here )
  filter(some_condition_that_should_never_be_true) |>
  nrow() |>
  {\(x) x == 0}() |>
  stopifnot()
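For a version that can be run as-is, the sketch below uses made-up data and base R’s subset() in place of filter(); it assumes R 4.1.0 or later. It also shows the parenthesized spelling of the anonymous function, which should behave the same as the braces.

```r
# Hypothetical data: `value` should never be negative.
my_data_frame <- data.frame(id = 1:3, value = c(0.5, 1.2, 3.4))

# Braces around the anonymous function, as in the example above.
check_braces <- my_data_frame |>
  subset(value < 0) |>
  nrow() |>
  {\(x) x == 0}()

# Parentheses around the anonymous function: an equivalent spelling.
check_parens <- my_data_frame |>
  subset(value < 0) |>
  nrow() |>
  (\(x) x == 0)()

# Either result can be handed to stopifnot() to halt the script on failure.
stopifnot(check_braces, check_parens)
```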

Rambling about pipes

Pipe operators can be found in several other languages[8]. The idea of “pipelining” operations probably “comes from” concatenative languages; pipe operators bring this ability to other paradigms, allowing one to chain a sequence of functions without storing intermediate values or nesting the calls. Fans of this style believe it can “decrease development time and improve readability and maintainability of code”[9].

In the examples here, I don’t want to create a bunch of variables to store values that I won’t use later. But also, I think I realize why I like pipelining: being able to fit the sequence into a single pipeline feels like writing a sentence describing the transformations I want to apply to the data.
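The contrast between the three styles can be sketched with a made-up computation in base R, assuming R 4.1.0 or later for the native pipe; the vector `x` and the result names are purely illustrative.

```r
# Illustrative input
x <- c(4, 9, 16)

# 1. Intermediate variables that are never used again
roots <- sqrt(x)
total <- sum(roots)
result_vars <- round(total, 1)

# 2. Nested calls, read from the inside out
result_nested <- round(sum(sqrt(x)), 1)

# 3. A pipeline, read from top to bottom
result_piped <- x |>
  sqrt() |>
  sum() |>
  round(1)

# All three give the same value
stopifnot(
  identical(result_vars, result_nested),
  identical(result_nested, result_piped)
)
```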


  1. |>, introduced in R 4.1.0 in May 2021.

  2. Seriously, thanks to everyone who replied, most of whom probably only saw my post because of its #RStats hashtag.

  3. See Differences between the base R and magrittr pipes.

  4. This advice comes from Josep Pueyo-Ros and Elio Campitelli.

  5. Dunning, D. (2011). The Dunning–Kruger effect: On being ignorant of one’s own ignorance. In Advances in experimental social psychology (Vol. 44, pp. 247-296). Academic Press.

  6. Thanks go to Matt Dray for this one.

  7. This syntax was also introduced in R 4.1.0.

  8. Python is a notable exception here, and it’s my opinion that it would be improved with the addition of one.

  9. https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html