I was wondering how to incorporate a test for equality into a pipeline using R’s new(ish) native forward pipe operator [1], so I asked the Fediverse and got great advice [2].
Why would I do this?
I approach scripts differently than notebooks (whether using Jupyter or RMarkdown) since they’re usually meant to be run non-interactively. I try to keep the list of packages (or modules) fairly minimal; basically, when deciding how to balance portability vs. code readability, I am biased a bit more toward portability than I am for an interactive data analysis.
Nevertheless, a good script should include data quality checks and raise errors when problems are found. An approach I’d previously taken with the pipe operator introduced by R’s magrittr package [3] might look like this:
library(dplyr)  # provides %>% and filter()
my_data_frame %>%
  # ( a long sequence of data munging operations would go here )
  filter(some_condition_that_should_never_be_true) %>%
  nrow() %>%
  `==`(0) %>%
  stopifnot()
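This code would cause the R script to exit and alert the calling process that something went wrong, but it doesn’t work as written with the native pipe, which refuses an operator such as == when it is called directly as the function on the right-hand side.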
Avoid using ==
I was convinced by a couple of responses [4] that the best approach is to forgo the == operator entirely, and this advice is echoed in R’s own documentation.
- When you know that the value you’re testing will be an integer or a non-numeric value, the identical() function is better. In the example above, I’d write identical(0L), since this differentiates integer from floating-point values, and I know nrow() will return an integer.
- For other numeric values, use all.equal() followed by isTRUE(); e.g. sin(pi) |> all.equal(0) |> isTRUE()
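A minimal sketch of how the three checks differ, assuming the built-in mtcars data set and a made-up condition (mpg < 0) as stand-ins for real data:

```r
# nrow() returns an integer, so identical() can compare against 0L exactly.
# mtcars and the condition mpg < 0 are illustrative stand-ins.
n_bad <- mtcars |> subset(mpg < 0) |> nrow()
is_int_zero <- n_bad |> identical(0L)   # integer compared with integer
is_dbl_zero <- n_bad |> identical(0)    # integer compared with double

# Floating-point results are rarely exactly equal to the expected value,
# so compare them with a tolerance instead.
exact_match  <- sin(pi) == 0
approx_match <- sin(pi) |> all.equal(0) |> isTRUE()

# `==` returns a zero-length result when one side is NULL, and
# stopifnot() accepts a zero-length logical without raising an error.
slips_through <- tryCatch({ stopifnot(NULL == 0); TRUE },
                          error = function(e) FALSE)
caught <- tryCatch({ stopifnot(identical(NULL, 0L)); FALSE },
                   error = function(e) TRUE)

print(c(is_int_zero, is_dbl_zero, exact_match, approx_match,
        slips_through, caught))
```

The last two lines show one practical reason to prefer identical() inside stopifnot(): a comparison that silently produces no result at all still passes with ==, while identical() always returns a single TRUE or FALSE.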
The first solution is what the scripts I wrote this week now use.
You can still use ==
I often bristle when I see somebody ask “how do I do this?” on the internet only to be told “don’t do that”. It only “works” when the respondent really knows what they’re talking about, and internet commenters seem to overestimate their expertise [5]. Fortunately, that wasn’t the case this time, and I think this says great things about the Fediverse’s R community.
That said, it is still possible to use the equality operator by naming the argument [6]:
my_data_frame |>
  # ( a long sequence of data munging operations would go here )
  filter(some_condition_that_should_never_be_true) |>
  nrow() |>
  `==`(x = _, 0) |>
  stopifnot()
I believe this approach requires R 4.2.0 or greater, but I haven’t tested it in the 4.1.x family. It’s nice to know about this, because it applies in other contexts (e.g., when using other operators as functions).
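The named placeholder is also useful with ordinary functions whose data argument is not the first one. A small sketch, assuming R 4.2.0 or later; the mtcars data set and the model formula are only stand-ins chosen for illustration:

```r
# Requires R >= 4.2.0 for the `_` placeholder.
# mtcars and the formula mpg ~ wt are illustrative stand-ins.
coefs <- mtcars |>
  subset(cyl == 4) |>
  lm(mpg ~ wt, data = _) |>   # the piped value lands in `data`, not the first argument
  coef()

print(names(coefs))
```

The placeholder has to be passed as a named argument, which is why the example spells out data = _.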
It’s possibly worth calling out the first approach I had planned to use, which calls an “anonymous function” [7]:
my_data_frame |>
  # ( a long sequence of data munging operations would go here )
  filter(some_condition_that_should_never_be_true) |>
  nrow() |>
  {\(x) x == 0}() |>
  stopifnot()
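Two variations on the same idea, sketched with the built-in mtcars data set as a stand-in: the anonymous function wrapped in parentheses, which is the spelling more commonly seen, and a small named helper. The helper name is_zero() is hypothetical and does not come from any package.

```r
# Hypothetical helper; is_zero() is a name made up for this sketch.
is_zero <- function(x) identical(x, 0L)

# mtcars and the condition mpg < 0 are illustrative stand-ins.
ok_lambda <- mtcars |>
  subset(mpg < 0) |>
  nrow() |>
  (\(x) x == 0)()   # anonymous function, called immediately

ok_named <- mtcars |>
  subset(mpg < 0) |>
  nrow() |>
  is_zero()         # named helper reads like any other pipeline step

stopifnot(ok_lambda, ok_named)
```

A named helper costs one extra line but keeps the pipeline free of punctuation, which may matter in a script that other people have to read.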
Rambling about pipes
Pipe operators can be found in several other languages [8]. The idea of “pipelining” operations probably “comes from” concatenative languages; pipe operators bring this ability to other paradigms, allowing one to chain a sequence of functions without storing intermediate values or nesting the calls. Fans of this style believe it can “decrease development time and improve readability and maintainability of code” [9].
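To illustrate the “without nesting the calls” part, here is a small sketch showing that R’s native pipe is only a different way of writing a nested call. The names x, y, f, and g are placeholders that are never evaluated, and mtcars is again a stand-in data set.

```r
# quote() captures the parsed expression without evaluating it, so the
# placeholder names x, y, f and g do not need to exist.
piped  <- quote(x |> f() |> g(y))
nested <- quote(g(f(x), y))
same_expression <- identical(piped, nested)

# The same equivalence with real functions; mtcars is a stand-in.
n_nested <- nrow(subset(mtcars, mpg > 30))
n_piped  <- mtcars |> subset(mpg > 30) |> nrow()

print(same_expression)
print(identical(n_nested, n_piped))
```

The piped version reads in the order the operations happen, while the nested version has to be read from the inside out.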
In the examples here, I don’t want to create a bunch of variables to store values that I won’t use later. But also, I think I realize why I like pipelining: being able to fit the sequence into a single pipeline feels like writing a sentence describing the transformations I want to apply to the data.
[1] |>, introduced in R 4.1.0 in May 2021.
[2] Seriously, thanks to everyone who replied, most of whom probably only saw my post because of its #RStats hashtag.
[4] This advice comes from Josep Pueyo-Ros and Elio Campitelli.
[5] Dunning, D. (2011). The Dunning–Kruger effect: On being ignorant of one’s own ignorance. In Advances in experimental social psychology (Vol. 44, pp. 247-296). Academic Press.
[7] This syntax was also introduced in R 4.1.0.
[8] Python is a notable exception here, and it’s my opinion that it would be improved with the addition of one.
[9] https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html