The Conditional Missingness of Missing Values in R

Data analysts often want to subset or otherwise manipulate data based on whether values are missing. Luckily, R offers several ways of designating a value as missing. Unluckily, the way these missing values interact with common operators and functions is not always intuitive, and this can produce unintended results.
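To make those "several ways" concrete, here is a short base-R sketch. It lists the typed missing-value constants and shows one subsetting pitfall of the kind described above. The vector `x` and data frame `df` are made-up illustrations, not data from the original post.

```r
# Each atomic type has its own missing-value constant.
typeof(NA)            # "logical"
typeof(NA_integer_)   # "integer"
typeof(NA_real_)      # "double"
typeof(NA_character_) # "character"
typeof(NA_complex_)   # "complex"

# NaN is a distinct value, but is.na() also reports it as missing.
is.na(NaN)            # TRUE

# Pitfall: comparing against NA yields NA, not TRUE.
x <- c(1, NA, 3)
x == NA               # NA NA NA
x[x == NA]            # NA NA NA (three NAs, not "the missing element")
x[is.na(x)]           # NA (the single missing element)

# Logical indexing with an NA condition returns an all-NA row; which() drops it.
df <- data.frame(a = c(1, NA, 3))
df[df$a > 1, , drop = FALSE]        # 2 rows: an all-NA row plus the row with 3
df[which(df$a > 1), , drop = FALSE] # 1 row: only the row with 3
```

Using `is.na()` to build the condition, or wrapping a comparison in `which()`, avoids silently carrying spurious NA rows into a subset.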

I wrote a demonstration of this a while back. The code below shows behaviors of missing values that many R programmers likely expect, along with some surprising results. One way to guard against disastrous consequences, whether caused by these behaviors or by something else, is to write tests that confirm your code does what you intend.
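As one illustration of the kind of test meant here, the sketch below uses only base R's `stopifnot()`; a testing package such as testthat offers richer reporting. The helper `is_missing_or_na_string()` is hypothetical, written for this example: it treats both true NA values and the literal string "NA" as missing, which guards against the `paste(NA)` behavior shown in the examples.

```r
# Hypothetical helper (not from the original post): flag real missing values
# and the literal string "NA" (e.g. produced by paste(NA)) as missing.
# %in% never returns NA, so the result is always TRUE or FALSE.
is_missing_or_na_string <- function(x) {
  is.na(x) | x %in% "NA"
}

# Base-R tests: stopifnot() raises an error if any condition is not TRUE.
stopifnot(
  identical(is_missing_or_na_string(c("a", NA, "NA")), c(FALSE, TRUE, TRUE)),
  identical(is_missing_or_na_string(paste(NA)), TRUE),
  identical(is_missing_or_na_string(c(1, NA)), c(FALSE, TRUE)),
  !anyNA(is_missing_or_na_string(c(NA, "x")))
)

# Tests can also pin down base-R behaviors your code relies on.
stopifnot(is.na(NA == NA), NA %in% NA, !is.na(paste(NA)))
```

If a later change (or a surprising input) breaks one of these assumptions, the script stops with an error instead of quietly producing wrong subsets.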

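# The code below demonstrates the madness of R's treatment of NA values.
# Some examples taken from https://stackoverflow.com/questions/25100974/na-matches-na-but-is-not-equal-to-na-why/25101796
# Logical examples
# %in% is built on match(), which treats NA as matching NA, whereas ==
# propagates the missing value. With |, a TRUE operand settles the result
# regardless of the unknown value; a FALSE operand does not, so the result
# stays NA. NA_real_ and NA_integer_ are coerced to logical first.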
NA %in% NA
# [1] TRUE
NA == NA
# [1] NA
NA | TRUE
# [1] TRUE
NA_real_ | TRUE
# [1] TRUE
NA_integer_ | TRUE
# [1] TRUE
NA | FALSE
# [1] NA
NA_real_ | FALSE
# [1] NA
NA_integer_ | FALSE
# [1] NA
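# paste(NA) is the character string "NA", not a missing value, and | accepts
# only numeric, logical or complex operands. try() reports the error without
# stopping the rest of the script.
try(TRUE | paste(NA))
# Error in TRUE | paste(NA) :
#   operations are possible only for numeric, logical or complex types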
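# Matching examples
# match() coerces both arguments to a common type, so differently typed NAs
# still match each other. paste(NA), by contrast, is the string "NA", which is
# not a missing value and does not match NA.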
match(NA, NA)
# [1] 1
match(NA, NA_real_)
# [1] 1
match(NA_character_, NA_real_)
# [1] 1
match(paste(NA), NA)
# [1] NA
# gsub() returns NA for a missing input but substitutes inside the string "NA".
gsub("NA", "", NA)
# [1] NA
gsub("NA", "", paste(NA))
# [1] ""
# is.na() likewise recognizes NA but not the string "NA".
is.na(NA)
# [1] TRUE
is.na(paste(NA))
# [1] FALSE
# Other examples
identical(NA, NA)
# [1] TRUE
eval(NA)
# [1] NA
is.na(eval(NA))
# [1] TRUE
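The Stack Overflow question cited in the code comments turns on the difference between `==` and `match()`. When you want element-wise equality in which NA matches NA, one option is a small wrapper. The name `eq_na()` is hypothetical, chosen for this sketch; it is not a base R function.

```r
# Hypothetical helper (not part of base R): element-wise equality where two
# missing values count as equal and NA versus a non-missing value is FALSE.
eq_na <- function(a, b) {
  both_na   <- is.na(a) & is.na(b)
  either_na <- is.na(a) | is.na(b)
  out <- a == b
  # Wherever either side is NA, == returned NA; replace it with both_na.
  out[either_na] <- both_na[either_na]
  out
}

eq_na(c(1, NA, NA), c(1, NA, 2))  # TRUE TRUE FALSE
eq_na("NA", NA)                   # FALSE: the string "NA" is not missing
NA == NA                          # NA, for comparison
```

Unlike `identical()`, which compares whole objects and returns a single TRUE or FALSE, this works element by element, so it can be used directly as a subsetting condition.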
Brett J. Gall
Data Scientist