Data analysts often want to examine subsets of data, or otherwise manipulate data, based on whether values are missing. Luckily, R offers several ways to designate a value as missing. Unluckily, the way these missing values interact with popular functions is not always intuitive, and this can produce unintended results.
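As a minimal illustration, assuming a small hypothetical data frame `df`, the sketch below lists R's typed missing-value constants and compares three ways of selecting the rows where a column is missing. Only `is.na()` does what most people intend. Comparing with `== NA` yields `NA` for every row, which `[` turns into rows filled with `NA`. `subset()` silently treats `NA` conditions as `FALSE` and returns no rows.

```r
# R's missing-value constants, one per common atomic type
sapply(list(NA, NA_integer_, NA_real_, NA_character_), typeof)
# returns "logical", "integer", "double", "character"

# A small hypothetical data frame with one missing score
df <- data.frame(id = 1:3, score = c(10, NA, 7))

# Correct: is.na() flags the missing value, so one row is returned
df[is.na(df$score), ]

# Pitfall: `score == NA` is NA for every row, so `[` returns
# three rows made entirely of NA values
df[df$score == NA, ]

# Pitfall: subset() drops rows whose condition is NA, so this returns zero rows
subset(df, score == NA)
```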
I wrote a demonstration of this a while back. The code below showcases behaviors of missing values that many R programmers likely expect, along with some surprising results. One way to avoid disastrous consequences, whether they stem from these behaviors or from other causes, is to write tests that confirm your code does what you intend it to do.
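A minimal sketch of such a test, using only base R's `stopifnot()`: `rows_missing()` is a hypothetical helper written for illustration, and the assertions pin down which rows it should return and that it does not fall into the `== NA` trap. Packages such as testthat provide a fuller framework for the same idea.

```r
# Hypothetical helper: return the rows of `df` where column `col` is missing
rows_missing <- function(df, col) {
  df[is.na(df[[col]]), , drop = FALSE]
}

df <- data.frame(id = 1:4, score = c(10, NA, 7, NA))
result <- rows_missing(df, "score")

# stopifnot() raises an error as soon as any condition is not TRUE
stopifnot(
  nrow(result) == 2,
  identical(result$id, c(2L, 4L)),
  # A helper written with `df[[col]] == NA` would return all-NA rows;
  # these checks would catch that mistake
  !anyNA(result$id)
)
```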
# The below demonstrates the madness of R's treatment of NA values.
# Some examples taken from https://stackoverflow.com/questions/25100974/na-matches-na-but-is-not-equal-to-na-why/25101796
# Logical examples
NA %in% NA
# [1] TRUE
NA == NA
# [1] NA
NA | TRUE
# [1] TRUE
NA_real_ | TRUE
# [1] TRUE
NA_integer_ | TRUE
# [1] TRUE
NA | FALSE
# [1] NA
NA_real_ | FALSE
# [1] NA
NA_integer_ | FALSE
# [1] NA
TRUE | paste(NA)
# Error in TRUE | paste(NA) :
# operations are possible only for numeric, logical or complex types
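These results follow from R's three-valued logic: `NA` means "unknown", so an expression returns a definite value only when the unknown could not change the outcome. `NA | TRUE` is `TRUE` whatever the missing value is, while `NA | FALSE` depends on it and stays `NA`; by the same reasoning `NA & FALSE` is `FALSE`. The error on the last line is different in kind: `paste(NA)` returns the character string `"NA"`, and `|` does not accept character operands. A short sketch that prints the full truth tables for `|` and `&`:

```r
# Truth tables for R's vectorised logical operators, including NA
vals   <- c(TRUE, FALSE, NA)
labels <- c("TRUE", "FALSE", "NA")

# Rows are the left operand, columns the right operand
or_table  <- outer(vals, vals, "|")
and_table <- outer(vals, vals, "&")
dimnames(or_table)  <- list(labels, labels)
dimnames(and_table) <- list(labels, labels)

or_table   # NA | TRUE is TRUE; NA | FALSE is NA
and_table  # NA & FALSE is FALSE; NA & TRUE is NA
```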
# Matching examples
match(NA, NA)
# [1] 1
match(NA, NA_real_)
# [1] 1
match(NA_character_, NA_real_)
# [1] 1
match(paste(NA), NA)
# [1] NA
gsub("NA", "", NA)
# [1] NA
gsub("NA", "", paste(NA))
# [1] ""
is.na(NA)
# [1] TRUE
is.na(paste(NA))
# [1] FALSE
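The string cases diverge because `paste()` converts a missing value into the two-character string `"NA"`, an ordinary non-missing value. `match()` and `%in%`, by contrast, coerce their inputs to a common type and treat `NA` as a value that matches itself, which is why `match(NA_character_, NA_real_)` succeeds. A minimal sketch, assuming a character vector that has passed through `paste()`, of how missingness is lost and one way to restore it:

```r
x <- c("a", NA, "b")

# paste() turns NA into the literal string "NA"
y <- paste(x)
is.na(x)   # FALSE  TRUE FALSE
is.na(y)   # FALSE FALSE FALSE

# as.character() keeps missing values missing
is.na(as.character(NA))   # TRUE

# Convert literal "NA" strings back into real missing values
y[y == "NA"] <- NA
is.na(y)   # FALSE  TRUE FALSE
```

This conversion also turns any legitimate `"NA"` string, such as the ISO country code for Namibia, into a missing value, so it is only safe when `"NA"` cannot be real data.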
# Other examples
identical(NA, NA)
# [1] TRUE
eval(NA)
# [1] NA
is.na(eval(NA))
# [1] TRUE
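`identical()` compares whole objects and reports `TRUE` for `NA` versus `NA`, but it also checks type, so `identical(NA, NA_real_)` is `FALSE`, and it gives a single answer rather than element-by-element results. When an element-wise comparison should treat two missing values as equal, a small helper built on `is.na()` is one option; `eq_na()` below is a hypothetical name, not a base R function.

```r
# identical() checks type as well as value
identical(NA, NA)        # TRUE
identical(NA, NA_real_)  # FALSE: logical vs double

# Hypothetical helper: element-wise equality in which NA equals NA.
# It never returns NA, because FALSE & NA is FALSE.
eq_na <- function(a, b) {
  both_missing <- is.na(a) & is.na(b)
  both_present <- !is.na(a) & !is.na(b)
  both_missing | (both_present & a == b)
}

a <- c(1, NA, 3, NA)
b <- c(1, NA, 4, 5)
a == b        # TRUE    NA FALSE    NA
eq_na(a, b)   # TRUE  TRUE FALSE FALSE
```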