# Missing Values

swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you

| did then. If you are new, call yourself something unique.

What shall I call you? Krishnakanth Allika

| Please choose a course, or type 0 to exit swirl.

1: R Programming

2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers

4: Vectors 5: Missing Values 6: Subsetting Vectors

7: Matrices and Data Frames 8: Logic 9: Functions

10: lapply and sapply 11: vapply and tapply 12: Looking at Data

13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 5

| | 0%

| Missing values play an important role in statistics and data analysis. Often, missing

| values must not be ignored, but rather they should be carefully studied to see if

| there's an underlying pattern or cause for their missingness.

...

|==== | 5%

| In R, NA is used to represent any value that is 'not available' or 'missing' (in the

| statistical sense). In this lesson, we'll explore missing values further.

...

|======== | 10%

| Any operation involving NA generally yields NA as the result. To illustrate, let's

| create a vector c(44, NA, 5, NA) and assign it to a variable x.

x <- c(44, NA, 5, NA)

| All that practice is paying off!

|============ | 15%

| Now, let's multiply x by 3.

x*3

[1] 132 NA 15 NA

| You are amazing!

|================ | 20%

| Notice that the elements of the resulting vector that correspond with the NA values in

| x are also NA.

...
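
As a small aside to the point above, here is a sketch of how NA propagates through arithmetic and summaries. It reuses the lesson's vector `x`; the `na.rm = TRUE` argument of `sum()` and `mean()` is not part of the lesson and is shown only as one common way to skip NAs.

```r
# Same vector as in the lesson above.
x <- c(44, NA, 5, NA)

x * 3                    # element-wise: each NA stays NA
sum(x)                   # NA, because the total depends on unknown values
sum(x, na.rm = TRUE)     # 49, NAs are dropped before summing
mean(x, na.rm = TRUE)    # 24.5, the mean of 44 and 5
```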

|==================== | 25%

| To make things a little more interesting, let's create a vector containing 1000 draws

| from a standard normal distribution with y <- rnorm(1000).

y <- rnorm(1000)

| That's correct!

|======================== | 30%

| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).

z <- rep(NA, 1000)

| You are doing so well!

|============================ | 35%

| Finally, let's select 100 elements at random from these 2000 values (combining y and z)

| such that we don't know how many NAs we'll wind up with or what positions they'll

| occupy in our final vector -- my_data <- sample(c(y, z), 100).

my_data <- sample(c(y, z), 100)

| You nailed it! Good job!
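
Because `rnorm()` and `sample()` are random, the values and NA positions in this transcript will differ from run to run. The sketch below, meant to be run outside the swirl session, shows one way to make the three steps repeatable with `set.seed()`. The seed value 42 is an arbitrary assumption, not something the lesson uses.

```r
set.seed(42)                      # arbitrary seed, chosen only for this sketch
y <- rnorm(1000)                  # 1000 standard normal draws
z <- rep(NA, 1000)                # 1000 missing values
my_data <- sample(c(y, z), 100)   # 100 elements drawn without replacement

length(my_data)                   # 100
```

Re-running the block with the same seed gives the same `my_data`, NAs included.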

|================================ | 40%

| Let's first ask the question of where our NAs are located in our data. The is.na()

| function tells us whether each element of a vector is NA. Call is.na() on my_data and

| assign the result to my_na.

my_na <- is.na(my_data)

| That's correct!

|==================================== | 45%

| Now, print my_na to see what you came up with.

my_na

[1] FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE

[15] TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE

[29] TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE

[43] FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE

[57] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE

[71] TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE

[85] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE

[99] TRUE FALSE

| You are quite good my friend!

|======================================== | 50%

| Everywhere you see a TRUE, you know the corresponding element of my_data is NA.

| Likewise, everywhere you see a FALSE, you know the corresponding element of my_data is

| one of our random draws from the standard normal distribution.

...
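
A long logical vector is hard to read by eye. As a minimal sketch, the base R functions `which()` and `anyNA()` can summarize the same information. The short vector `v` is a hypothetical stand-in for `my_data`.

```r
v <- c(1.5, NA, -0.3, NA, 2.0)   # hypothetical small vector

is.na(v)          # FALSE  TRUE FALSE  TRUE FALSE
which(is.na(v))   # 2 4, the positions of the NAs
anyNA(v)          # TRUE, at least one NA is present
v[!is.na(v)]      # 1.5 -0.3 2.0, the non-missing values
```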

|============================================ | 55%

| In our previous discussion of logical operators, we introduced the `==` operator as a

| method of testing for equality between two objects. So, you might think the expression

| my_data == NA yields the same results as is.na(). Give it a try.

my_data == NA

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

[29] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

[57] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

[85] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| All that hard work is paying off!

|================================================ | 60%

| The reason you got a vector of all NAs is that NA is not really a value, but just a

| placeholder for a quantity that is not available. Therefore the logical expression is

| incomplete and R has no choice but to return a vector of the same length as my_data

| that contains all NAs.

...
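
The same behaviour can be seen on a much smaller scale. In this sketch the vector `v` is a hypothetical example, not part of the lesson. Comparing anything with NA using `==` gives NA, even NA itself, while `is.na()` gives a definite TRUE or FALSE.

```r
NA == NA          # NA, two unknown values may or may not be equal
NA == 1           # NA
is.na(NA)         # TRUE

v <- c(3, NA, 7)  # hypothetical small vector
v == NA           # NA NA NA
is.na(v)          # FALSE  TRUE FALSE
```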

|==================================================== | 65%

| Don't worry if that's a little confusing. The key takeaway is to be cautious when using

| logical expressions anytime NAs might creep in, since a single NA value can derail the

| entire thing.

...
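
To make the caution concrete, here is a short sketch of how an NA can leak into a comparison and then into subsetting. The vector `v` is hypothetical, and wrapping the condition in `which()` is just one possible guard.

```r
v <- c(3, NA, 7)          # hypothetical small vector

v > 5                     # FALSE    NA  TRUE
v[v > 5]                  # NA 7, the NA in the index produces an NA element
v[which(v > 5)]           # 7, which() keeps only the positions that are TRUE

# NA does not always win: the result is known when the other operand decides it.
NA & FALSE                # FALSE
NA | TRUE                 # TRUE
isTRUE(NA)                # FALSE
```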

|======================================================== | 70%

| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for

| every NA and FALSE for every numeric value, we can compute the total number of NAs in

| our data.

...

|============================================================ | 75%

| The trick is to recognize that underneath the surface, R represents TRUE as the number

| 1 and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and

| FALSEs, we get the total number of TRUEs.

...
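
A tiny sketch of this coercion, using a hypothetical logical vector `flags`. The same idea also makes `mean()` return the proportion of TRUEs.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)   # hypothetical logical vector

as.integer(flags)   # 1 0 1 1
sum(flags)          # 3, the number of TRUEs
mean(flags)         # 0.75, the proportion of TRUEs
```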

|================================================================ | 80%

| Let's give that a try here. Call the sum() function on my_na to count the total number

| of TRUEs in my_na, and thus the total number of NAs in my_data. Don't assign the result

| to a new variable.

sum(my_na)

[1] 43

| You're the best!

|==================================================================== | 85%

| Pretty cool, huh? Finally, let's take a look at the data to convince ourselves that

| everything 'adds up'. Print my_data to the console.

my_data

[1] -0.578578797 NA -0.112639140 0.836412196 -1.074043937 NA

[7] NA 1.303020726 NA 1.514220057 -1.533126560 -0.366673361

[13] NA -1.032058614 NA -1.631213149 0.379297612 -0.706613051

[19] NA -0.692352920 NA NA 0.535394170 -1.872906664

[25] -0.861449272 -1.321735747 NA -0.787816086 NA 0.801388943

[31] NA -1.487792282 0.470028145 NA NA 1.187583726

[37] -1.704604005 NA 0.596807280 NA NA -1.493099149

[43] 0.265671235 NA -0.985396879 -0.974373033 NA 1.377397659

[49] -0.637308342 -1.450105656 0.192263390 0.776355028 NA NA

[55] NA NA NA 1.567478781 -0.511602362 0.107048330

[61] NA NA -0.408394399 0.592123817 NA 0.305403550

[67] 3.201114883 0.806735141 0.698544788 NA NA NA

[73] -2.671330480 0.123440813 NA NA NA -1.236370155

[79] 0.936670598 NA NA 1.519229128 NA -1.366810674

[85] 0.211749069 NA 0.203741812 1.319234085 NA -0.432928319

[91] -0.006566875 NA NA 0.060568295 0.292428312 NA

[97] NA 0.717821949 NA 0.359249723

| You're the best!
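
One way to check that everything adds up without counting by eye is to compare the number of missing and non-missing elements with the vector's length. In the run above, `sum(my_na)` reported 43, so 57 of the 100 elements should be numeric. The sketch below shows the same check on a hypothetical short vector `v`.

```r
v <- c(0.2, NA, -1.1, NA, NA, 0.7)    # hypothetical stand-in for my_data

n_missing  <- sum(is.na(v))           # 3
n_observed <- sum(!is.na(v))          # 3
n_missing + n_observed == length(v)   # TRUE
```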

|======================================================================== | 90%

| Now that we've got NAs down pat, let's look at a second type of missing value -- NaN,

| which stands for 'not a number'. To generate NaN, try dividing (using a forward slash)

| 0 by 0 now.

0/0

[1] NaN

| That's a job well done!
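
NaN and NA are related but not identical. As a brief sketch using base R's `is.nan()` and a hypothetical vector `v`: `is.na()` is TRUE for both NA and NaN, while `is.nan()` is TRUE only for NaN.

```r
is.nan(0 / 0)     # TRUE
is.na(0 / 0)      # TRUE, is.na() treats NaN as missing too
is.nan(NA)        # FALSE

v <- c(1, NA, NaN)   # hypothetical small vector
is.na(v)          # FALSE  TRUE  TRUE
is.nan(v)         # FALSE FALSE  TRUE
```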

|============================================================================ | 95%

| Let's do one more, just for fun. In R, Inf stands for infinity. What happens if you

| subtract Inf from Inf?

Inf-Inf

[1] NaN

| You are really on a roll!
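
For comparison, a few more expressions involving Inf, as a sketch outside the lesson. Operations with a well-defined limit stay infinite, indeterminate ones give NaN, and `is.finite()` is FALSE for Inf, NA, and NaN alike.

```r
1 / 0                           # Inf
-1 / 0                          # -Inf
Inf + 1                         # Inf
Inf - Inf                       # NaN
Inf * 0                         # NaN
is.finite(c(1, Inf, NA, NaN))   # TRUE FALSE FALSE FALSE
```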

|================================================================================| 100%

| Would you like to receive credit for completing this course on Coursera.org?

1: Yes

2: No

Selection: 1

What is your email address? [email protected]

What is your assignment token? xXxXxxXXxXxxXXXx

Grade submission succeeded!

| You are doing so well!

| You've reached the end of this lesson! Returning to the main

| menu...

| Please choose a course, or type 0 to exit swirl.

*Last updated 2020-04-13 23:32:50.765370 IST*
