# Subsetting Vectors

R version 3.6.3 (2020-02-29) -- "Holding the Windsock"

Copyright (C) 2020 The R Foundation for Statistical Computing

Platform: x86_64-w64-mingw32/x64 (64-bit)


[Workspace loaded from C:/Users/kk/PortableApps/Git/home/k-allika/repos/DataScienceWithR/.RData]

library("swirl")

| Hi! I see that you have some variables saved in your workspace. To keep things running

| smoothly, I recommend you clean up before starting swirl.

| Type ls() to see a list of the variables in your workspace. Then, type rm(list=ls()) to

| clear your workspace.

| Type swirl() when you are ready to begin.

ls()

[1] "my_char" "my_data" "my_div" "my_na" "my_name" "my_seq" "my_sqrt"

[8] "num_vect" "old.dir" "tf" "x" "y" "z"

rm(list=ls())

swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as you

| did then. If you are new, call yourself something unique.

What shall I call you? Krishnakanth Allika

| Please choose a course, or type 0 to exit swirl.

1: R Programming

2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers

4: Vectors 5: Missing Values 6: Subsetting Vectors

7: Matrices and Data Frames 8: Logic 9: Functions

10: lapply and sapply 11: vapply and tapply 12: Looking at Data

13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 6

| | 0%

| In this lesson, we'll see how to extract elements from a vector based on some

| conditions that we specify.

...

|== | 3%

| For example, we may only be interested in the first 20 elements of a vector, or only

| the elements that are not NA, or only those that are positive or correspond to a

| specific variable of interest. By the end of this lesson, you'll know how to handle

| each of these scenarios.

...

|==== | 5%

| I've created for you a vector called x that contains a random ordering of 20 numbers

| (from a standard normal distribution) and 20 NAs. Type x now to see what it looks like.

x

[1] -0.68754438 NA NA NA NA NA NA

[8] NA -0.01654302 1.03010195 -0.40799451 -0.55849418 NA -0.07687958

[15] -0.05351510 NA NA 1.16924926 1.60452324 -0.08284351 1.66735009

[22] NA NA 2.18942224 -0.14724334 NA NA -0.99999522

[29] NA NA -0.12665386 -0.61215464 -0.58919026 NA NA

[36] 1.12894965 -1.36770314 NA -1.33061090 NA

| You are amazing!
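The lesson does not show how x was generated. Below is a minimal sketch of one way to build a similar vector. It assumes x is 20 standard-normal draws shuffled together with 20 NAs. The seed and the name `x_demo` are illustrative assumptions, so the values will differ from the output above:

```r
set.seed(42)  # arbitrary seed, chosen only so the sketch is reproducible
# 20 draws from a standard normal distribution plus 20 missing values;
# sample() called with a single vector returns a random permutation of it
x_demo <- sample(c(rnorm(20), rep(NA, 20)))
length(x_demo)       # 40
sum(is.na(x_demo))   # 20
```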

|====== | 8%

| The way you tell R that you want to select some particular elements (i.e. a 'subset')

| from a vector is by placing an 'index vector' in square brackets immediately following

| the name of the vector.
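Here is a short sketch of bracket indexing on a hypothetical vector `v`, which is not part of the lesson. When you want the first n elements, `head()` returns the same result as `v[1:n]`:

```r
v <- c(5, 8, 13, 21, 34, 55)   # hypothetical example vector
idx <- 1:3                     # an index vector of positive integers
v[idx]                         # 5 8 13
head(v, 3)                     # the same first three elements
identical(v[1:3], head(v, 3))  # TRUE
```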

...

|======== | 10%

| For a simple example, try x[1:10] to view the first ten elements of x.

x[1:10]

[1] -0.68754438 NA NA NA NA NA NA

[8] NA -0.01654302 1.03010195

| You got it right!

|========== | 13%

| Index vectors come in four different flavors -- logical vectors, vectors of positive

| integers, vectors of negative integers, and vectors of character strings -- each of

| which we'll cover in this lesson.

...

|============ | 15%

| Let's start by indexing with logical vectors. One common scenario when working with

| real-world data is that we want to extract all elements of a vector that are not NA

| (i.e. missing data). Recall that is.na(x) yields a vector of logical values the same

| length as x, with TRUEs corresponding to NA values in x and FALSEs corresponding to

| non-NA values in x.

...

|============== | 18%

| What do you think x[is.na(x)] will give you?

1: A vector of length 0

2: A vector of all NAs

3: A vector of TRUEs and FALSEs

4: A vector with no NAs

Selection: 2

| That's the answer I was looking for.

|================ | 21%

| Prove it to yourself by typing x[is.na(x)].

x[is.na(x)]

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| You got it!

|================== | 23%

| Recall that `!` gives us the negation of a logical expression, so !is.na(x) can be read

| as 'is not NA'. Therefore, if we want to create a vector called y that contains all of

| the non-NA values from x, we can use y <- x[!is.na(x)]. Give it a try.

x[!is.na(x)]

[1] -0.68754438 -0.01654302 1.03010195 -0.40799451 -0.55849418 -0.07687958 -0.05351510

[8] 1.16924926 1.60452324 -0.08284351 1.66735009 2.18942224 -0.14724334 -0.99999522

[15] -0.12665386 -0.61215464 -0.58919026 1.12894965 -1.36770314 -1.33061090

| Nice try, but that's not exactly what I was hoping for. Try again. Or, type info() for

| more options.

| Type y <- x[!is.na(x)] to capture all non-missing values from x.

y<-x[!is.na(x)]

| You are doing so well!
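The next sketch applies the same `!is.na()` pattern to a small hypothetical vector `w`. It shows that `is.na()` returns one logical value per element, and that `!` flips each value before it is used as the index:

```r
w <- c(3, NA, -1, NA, 7)    # hypothetical vector with two missing values
is.na(w)                    # FALSE TRUE FALSE TRUE FALSE
w[is.na(w)]                 # NA NA (only the missing entries)
w_complete <- w[!is.na(w)]  # keep only the non-missing entries
w_complete                  # 3 -1 7
```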

|===================== | 26%

| Print y to the console.

y

[1] -0.68754438 -0.01654302 1.03010195 -0.40799451 -0.55849418 -0.07687958 -0.05351510

[8] 1.16924926 1.60452324 -0.08284351 1.66735009 2.18942224 -0.14724334 -0.99999522

[15] -0.12665386 -0.61215464 -0.58919026 1.12894965 -1.36770314 -1.33061090

| You are really on a roll!

|======================= | 28%

| Now that we've isolated the non-missing values of x and put them in y, we can subset y

| as we please.

...

|========================= | 31%

| Recall that the expression y > 0 will give us a vector of logical values the same

| length as y, with TRUEs corresponding to values of y that are greater than zero and

| FALSEs corresponding to values of y that are less than or equal to zero. What do you

| think y[y > 0] will give you?

1: A vector of all the negative elements of y

2: A vector of length 0

3: A vector of TRUEs and FALSEs

4: A vector of all NAs

5: A vector of all the positive elements of y

Selection: 5

| You are quite good my friend!

|=========================== | 33%

| Type y[y > 0] to see that we get all of the positive elements of y, which are also the

| positive elements of our original vector x.

y[y>0]

[1] 1.030102 1.169249 1.604523 1.667350 2.189422 1.128950

| All that practice is paying off!

|============================= | 36%

| You might wonder why we didn't just start with x[x > 0] to isolate the positive

| elements of x. Try that now to see why.

x[x>0]

[1] NA NA NA NA NA NA NA 1.030102 NA

[10] NA NA 1.169249 1.604523 1.667350 NA NA 2.189422 NA

[19] NA NA NA NA NA 1.128950 NA NA

| You are amazing!

|=============================== | 38%

| Since NA is not a value, but rather a placeholder for an unknown quantity, the

| expression NA > 0 evaluates to NA. Hence we get a bunch of NAs mixed in with our

| positive numbers when we do this.
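The following sketch shows that behaviour on a hypothetical vector `u`. Any comparison involving NA yields NA. When a logical index vector contains an NA, the result gets an NA in that position instead of skipping it:

```r
NA > 0                # NA: comparing with an unknown value gives an unknown result
u <- c(2, NA, -4)     # hypothetical vector
u > 0                 # TRUE NA FALSE
u[u > 0]              # 2 NA (the NA in the index produces an NA in the result)
```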

...

|================================= | 41%

| Combining our knowledge of logical operators with our new knowledge of subsetting, we

| could do this -- x[!is.na(x) & x > 0]. Try it out.

x[!is.na(x)&x>0]

[1] 1.030102 1.169249 1.604523 1.667350 2.189422 1.128950

| You are really on a roll!

|=================================== | 44%

| In this case, we request only values of x that are both non-missing AND greater than

| zero.
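The combined condition removes the NA positions because `FALSE & NA` evaluates to `FALSE`. The sketch below uses a hypothetical vector `p`. It also shows `which()` as an alternative way to get the same subset. `which()` returns the positions of the TRUE values and ignores NAs:

```r
p <- c(2, NA, -4, 5, NA)             # hypothetical vector with missing values
keep <- !is.na(p) & p > 0            # FALSE & NA is FALSE, so NA positions become FALSE
keep                                 # TRUE FALSE FALSE TRUE FALSE
p[keep]                              # 2 5
which(p > 0)                         # 1 4 (positions; NA comparisons are skipped)
identical(p[keep], p[which(p > 0)])  # TRUE
```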

...

|===================================== | 46%

| I've already shown you how to subset just the first ten values of x using x[1:10]. In

| this case, we're providing a vector of positive integers inside of the square brackets,

| which tells R to return only the elements of x numbered 1 through 10.

...

|======================================= | 49%

| Many programming languages use what's called 'zero-based indexing', which means that

| the first element of a vector is considered element 0. R uses 'one-based indexing',

| which (you guessed it!) means the first element of a vector is considered element 1.
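Here is a quick check of one-based indexing on a hypothetical vector `s`. It includes a vector of positions built with c():

```r
s <- c(10, 20, 30, 40, 50, 60, 70)   # hypothetical vector
s[1]            # 10: the first element is at position 1, not 0
s[length(s)]    # 70: the last element is at position length(s)
s[c(3, 5, 7)]   # 30 50 70
```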

...

|========================================= | 51%

| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint -- Use the

| c() function to specify the element numbers as a numeric vector.

x[c(3,5,7)]

[1] NA NA NA

| That's correct!

|=========================================== | 54%

| It's important that when using integer vectors to subset our vector x, we stick with

| the set of indexes {1, 2, ..., 40} since x only has 40 elements. What happens if we ask

| for the zeroth element of x (i.e. x[0])? Give it a try.

x[0]

numeric(0)

| Nice work!

|============================================= | 56%

| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent us from

| doing this. What if we ask for the 3000th element of x? Try it out.

x[3000]

[1] NA

| Nice work!

|=============================================== | 59%

| Again, nothing useful, but R doesn't prevent us from asking for it. This should be a

| cautionary tale. You should always make sure that what you are asking for is within the

| bounds of the vector you're working with.
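One way to follow that advice is to check positive indices before subsetting. The sketch below uses `checked_subset()`, a hypothetical helper written for this note. It is not part of base R or swirl, and it handles positive integer indices only:

```r
# Hypothetical helper: refuse positive indices outside 1..length(v)
checked_subset <- function(v, idx) {
  if (any(idx < 1 | idx > length(v))) {
    stop("index out of bounds: valid positions are 1 to ", length(v))
  }
  v[idx]
}

q <- c(1.5, -0.3, 2.2)       # hypothetical vector
q[0]                         # numeric(0): position 0 selects nothing
q[10]                        # NA: position 10 does not exist
checked_subset(q, c(1, 3))   # 1.5 2.2
# checked_subset(q, 10) stops with an "index out of bounds" error instead
```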

...

|================================================= | 62%

| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It would be

| pretty tedious to construct a vector containing all numbers 1 through 40 EXCEPT 2 and

| 10.

...

|=================================================== | 64%

| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd

| and 10th elements of x, x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and

| 10th elements. Try x[c(-2, -10)] now to see this.

x[c(-2,-10)]

[1] -0.68754438 NA NA NA NA NA NA

[8] -0.01654302 -0.40799451 -0.55849418 NA -0.07687958 -0.05351510 NA

[15] NA 1.16924926 1.60452324 -0.08284351 1.66735009 NA NA

[22] 2.18942224 -0.14724334 NA NA -0.99999522 NA NA

[29] -0.12665386 -0.61215464 -0.58919026 NA NA 1.12894965 -1.36770314

[36] NA -1.33061090 NA

| That's a job well done!

|===================================================== | 67%

| A shorthand way of specifying multiple negative numbers is to put the negative sign out

| in front of the vector of positive numbers. Type x[-c(2, 10)] to get the exact same

| result.

x[-c(2,10)]

[1] -0.68754438 NA NA NA NA NA NA

[8] -0.01654302 -0.40799451 -0.55849418 NA -0.07687958 -0.05351510 NA

[15] NA 1.16924926 1.60452324 -0.08284351 1.66735009 NA NA

[22] 2.18942224 -0.14724334 NA NA -0.99999522 NA NA

[29] -0.12665386 -0.61215464 -0.58919026 NA NA 1.12894965 -1.36770314

[36] NA -1.33061090 NA

| Your dedication is inspiring!
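This sketch shows negative indexing on a hypothetical vector `r`. Both spellings drop the same positions. R raises an error if you mix positive and negative indices in one index vector. The sketch catches that error with tryCatch() so the script keeps running:

```r
r <- c(11, 12, 13, 14, 15)             # hypothetical vector
r[c(-2, -4)]                           # 11 13 15: drop positions 2 and 4
r[-c(2, 4)]                            # same result: unary minus applied to c(2, 4)
identical(r[c(-2, -4)], r[-c(2, 4)])   # TRUE

# Mixing positive and negative indices is an error; capture its message
res <- tryCatch(r[c(-1, 2)], error = function(e) conditionMessage(e))
res   # the error message R reports for the mixed subscripts
```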

|======================================================= | 69%

| So far, we've covered three types of index vectors -- logical, positive integer, and

| negative integer. The only remaining type requires us to introduce the concept of

| 'named' elements.

...

|========================================================= | 72%

| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2,

| norf = NA).

vect<-c(foo=11,bar=2,norf=NA)

| You are amazing!

|=========================================================== | 74%

| When we print vect to the console, you'll see that each element has a name. Try it out.

vect

foo bar norf

11 2 NA

| That's a job well done!

|============================================================== | 77%

| We can also get the names of vect by passing vect as an argument to the names()

| function. Give that a try.

names(vect)

[1] "foo" "bar" "norf"

| Keep working like that and you'll get there!

|================================================================ | 79%

| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

vect2<-c(11,2,NA)

| You are quite good my friend!

|================================================================== | 82%

| Then, we can add the `names` attribute to vect2 after the fact with names(vect2) <-

| c("foo", "bar", "norf"). Go ahead.

names(vect2)<-c("foo","bar","norf")

| Keep up the great work!

|==================================================================== | 85%

| Now, let's check that vect and vect2 are the same by passing them as arguments to the

| identical() function.

identical(vect,vect2)

[1] TRUE

| Keep up the great work!

|====================================================================== | 87%

| Indeed, vect and vect2 are identical named vectors.
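The sketch below shows both construction styles from the lesson. It adds a third option, `setNames()` from the stats package, which sets the names in one call. The stats package is attached by default in a standard R session. The variable names `a`, `b`, and `d` are hypothetical:

```r
a <- c(foo = 11, bar = 2, norf = NA)        # names given inside c()
b <- c(11, 2, NA)
names(b) <- c("foo", "bar", "norf")          # names added afterwards
d <- setNames(c(11, 2, NA), c("foo", "bar", "norf"))  # stats::setNames does both in one call
identical(a, b)   # TRUE
identical(a, d)   # TRUE
names(d)          # "foo" "bar" "norf"
```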

...

|======================================================================== | 90%

| Now, back to the matter of subsetting a vector by named elements. Which of the

| following commands do you think would give us the second element of vect?

1: vect["2"]

2: vect["bar"]

3: vect[bar]

Selection: 2

| That's a job well done!

|========================================================================== | 92%

| Now, try it out.

vect["bar"]

bar

2

| You are doing so well!

|============================================================================ | 95%

| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

vect[c("foo","bar",'norf')]

foo bar norf

11 2 NA

| You're close...I can feel it! Try it again. Or, type info() for more options.

| Use vect[c("foo", "bar")] to get only the elements of vect named "foo" and "bar".

vect[c("foo","bar")]

foo bar

11 2

| You are quite good my friend!
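Here is character indexing on the lesson's `vect`. Asking for a name that does not exist returns NA instead of raising an error, just as an out-of-range position does. The example uses the hypothetical name "baz":

```r
vect <- c(foo = 11, bar = 2, norf = NA)
vect["bar"]             # the element named "bar" (value 2)
vect[c("foo", "bar")]   # elements named "foo" and "bar" (11 and 2)
vect["baz"]             # NA: there is no element named "baz"
```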

|============================================================================== | 97%

| Now you know all four methods of subsetting data from vectors. Different approaches are

| best in different scenarios, and when in doubt, try it out!
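As a recap, this sketch applies all four index types to `vect` to pull out the same element. Each call returns the named element bar = 2:

```r
vect <- c(foo = 11, bar = 2, norf = NA)
by_logical  <- vect[c(FALSE, TRUE, FALSE)]  # logical: keep where TRUE
by_positive <- vect[2]                      # positive integer: keep position 2
by_negative <- vect[-c(1, 3)]               # negative integer: drop positions 1 and 3
by_name     <- vect["bar"]                  # character: keep the element named "bar"
# All four results are the same named element
identical(by_logical, by_positive) && identical(by_positive, by_negative) &&
  identical(by_negative, by_name)   # TRUE
```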

...

|================================================================================| 100%

| Would you like to receive credit for completing this course on Coursera.org?

1: No

2: Yes

Selection: 2

What is your email address?

What is your assignment token? xXxXxxXXxXxxXXXx

Grade submission succeeded!

| You got it right!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

*Last updated 2020-04-14 10:13:49.166837 IST*