# GGPlot2 Part2

library(swirl)

swirl()

| Welcome to swirl! Please sign in. If you've been here before, use the same name as

| you did then. If you are new, call yourself something unique.

What shall I call you? Krishnakanth Allika

| Please choose a course, or type 0 to exit swirl.

1: Exploratory Data Analysis

2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

```
1: Principles of Analytic Graphs 2: Exploratory Graphs
3: Graphics Devices in R 4: Plotting Systems
5: Base Plotting System 6: Lattice Plotting System
7: Working with Colors 8: GGPlot2 Part1
9: GGPlot2 Part2 10: GGPlot2 Extras
11: Hierarchical Clustering 12: K Means Clustering
13: Dimension Reduction 14: Clustering Example
15: CaseStudy
```

Selection: 9

| Attempting to load lesson dependencies...

| Package ‘ggplot2’ loaded correctly!

| | 0%

| GGPlot2_Part2. (Slides for this and other Data Science courses may be found at github

| https://github.com/DataScienceSpecialization/courses/. If you care to use them, they

| must be downloaded as a zip file and viewed locally. This lesson corresponds to

| 04_ExploratoryAnalysis/ggplot2.)

...

|== | 2%

| In a previous lesson we showed you the vast capabilities of qplot, the basic

| workhorse function of the ggplot2 package. In this lesson we'll focus on some

| fundamental components of the package. These underlie qplot which uses default values

| when it calls them. If you understand these building blocks, you will be better able

| to customize your plots. We'll use the second workhorse function in the package,

| ggplot, as well as other graphing functions.

...

|=== | 4%

| Do you remember what the gg of ggplot2 stands for?

1: grammar of graphics

2: good grief

3: great graphics

4: goto graphics

Selection: 1

| That's the answer I was looking for.

|===== | 6%

| A "grammar" of graphics means that ggplot2 contains building blocks with which you

| can create your own graphical objects. What are these basic components of ggplot2

| plots? There are 7 of them.

...

|====== | 8%

| Obviously, there's a DATA FRAME which contains the data you're trying to plot. Then

| the AESTHETIC MAPPINGS determine how data are mapped to color, size, etc. The GEOMS

| (geometric objects) are what you see in the plot (points, lines, shapes) and FACETS

| are the panels used in conditional plots. You've used these or seen them used in the

| first ggplot2 (qplot) lesson.

...

|======== | 10%

| There are 3 more. STATS are statistical transformations such as binning, quantiles,

| and smoothing which ggplot2 applies to the data. SCALES show what coding an aesthetic

| map uses (for example, male = red, female = blue). Finally, the plots are depicted on

| a COORDINATE SYSTEM. When you use qplot these were taken care of for you.

...

|========== | 12%

| Do you remember what the "artist's palette" model means in the context of plotting?

1: we draw pictures

2: we mix paints

3: plots are built up in layers

4: things get messy

Selection: 3

| You nailed it! Good job!

|=========== | 15%

| As in the base plotting system (and in contrast to the lattice system), when building

| plots with ggplot2, the plots are built up in layers, maybe in several steps. You can

| plot the data, then overlay a summary (for instance, a regression line or smoother)

| and then add any metadata and annotations you need.

...

|============= | 17%

| We'll keep using the mpg data that comes with the ggplot2 package. Recall the

| versatility of qplot. Just as a refresher, call qplot now with 5 arguments. The first

| 3 deal with data - displ, hwy, and data=mpg. The fourth is geom set equal to the

| concatenation of the two strings, "point" and "smooth". The fifth is facets set equal

| to the formula .~drv. Try this now.

qplot(displ,hwy,data=mpg,geom=c("point","smooth"),facets=.~drv)

`geom_smooth()`

using method = 'loess' and formula 'y ~ x'

| You got it!

|=============== | 19%

| We see a 3 facet plot, one for each drive type (4, f, and r). Now we'll see how

| ggplot works. We'll build up a similar plot using the basic components of the

| package. We'll do this in a series of steps.

...

|================ | 21%

| First we'll create a variable g by assigning to it the output of a call to ggplot

| with 2 arguments. The first is mpg (our dataset) and the second will tell ggplot what

| we want to plot, in this case, displ and hwy. These are what we want our aesthetics

| to represent so we enclose these as two arguments to the function aes. Try this now.

g<-ggplot(mpg,aes(displ,hwy))

| You are quite good my friend!

|================== | 23%

| Notice that nothing happened? As in the lattice system, ggplot created a graphical

| object which we assigned to the variable g.

...

|==================== | 25%

| Run the R command summary with g as its argument to see what g contains.

summary(g)

`data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11] mapping: x = ~displ, y = ~hwy faceting: <ggproto object: Class FacetNull, Facet, gg> compute_layout: function draw_back: function draw_front: function draw_labels: function draw_panels: function finish_data: function init_scales: function map_data: function params: list setup_data: function setup_params: function shrink: TRUE train_scales: function vars: function super: <ggproto object: Class FacetNull, Facet, gg>`

| You are quite good my friend!

|===================== | 27%

| So g contains the mpg data with all its named components in a 234 by 11 matrix. It

| also contains a mapping, x (displ) and y (hwy) which you specified, and no faceting.

...

|======================= | 29%

| Note that if you tried to print g with the expressions g or print(g) you'd get an

| error! Even though it's a great package, ggplot doesn't know how to display the data

| yet since you didn't specify how you wanted to see it. Now type g+geom_point() and

| see what happens.

g+geom_point()

| You are quite good my friend!

|======================== | 31%

| By calling the function geom_point you added a layer. By not assigning the expression

| to a variable you displayed a plot. Notice that you didn't have to pass any arguments

| to the function geom_point. That's because the object g has all the data stored in

| it. (Remember you saw that when you ran summary on g before.) Now use the expression

| you just typed (g + geom_point()) and add to it another layer, a call to

| geom_smooth(). Notice the red message R gives you.

g+geom_point()+geom_smooth()

`geom_smooth()`

using method = 'loess' and formula 'y ~ x'

| You got it!

|========================== | 33%

| The gray shadow around the blue line is the confidence band. See how wide it is at

| the right? Let's try a different smoothing function. Use the up arrow to recover the

| expression you just typed, and instead of calling geom_smooth with no arguments, call

| it with the argument method set equal to the string "lm".

g+geom_point()+geom_smooth(method="lm")

`geom_smooth()`

using formula 'y ~ x'

| Excellent work!

|============================ | 35%

| By changing the smoothing function to "lm" (linear model) ggplot2 generated a

| regression line through the data. Now recall the expression you just used and add to

| it another call, this time to the function facet_grid. Use the formula . ~ drv as it

| argument. Note that this is the same type of formula used in the calls to qplot.

g+geom_point()+geom_smooth(method="lm")+facet_grid(.~drv)

`geom_smooth()`

using formula 'y ~ x'

| Your dedication is inspiring!

|============================= | 38%

| Notice how each panel is labeled with the appropriate factor. All the data associated

| with 4-wheel drive cars is in the leftmost panel, front-wheel drive data is shown in

| the middle panel, and rear-wheel drive data in the rightmost. Notice that this is

| similar to the plot you created at the start of the lesson using qplot. (We used a

| different smoothing function than previously.)

...

|=============================== | 40%

| So far you've just used the default labels that ggplot provides. You can add your own

| annotation using functions such as xlab(), ylab(), and ggtitle(). In addition, the

| function labs() is more general and can be used to label either or both axes as well

| as provide a title. Now recall the expression you just typed and add a call to the

| function ggtitle with the argument "Swirl Rules!".

g+geom_point()+geom_smooth(method="lm")+facet_grid(.~drv)+ggtitle("Swirl Rules!")

`geom_smooth()`

using formula 'y ~ x'

| You are doing so well!

|================================ | 42%

| Now that you've seen the basics we'll talk about customizing. Each of the “geom”

| functions (e.g., _point and _smooth) has options to modify it. Also, the function

| theme() can be used to modify aspects of the entire plot, e.g. the position of the

| legend. Two standard appearance themes are included in ggplot. These are theme_gray()

| which is the default theme (gray background with white grid lines) and theme_bw()

| which is a plainer (black and white) color scheme.

...

|================================== | 44%

| Let's practice modifying aesthetics now. We'll use the graphic object g that we

| already filled with mpg data and add a call to the function geom_point, but this time

| we'll give geom_point 3 arguments. Set the argument color equal to "pink", the

| argument size to 4, and the argument alpha to 1/2. Notice that all the arguments are

| set equal to constants.

g+geom_point(color="pink",size=4,alpha=0.5)

| You are doing so well!

|==================================== | 46%

| Notice the different shades of pink? That's the result of the alpha aesthetic which

| you set to 1/2. This aesthetic tells ggplot how transparent the points should be.

| Darker circles indicate values hit by multiple data points.

...

|===================================== | 48%

| Now we'll modify the aesthetics so that color indicates which drv type each point

| represents. Again, use g and add to it a call to the function geom_point with 3

| arguments. The first is size set equal to 4, the second is alpha equal to 1/2. The

| third is a call to the function aes with the argument color set equal to drv. Note

| that you MUST use the function aes since the color of the points is data dependent

| and not a constant as it was in the previous example.

g+geom_point(size=4,alpha=0.5,aes(color=drv))

| That's a job well done!

|======================================= | 50%

| Notice the helpful legend on the right decoding the relationship between color and

| drv.

...

|========================================= | 52%

| Now we'll practice modifying labels. Again, we'll use g and add to it calls to 3

| functions. First, add a call to geom_point with an argument making the color

| dependent on the drv type (as we did in the previous example). Second, add a call to

| the function labs with the argument title set equal to "Swirl Rules!". Finally, add a

| call to labs with 2 arguments, one setting x equal to "Displacement" and the other

| setting y equal to "Hwy Mileage".

g+geom_point(aes(color=drv))+labs(title="Swirl Rules!")+labs(x="Displacement",y="Hwy Mileage")

| You are amazing!

|========================================== | 54%

| Note that you could have combined the two calls to the function labs in the previous

| example. Now we'll practice customizing the geom_smooth calls. Use g and add to it a

| call to geom_point setting the color to drv type (remember to use the call to the aes

| function), size set to 2 and alpha to 1/2. Then add a call to geom_smooth with 4

| arguments. Set size equal to 4, linetype to 3, method to "lm", and se to FALSE.

g+geom_point(aes(color=drv),size=2,alpha=0.5)+geom_smooth(size=4,linetype=3,method="lm",se=FALSE)

`geom_smooth()`

using formula 'y ~ x'

| Perseverance, that's the answer.

|============================================ | 56%

| What did these arguments do? The method specified a linear regression (note the

| negative slope indicating that the bigger the displacement the lower the gas

| mileage), the linetype specified that it should be dashed (not continuous), the size

| made the dashes big, and the se flag told ggplot to turn off the gray shadows

| indicating standard errors (confidence intervals).

...

|============================================= | 58%

| Finally, let's do a simple plot using the black and white theme, theme_bw. Specify g

| and add a call to the function geom_point with the argument setting the color to the

| drv type. Then add a call to the function theme_bw with the argument base_family set

| equal to "Times". See if you notice the difference.

g+geom_point(aes(color=drv))+theme_bw(base_family = "Times")

There were 13 warnings (use warnings() to see them)

| Nice work!

|=============================================== | 60%

| No more gray background! Also, if you have good eyesight, you'll notice that the font

| in the labels changed.

...

|================================================= | 62%

| One final note before we go through a more complicated, layered ggplot example, and

| this concerns the limits of the axes. We're pointing this out to emphasize a subtle

| difference between ggplot and the base plotting function plot.

...

|================================================== | 65%

| We've created some random x and y data, called myx and myy, components of a dataframe

| called testdat. These represent 100 random normal points, except halfway through, we

| made one of the points be an outlier. That is, we set its y-value to be out of range

| of the other points. Use the base plotting function plot to create a line plot of

| this data. Call it with 4 arguments - myx, myy, type="l", and ylim=c(-3,3). The

| type="l" tells plot you want to display the data as a line instead of as a

| scatterplot.

warning messages from top-level task callback 'mini'

There were 40 warnings (use warnings() to see them)

play()

| Entering play mode. Experiment as you please, then type nxt() when you are ready to

| resume the lesson.

g+geom_point(aes(color=drv))+theme_dark()

g+geom_point(aes(color=drv))+theme_minimal()

g+geom_point(aes(color=drv))+theme_grey()

nxt()

| Resuming lesson...

| We've created some random x and y data, called myx and myy, components of a dataframe

| called testdat. These represent 100 random normal points, except halfway through, we

| made one of the points be an outlier. That is, we set its y-value to be out of range

| of the other points. Use the base plotting function plot to create a line plot of

| this data. Call it with 4 arguments - myx, myy, type="l", and ylim=c(-3,3). The

| type="l" tells plot you want to display the data as a line instead of as a

| scatterplot.

plot(myx,myy,type="l",ylim=c(-3,3))

| You got it!

|==================================================== | 67%

| Notice how plot plotted the points in the (-3,3) range for y-values. The outlier at

| (50,100) is NOT shown on the line plot. Now we'll plot the same data with ggplot.

| Recall that the name of the dataframe is testdat. Create the graphical object g with

| a call to ggplot with 2 arguments, testdat (the data) and a call to aes with 2

| arguments, x set equal to myx, and y set equal to myy.

g<-ggplot(data=testdat,aes(x=myx,y=myy))

| You got it!

|====================================================== | 69%

| Now add a call to geom_line with 0 arguments to g.

g+geom_line()

| You got it right!

|======================================================= | 71%

| Notice how ggplot DID display the outlier point at (50,100). As a result the rest of

| the data is smashed down so you don't get to see what the bulk of it looks like. The

| single outlier probably isn't important enough to dominate the graph. How do we get

| ggplot to behave more like plot in a situation like this?

...

|========================================================= | 73%

| Let's take a guess that in addition to adding geom_line() to g we also just have to

| add ylim(-3,3) to it as we did with the call to plot. Try this now to see what

| happens.

g+geom_line()+ylim(-3,3)

| Perseverance, that's the answer.

|========================================================== | 75%

| Notice that by doing this, ggplot simply ignored the outlier point at (50,100).

| There's a break in the line which isn't very noticeable. Now recall that at the

| beginning of the lesson we mentioned 7 components of a ggplot plot, one of which was

| a coordinate system. This is a situation where using a coordinate system would be

| helpful. Instead of adding ylim(-3,3) to the expression g+geom_line(), add a call to

| the function coord_cartesian with the argument ylim set equal to c(-3,3).

g+geom_line()+coord_cartesian(ylim=c(-3,3))

| You are really on a roll!

|============================================================ | 77%

| See the difference? This looks more like the plot produced by the base plot function.

| The outlier y value at x=50 is not shown, but the plot indicates that it is larger

| than 3.

...

|============================================================== | 79%

| We'll close with a more complicated example to show you the full power of ggplot and

| the entire ggplot2 package. We'll continue to work with the mpg dataset.

...

|=============================================================== | 81%

| Start by creating the graphical object g by assigning to it a call to ggplot with 2

| arguments. The first is the dataset and the second is a call to the function aes.

| This call will have 3 arguments, x set equal to displ, y set equal to hwy, and color

| set equal to factor(year). This last will allow us to distinguish between the two

| manufacturing years (1999 and 2008) in our data.

g<-ggplot(data=mpg,aes(x=displ,y=hwy,color=factor(year)))

| All that practice is paying off!

|================================================================= | 83%

| Uh oh! Nothing happened. Does g exist? Of course, it just isn't visible yet since you

| didn't add a layer.

...

|=================================================================== | 85%

| If you typed g at the command line, what would happen?

1: a scatterplot would appear with 2 colors of points

2: I would have to try this to answer the question

3: R would return an error in red

Selection: 3

| You got it!

|==================================================================== | 88%

| We'll build the plot up step by step. First add to g a call to the function

| geom_point with 0 arguments.

g+geom_point()

| You nailed it! Good job!

|====================================================================== | 90%

| A simple, yet comfortingly familiar scatterplot appears. Let's make our display a 2

| dimensional multi-panel plot. Recall your last command (with the up arrow) and add to

| it a call the function facet_grid. Give it 2 arguments. The first is the formula

| drv~cyl, and the second is the argument margins set equal to TRUE. Try this now.

g+geom_point()+facet_grid(drv~cyl,margins=TRUE)

| Keep up the great work!

|======================================================================== | 92%

| A 4 by 5 plot, huh? The margins argument tells ggplot to display the marginal totals

| over each row and column, so instead of seeing 3 rows (the number of drv factors) and

| 4 columns (the number of cyl factors) we see a 4 by 5 display. Note that the panel in

| position (4,5) is a tiny version of the scatterplot of the entire dataset.

...

|========================================================================= | 94%

| Now add to your last command (or retype it if you like to type) a call to geom_smooth

| with 4 arguments. These are method set to "lm", se set to FALSE, size set to 2, and

| color set to "black".

g+geom_point()+facet_grid(drv~cyl,margins=TRUE)+geom_smooth(method="lm",se=FALSE,size=2,color="black")

`geom_smooth()`

using formula 'y ~ x'

| Keep up the great work!

|=========================================================================== | 96%

| Angry Birds? Finally, add to your last command (or retype it if you like to type) a

| call to the function labs with 3 arguments. These are x set to "Displacement", y set

| to "Highway Mileage", and title set to "Swirl Rules!".

g+geom_point()+facet_grid(drv~cyl,margins=TRUE)+geom_smooth(method="lm",se=FALSE,size=2,color="black")+labs(x="Displacement",y="Highway Mileage",title="Swirl Rules!")

`geom_smooth()`

using formula 'y ~ x'

| Keep working like that and you'll get there!

|============================================================================ | 98%

| You could have done these labels with separate calls to labs but we thought you'd be

| sick of this by now. Anyway, congrats! You've concluded part 2 of ggplot2. We hope

| you got enough mileage out of the lesson. If you like ggplot2 you can do some extras

| with the extra lesson.

...

|==============================================================================| 100%

| Would you like to receive credit for completing this course on Coursera.org?

1: Yes

2: No

Selection: 1

What is your email address? [email protected]

What is your assignment token? xXxXxxXXxXxxXXXx

Grade submission succeeded!

| You got it right!

| You've reached the end of this lesson! Returning to the main menu...

| Please choose a course, or type 0 to exit swirl.

1: Exploratory Data Analysis

2: Take me to the swirl course repository!

Selection: 0

| Leaving swirl now. Type swirl() to resume.

g+geom_point()+facet_grid(drv~cyl,margins=TRUE)+geom_smooth(method="lm",se=FALSE,size=2,color="black")+labs(x="Displacement",y="Highway Mileage",title="Swirl Rules!")+theme(plot.title = element_text(hjust = 0.5))

`geom_smooth()`

using formula 'y ~ x'

rm(list=ls())

*Last updated 2020-05-08 21:23:15.085181 IST*

## Comments