A Simple Tutorial on Plotting in R
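I did not get into the math course, so I am trying statistics instead.
The statistics professor has serious credentials: a Stanford undergraduate degree plus an Oxford PhD.
Most of this comes from my class notes and labs.
All plots use the ggplot2 library.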
Basic
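Some fairly basic operations. First, install the ggthemes library; after that, append + theme_XXX() to any plot to apply a theme.
You can also call theme_set(theme_XXX()) to set the default theme. Everything below uses theme_set(theme_economist_white()).
Second, there is ggpubr, a rather clumsy library for combining figures: ggarrange(figure1, figure2, nrow=2, ncol=1).
Why clumsy? Because the x-axis and y-axis ticks of the combined figures do not line up. A better method appears later.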
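A minimal sketch of the setup described above. The built-in mtcars data stands in for the course data, and figure1 and figure2 are just placeholder names; it assumes ggthemes and ggpubr are already installed.

```r
library(ggplot2)
library(ggthemes)  # provides theme_economist_white()
library(ggpubr)    # provides ggarrange()

# Make the ggthemes theme the default for every later plot
theme_set(theme_economist_white())

# Two placeholder figures built from mtcars (not the tutorial's data)
figure1 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2)
figure2 <- ggplot(mtcars, aes(x = wt)) + geom_histogram(binwidth = 0.25)

# Stack the two figures vertically; each one keeps its own axes
combined <- ggarrange(figure1, figure2, nrow = 2, ncol = 1)
```

Printing combined draws both figures on one page.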
WEEK1
Basic operations. Just google them or read the documentation; anyone with a little programming background will be fine.
WEEK2
Mostly covered the histogram and the boxplot.
Histogram
Basic
ggplot(titanic_survival, aes(x = age)) + geom_histogram()
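The x inside aes() sets the x-axis; it is a column of the data set passed in as the first argument.
Later tweaks can go in two places. Arguments added inside ggplot() apply to the whole figure, while arguments added inside geom_histogram() change only the histogram. This matters because more geoms can be added afterwards and drawn in the same coordinate system, as shown later.
(These examples use the basic titanic data.)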
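To make the difference between the two places concrete, here is a small sketch. It uses the built-in mtcars data as a stand-in, since the course's titanic data is not included here.

```r
library(ggplot2)

# Global mapping: x = mpg is inherited by every layer added afterwards
p <- ggplot(mtcars, aes(x = mpg)) +
  # Layer-level settings: these affect the histogram only
  geom_histogram(binwidth = 2, fill = "white", color = "blue") +
  # A second layer in the same coordinate system with its own settings
  geom_vline(xintercept = mean(mtcars$mpg), color = "red")
```

Both layers share the x-axis set up in ggplot(), but the colors belong to each layer separately.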
Several directions for improvement, in no particular order:
Binwidth
The width of each bar, measured in the units of the x-axis.
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5)
Label and Title
This one is simple: xlab sets the x-axis label, ylab sets the y-axis label, and ggtitle sets the title.
To center the title: + theme(plot.title = element_text(hjust = 0.5))
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5))
Color
A plot without color is hard to look at. There are two parameters: fill is the interior of the bars and color is their border.
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5))
Mean Line
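A line at the mean makes the plot easier to read. Use geom_vline(aes(xintercept = xx)). If the column contains missing values, mean() returns NA and no line is drawn; pass na.rm = TRUE to mean() in that case.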
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
Density Plot
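Whether this is useful is a matter of taste; I find it most intuitive when the data are roughly normally distributed. Note that quite a few things need to change compared with the plain histogram.
Here is a fairly normal-looking one from a homework assignment.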
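The homework code is not reproduced here, so the following is only a sketch of one common way to do it, on synthetic data I made up. The main change is putting the histogram on the density scale so the curve and the bars share a y-axis. It assumes ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

set.seed(1)
# Synthetic, roughly normal ages (hypothetical data, not the course data)
ages <- data.frame(age = rnorm(500, mean = 30, sd = 12))

p <- ggplot(ages, aes(x = age)) +
  # Put the histogram on the density scale instead of raw counts
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 color = "blue", fill = "white") +
  # Overlay the kernel density estimate
  geom_density(color = "red") +
  xlab("Age") + ylab("Density")
```

Note that the y-axis label changes from Count to Density as well.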
That is all for now; I will add more as I learn it.
update1 Outliers
First add an outlier to the data and draw the histogram.
ggplot(titanic_survival2, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
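It is actually simple, just one more layer: geom_text(aes(label = ifelse(age > 100, as.character(v1),'')), y = 8)
My import went slightly wrong: the names ended up in column v1 as a factor, so as.character() is needed to convert them. y is the vertical position of the label. If all else fails, copy the line and adjust the condition.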
ggplot(titanic_survival2, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow") + geom_text(aes(label = ifelse(age > 100, as.character(v1),'')), y = 8)
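Since titanic_survival2 is not available here, this is a self-contained sketch of the same labeling trick on a tiny made-up data frame. The column names name and age and the threshold are my own choices. The as.character() call matters because ifelse() on a factor would otherwise return its integer codes.

```r
library(ggplot2)

# Hypothetical data: a name column (factor) and an age column with one extreme value
people <- data.frame(
  name = c("Ann", "Bob", "Cid", "Dee", "Eve"),
  age  = c(22, 35, 41, 29, 120),
  stringsAsFactors = TRUE   # mimic the factor import described above
)

# The label is the name for outliers and an empty string for everyone else
people$label <- ifelse(people$age > 100, as.character(people$name), "")

p <- ggplot(people, aes(x = age)) +
  geom_histogram(binwidth = 5, color = "blue", fill = "white") +
  # y is a fixed height for the text, set outside aes()
  geom_text(aes(label = label), y = 1.2)
```

Computing the label column first also makes the condition easier to check before plotting.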
Sure enough, the almighty dp escaped unscathed.
update2 Gradual Change Color
I learned a flashy way to draw a gradient fill. Start from this plot:
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, color = "blue", fill = "white") + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
First cut age into 100 intervals and map the fill to them, so each interval gets its own color: geom_histogram(binwidth = 5, aes(fill = cut(age, 100)))
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, aes(fill = cut(age, 100))) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
Hmm, ggplot added a legend entry for every single color. Turn the legend off with show.legend = F.
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, aes(fill = cut(age, 100)), show.legend = F) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
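A bit too dazzling, isn't it? Some improvements follow.
1. Add alpha = x
This changes the transparency. x is a fraction between 0 and 1. I put alpha before the fill mapping because placing it afterwards gave me odd errors, although the order of named arguments should not matter in R.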
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, alpha = 0.85, aes(fill = cut(age, 100)), show.legend = F) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
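2. Restrict the range of colors
The full rainbow looks unserious to me; limiting it to about two hues is better. Use scale_fill_discrete(h = c(x, y)), where x and y bound the hue range in degrees. Just try different ranges; I know of no better way.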
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, alpha = 0.85, aes(fill = cut(age, 100)), show.legend = F) + scale_fill_discrete(h = c(200, 380)) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
3. Change the chroma and luminance
scale_fill_discrete(h = c(200, 380), c = 120, l = 70)
Chroma and luminance correspond to c and l. I personally cannot see much difference, but I have no art training either.
ggplot(titanic_survival, aes(x = age)) + geom_histogram(binwidth = 5, alpha = 0.85, aes(fill = cut(age, 100)), show.legend = F) + scale_fill_discrete(h = c(200, 380), c = 120, l = 70) + xlab("Age") + ylab("Count") + ggtitle("Age of Titanic Survivals") + theme(plot.title = element_text(hjust = 0.5)) + geom_vline(aes(xintercept = mean(titanic_survival$age)), color = "yellow")
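As an alternative to cutting age into 100 factor levels, here is a sketch of a different technique: mapping the fill to the bin position and using a continuous two-color scale. This is not the method from class, the data is made up, and the two colors are arbitrary. It assumes ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

set.seed(2)
ages <- data.frame(age = runif(300, min = 0, max = 80))  # hypothetical data

p <- ggplot(ages, aes(x = age)) +
  # after_stat(x) is the centre of each bin, so the fill varies along the x-axis
  geom_histogram(aes(fill = after_stat(x)), binwidth = 5,
                 alpha = 0.85, show.legend = FALSE) +
  # Continuous gradient between two colours of my own choosing
  scale_fill_gradient(low = "steelblue", high = "purple")
```

With a continuous scale there is only one legend to hide, and the hue range is set by naming the two end colors instead of guessing at hue angles.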
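Here are two from my homework that I think look good.
I compressed the images to keep the homework document under 50 MB. Showing outliers this way works really well.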
Boxplot
This section is less detailed than the one above; I recommend reading in order.
Basic
Boxplots are usually for comparison, so I go straight to two variables.
ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot()
Color
Same as above: use fill.
ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot(fill = c("red", "blue"))
Label and Title
Same as above.
ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot(fill = c("red", "blue")) + xlab("Gender") + ylab("Age") + ggtitle("Boxplot") + theme(plot.title = element_text(hjust = 0.5))
Outliers
I find this feature quite nice: outlier.colour and outlier.shape change how the outlier points are drawn.
ggplot(titanic_survival, aes(x = sex, y = age)) + geom_boxplot(fill = c("red", "blue"), outlier.colour="red", outlier.shape=8) + xlab("Gender") + ylab("Age") + ggtitle("Boxplot") + theme(plot.title = element_text(hjust = 0.5))
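For reference, geom_boxplot() by default treats a point as an outlier when it lies more than 1.5 times the IQR beyond the edges of the box. The sketch below reproduces that rule in base R on a made-up vector, so you can see which values would get the special color and shape.

```r
# Hypothetical ages; 95 is far above the rest
age <- c(18, 22, 25, 27, 30, 31, 33, 36, 40, 95)

# Box edges: first and third quartiles
q <- unname(quantile(age, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the box, the default coef of geom_boxplot()
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside the fences are the ones drawn as outlier points
outliers <- age[age < lower | age > upper]
```

Here only 95 falls outside the fences.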
Scatterplot
Naturally this uses the done-to-death auto.mpg data set.
Basic
Use geom_point().
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point()
Add a title and axis labels:
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point() + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5))
Color Size Shape
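These three work almost identically: color is the color, size is the size, and shape is the shape. So I only cover color.
A note on how color inside aes() differs from color outside it. Outside, every point gets that single color. Inside, ggplot looks the name up as a variable in the data and colors by its values; if there is no such variable, the value is treated as a new constant variable.
The colors used here are ggplot's defaults, which look a bit odd (ugly).
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5))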
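The inside versus outside distinction is easier to see side by side. This sketch uses the built-in mtcars data (wt, mpg, cyl) in place of auto.mpg; the three plot names are just for illustration.

```r
library(ggplot2)

# Outside aes(): a constant, so every point is drawn in blue
p_constant <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue")

# Inside aes(): mapped to a variable, one colour per level of factor(cyl)
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Inside aes() with a constant: "blue" is treated as a one-level variable,
# so points get the default palette colour and a legend entry named "blue"
p_gotcha <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = "blue"))
```

The third plot is the "creates a new one" case: the points do not actually come out blue.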
Two problems remain: the legend and the colors. Let's fix them one at a time.
For the legend, after a lot of searching I found this odd-looking call, so I just copy it here.
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_colour_discrete(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), labels = c("Three", "Four", '5', '6', '8'))
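It should be fairly easy to follow: breaks holds the original factor levels and labels holds the text shown for each. The last three labels did not need changing, but they are listed for demonstration.
For the colors, some googling shows the call is almost the same:
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green'))
(It seems to look even worse.)
To summarize, scale_color_manual takes these arguments:
name = "legend title"
breaks = c('3', '4', '5', '6', '8'), the factor levels, shown in the legend in this order
values = c("purple", "red", 'yellow', 'blue', 'green'), the colors, in the same order as breaks
labels = c("Three", "Four", '5', '6', '8'), the text for each legend entry, in the same order as breaks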
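Here is a small sketch that uses all four arguments at once, on mtcars instead of auto.mpg. I use a named vector for values (cyl_colors is my own name), which ties each color to a factor level explicitly instead of relying on order.

```r
library(ggplot2)

# Named vector: each factor level of cyl is paired with its colour
cyl_colors <- c("4" = "purple", "6" = "red", "8" = "blue")

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +
  scale_color_manual(
    name   = "Cylinders",                   # legend title
    breaks = c("4", "6", "8"),              # factor levels, in legend order
    values = cyl_colors,                    # colour for each level
    labels = c("Four", "Six", "Eight")      # text shown for each break
  )
```

With named values, reordering breaks changes the legend order without changing which level gets which color.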
Smooth
This is the built-in fitted curve, geom_smooth().
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green')) + geom_smooth(color = "orange")
A few useful parameters: method = lm/glm/gam/loess, where loess is a locally weighted polynomial regression and lm is a linear model; the others are rarely needed. By default ggplot2 picks loess for fewer than 1,000 observations and gam otherwise.
There is also se = T/F, which controls whether the gray confidence band is drawn. Not demonstrated here.
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green')) + geom_smooth(color = "orange", method = lm)
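For a quick comparison of method and se, this sketch draws a linear fit without the band and a loess fit with it on the same plot. It uses mtcars as a stand-in, and the colors are arbitrary; formula = y ~ x is spelled out only to avoid the informational message.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Straight-line fit, confidence band switched off
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "orange") +
  # Local regression, confidence band kept
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE, color = "purple")
```

Only the purple curve gets the gray band.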
Wrap
This is actually a very powerful feature, although it does not look great on this particular plot: facet_wrap(~XXX, ncol = 3). It splits the data by the given variable and draws one panel per value. See for yourself:
ggplot(auto.mpg, aes(x = weight, y = mpg)) + geom_point(aes(color = factor(cylinders))) + xlab("Weight") + ylab("MPG") + ggtitle("Relationship Between MPG and Weight") + theme(plot.title = element_text(hjust = 0.5)) + scale_color_manual(name = "Cylinders", breaks = c('3', '4', '5', '6', '8'), values = c("purple", "red", 'yellow', 'blue', 'green')) + geom_smooth(color = "orange", method = lm) + facet_wrap(~cylinders, ncol = 3)
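Hmm... here is a more normal-looking one from a homework assignment.
So with the default shared axes it only works well when the groups have similar ranges; otherwise fall back to ggpubr.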
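One more option that may be worth trying before switching to ggpubr: facet_wrap() has a scales argument, and scales = "free" lets each panel pick its own axis ranges. This was not covered in class, so treat it as a sketch; it uses mtcars as a stand-in.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # One panel per cyl value; "free" gives every panel its own x and y ranges
  facet_wrap(~cyl, ncol = 3, scales = "free")
```

The trade-off is that panels can no longer be compared by eye, since their axes differ. "free_x" and "free_y" free only one axis.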