How to Create Publication Quality Figures in R

M O T I V A T I O N

I am currently chest-deep in the profoundly fulfilling but also long & arduous process of analyzing data for, and writing up, the first chapter of my dissertation (yay). One of the many lessons I’ve learned from this experience (aside from the facts that too much caffeine makes my tummy hurt & office puppies make the best writing companions) is that creating this thing & getting it ready to share with the world (or at least the small sliver of the world also interested in phenotypic plasticity/hatching behavior) is an incredibly iterative process. So far, there has been no such thing as doing something right the first time. I was foolish to think there would be anything less than far. too. many. versions of every figure and every sentence. If the manuscript-version-#32 me could give the manuscript-version-#1 (or #2) me a single piece of advice, it would be: for the love of data, make editable, reproducible figures from the outset. The alternative is to spend many bleak hours unnecessarily fixing and futzing in Adobe Illustrator (the bane of my existence). SO, I’ve decided to write up a blog post about the useful things I’ve learned recently about making pretty figures that will hopefully make your peer reviewers be like “yes we’d like to publish you now.”

L O A D   P A C K A G E S

Here’s a place where you can load all the packages you’ll need to run the code in this blog post. I’ll also repeat the relevant library() call (commented out) in the sections where each package is used.

library(xlsx); library(RColorBrewer); library(extrafont); library(ggplot2);  library(cowplot); library(gridExtra)
## Registering fonts with R
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave

D A T A

I’ll read in some sample data from my work!

Daily data set

#library(xlsx)
Daily.df<-read.xlsx("Daily_20180701.xls", sheetName="Daily")
#str(Daily.df) #note this includes the two 7d indivs
Daily.df <- subset(Daily.df, Age < 7) # Drop the 2 individuals measured at 7 days
# str(Daily.df) # now only ages 3 - 6 days

Subset stages so that I can color each separately.

stage2 <- subset(Daily.df, StageAge == 2)
stage3 <- subset(Daily.df, StageAge == 3)
stage4 <- subset(Daily.df, StageAge == 4)
stage5 <- subset(Daily.df, StageAge == 5)
stage6 <- subset(Daily.df, StageAge == 6)
stage7 <- subset(Daily.df, StageAge == 7)
stage8 <- subset(Daily.df, StageAge == 8)
stage9 <- subset(Daily.df, StageAge == 9)
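
As a hedged aside: if writing one subset() call per stage gets tedious, base R's split() produces the same pieces in one step. The sketch below uses a made-up toy data frame (the column names follow my data, but the values are invented), so treat it as an illustration rather than my actual workflow.

```r
# Toy stand-in for Daily.df; column names follow the post, values are invented.
toy <- data.frame(
  Age          = c(3, 3, 4, 5, 6, 7),
  StageAge     = c(3, 3, 4, 5, 6, 7),
  CorrectedAvg = c(2, 4, 20, 25, 30, 31)
)

# Column names can be used bare inside subset(); no need for toy$Age.
toy <- subset(toy, Age < 7)

# split() returns a named list with one data frame per stage.
by_stage <- split(toy, toy$StageAge)
names(by_stage)
nrow(by_stage[["3"]])
```

Each element (for example by_stage[["3"]]) can then be passed to the data argument of a geom, just like the stage3 object above.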

Sibs data set

SibshipsSonia.df<-read.xlsx("Sibships_20190109.xlsx", sheetName="Sibships")

#keep only the six sampled ages between 3.75 and 5.75 days (subset() already drops rows where Age is NA)
SubsetSibshipsSonia.df <- subset(SibshipsSonia.df, Age %in% c(3.75, 4, 4.25, 4.5, 4.75, 5.75))

Subset clutches so that I can color each separately.

# subset() has no na.rm argument; rows where the condition is NA are dropped automatically
c101 <- subset(SubsetSibshipsSonia.df, Clutch == 101)
c102 <- subset(SubsetSibshipsSonia.df, Clutch == 102)
c104 <- subset(SubsetSibshipsSonia.df, Clutch == 104)
c105 <- subset(SubsetSibshipsSonia.df, Clutch == 105)
c106 <- subset(SubsetSibshipsSonia.df, Clutch == 106)

C O L O R S

There are a ton of resources and blog posts on the interwebs that go into great detail about colors in both base R and ggplot2. I’ll just quickly point out three of my favorites.

* First, there is a package of Wes Anderson themed color palettes called wesanderson that I really recommend.
* Second, another excellent package called ggsci provides color palettes inspired by specific scientific journals and science-fiction themes.
* Lastly, RColorBrewer is an R package with colorful pre-made palettes that come in three categories: qualitative, diverging, and sequential.

#library(RColorBrewer)
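stagecolors  <- brewer.pal(n = 6, name = "Dark2") # qualitative palette, one color per stage
clutchcolors <- brewer.pal(n = 5, name = "Set1")  # qualitative palette, one color per clutch
clutchgrays  <- brewer.pal(n = 9, name = "Greys") # sequential grays, light to dark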
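
If it helps to see what these calls actually return, here is a small sketch (not part of my figure code) that prints one palette, looks up its category and maximum size, and previews it. It only assumes that RColorBrewer is installed.

```r
library(RColorBrewer)

# brewer.pal() returns a character vector of hex color codes.
stagecolors <- brewer.pal(n = 6, name = "Dark2")
stagecolors

# brewer.pal.info lists every palette with its category and maximum number of colors.
brewer.pal.info["Dark2", ]

# Preview the palette in the plotting window before committing to it.
display.brewer.pal(n = 6, name = "Dark2")
```

Because the result is just a vector of color strings, individual colors can be picked out by position, which is what stagecolors[1], stagecolors[2], and so on do in the plots further down.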

F O N T S

The journal I’m submitting to prefers figure text to be in 8pt Arial font with 12pt bold uppercase letters to distinguish figure panels. I’ll use the package extrafont to make the text in Arial. To do this, first I had to install extrafont, and then import the fonts from my system into the extrafont database. You can view available fonts by running fonts() or fonttable().

#library(extrafont)
#font_import()

Once you’ve imported the fonts from your system to the extrafont database, they must be registered with R as being available for the PDF output device. This must be run once in each R session where you want to use the fonts.

#loadfonts()

If you want to output to .ps files instead of .pdf, use loadfonts(device="postscript"). Once the fonts are registered with R’s PDF device, you can use them in figures by specifying the font family, either in base R or ggplot2.
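
Here is a minimal sketch of what "calling the family" can look like in ggplot2, using the built-in mtcars data rather than my own. It assumes "Arial" has already been imported and registered as described above, and the 8 pt size is the journal requirement mentioned earlier. One thing that is easy to miss: theme text sizes are in points, but annotate() and geom_text() measure size in millimeters, so the sketch converts with ggplot2's .pt constant.

```r
library(ggplot2)

# Assumes "Arial" has been imported and registered as described above;
# the font is only looked up when the plot is drawn or saved.
base_pt <- 8 # journal's text size in points

# annotate() and geom_text() measure size in mm, so convert from points with .pt.
label_size <- base_pt / .pt

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  annotate("text", x = 5, y = 30, label = "a",
           size = label_size, family = "Arial") +
  theme_classic(base_size = base_pt, base_family = "Arial")
```

In base R graphics, the rough equivalent is to set par(family = "Arial") before plotting.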

The last step is to embed the fonts you use. extrafont uses Ghostscript, a free PostScript interpreter, to embed the fonts, so you’ll need to make sure it’s installed on your computer (note that Ghostscript is not an R package). To embed the fonts, use embed_fonts(). I’ll run this last step after I’ve made and saved the plot, so it’s commented out for now.

#embed_fonts("plotname.pdf", outfile = "plotname_embed.pdf")

If outfile is not specified, it will overwrite the original file.

M A K E   P L O T

Plot part A of figure:

plot.daily <- ggplot(Daily.df, aes(x=Age, y=CorrectedAvg, group=Age)) + # group by age so each day gets its own box
  geom_boxplot(data=Daily.df, size=1) +
  geom_point(data=stage3, aes(colour=stagecolors[1]), size=3) +
  geom_point(data=stage4, aes(colour=stagecolors[2]), size=3) +
  geom_point(data=stage5, aes(colour=stagecolors[3]), size=3) +
  geom_point(data=stage6, aes(colour=stagecolors[4]), size=3) +
  geom_point(data=stage7, aes(colour=stagecolors[5]), size=3) +
  geom_point(data=stage8, aes(colour=stagecolors[6]), size=3) +
  theme_cowplot(font_size = 14, line_size = 1) +
  scale_color_identity(name="Stage", 
                       breaks=c(stagecolors[1], stagecolors[2], stagecolors[3], stagecolors[4], stagecolors[5], stagecolors[6]), 
                       labels=c(3, 4, 5, 6, 7, 8), 
                       guide="legend") +
  scale_x_continuous(breaks = c(3, 4, 5, 6))+
  labs(y = "VOR amplitude (°)\n") +
  theme(axis.text.y=element_text(size=14, colour= "black")) +
  theme(axis.text.x=element_text(size=14, colour= "black")) +
  theme(axis.title.x=element_blank()) +
  theme(axis.title.y=element_text(size=14, colour = "black")) +
  theme(legend.justification = c(0,1), legend.position = c(0.05,1)) +
  theme(legend.background = element_rect(fill="lightgray",
                                  size=0.4, linetype="solid", 
                                  colour ="black")) + 
  annotate("text",x=3,y=15,label="a", size=6)+
  annotate("text",x=4,y=35,label="b", size=6)+
  annotate("text",x=5,y=33,label="b", size=6)+
  annotate("text",x=6,y=40,label="b", size=6)

plot.daily
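
For comparison, here is a hedged sketch of a different way to get the same kind of colored points and legend: map color to a factor and supply the palette with scale_colour_manual(), instead of adding one geom_point() layer per subset. This is an alternative technique, not the code I used for the figure, and the toy data frame is made up (only the column names match my data).

```r
library(ggplot2)
library(RColorBrewer)

# Made-up data using the post's column names.
toy <- data.frame(
  Age          = rep(3:6, each = 3),
  StageAge     = c(3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8),
  CorrectedAvg = c(1, 3, 5, 18, 22, 25, 20, 26, 28, 27, 31, 35)
)

stagecolors <- brewer.pal(n = 6, name = "Dark2")

plot.toy <- ggplot(toy, aes(x = Age, y = CorrectedAvg, group = Age)) +
  geom_boxplot() +
  # Mapping color to a factor lets ggplot2 build the legend itself.
  geom_point(aes(colour = factor(StageAge)), size = 3) +
  scale_colour_manual(name = "Stage", values = stagecolors) +
  scale_x_continuous(breaks = 3:6)
```

The trade-off is that the legend labels come straight from the factor levels, so there is no need to keep the breaks and labels vectors in sync by hand.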

Plot part B of figure:

plot.sibs<- ggplot(SubsetSibshipsSonia.df, aes(x=Age, y=CorrectedAvg, group=Clutch)) +
  # colour must sit inside aes() for scale_color_identity() to draw the legend
  geom_point(data=c101, aes(colour=clutchgrays[5]), size=3) +
  #geom_smooth(data=c101, method = "lm", formula = y ~ poly(x, 2),  se=FALSE, aes(color=clutchcolors[1])) +
  geom_point(data=c102, aes(colour=clutchgrays[6]), size=3) +
  #geom_smooth(data=c102, method = "lm", formula = y ~ poly(x, 2),  se=FALSE, aes(color=clutchcolors[2])) +
  geom_point(data=c104, aes(colour=clutchgrays[7]), size=3) +
  #geom_smooth(data=c104, method = "lm", formula = y ~ poly(x, 2),  se=FALSE, aes(color=clutchcolors[3])) +
  geom_point(data=c105, aes(colour=clutchgrays[8]), size=3) +
  #geom_smooth(data=c105, method = "lm", formula = y ~ poly(x, 2),  se=FALSE, aes(color=clutchcolors[4])) +
  geom_point(data=c106, aes(colour=clutchgrays[9]), size=3) +
  #geom_smooth(data=c106, method = "lm", formula = y ~ poly(x, 2),  se=FALSE, aes(color=clutchcolors[5])) +
  theme_cowplot(font_size = 14, line_size = 1) +
  scale_color_identity(name="Clutch", 
                       breaks=c(clutchgrays[5], clutchgrays[6], clutchgrays[7], clutchgrays[8], clutchgrays[9]), 
                       labels=c(101, 102, 104, 105, 106), 
                       guide="legend") +
  labs(x = "\nDevelopmental age (days)", 
       y = "VOR amplitude (°)\n") +
  theme(legend.justification = c(0,1), legend.position = c(0.05,1)) +
  theme(legend.background = element_rect(fill="lightgray",
                                  size=0.5, linetype="solid", 
                                  colour ="black")) + 
  scale_x_continuous(breaks = c(4, 4.5, 5, 5.5))+
  scale_y_continuous(breaks = c(0, 10, 20, 30, 40, 50))+
  theme(axis.text.y=element_text(size=14, colour= "black")) +
  theme(axis.text.x=element_text(size=14, colour= "black")) +
  theme(axis.title.x=element_text(size=14, colour = "black")) +
  theme(axis.title.y=element_text(size=14, colour = "black")) +
  annotate("text",x=3.75,y=7,label="a", size=6)+
  annotate("text",x=4,y=13,label="a", size=6)+
  annotate("text",x=4.25,y=30,label="b", size=6)+
  annotate("text",x=4.5,y=45,label="c", size=6)+
  annotate("text",x=4.75,y=15,label="d", size=6)+
  annotate("text",x=5.75,y=7,label="d", size=6)

plot.sibs

S A V I N G   S T U F F

It’s tempting to just draw graphics to the on-screen device and use “Save As…” from the menu, but this doesn’t let you explicitly set the device options or choose the file format. Although the default figure might look ~ okay ~ in the “Plots” pane of RStudio, it does not export as a high-resolution figure: by default, R and RStudio export images at 72 ppi, which is insufficient for publication. Many journals ask for images of at least 300 ppi in TIFF, EPS, PNG or PDF format. Also, if you resize the graphics window after you create the graph, you can get some unexpected results (like strange aspect ratios, or circles that look like ovals). Avoid dev.copy for the same reason, despite its convenience.
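
To make the 72 vs. 300 ppi difference concrete, here is a quick bit of arithmetic in R. The 18 cm x 21 cm size is the one I use for my figure later in this post, and to_pixels() is just a throwaway helper I made up for this illustration.

```r
# Pixel dimensions = physical size in inches x resolution in ppi.
width_cm  <- 18
height_cm <- 21

# Hypothetical helper: convert centimeters to pixels at a given resolution.
to_pixels <- function(cm, ppi) round(cm / 2.54 * ppi)

to_pixels(c(width_cm, height_cm), ppi = 72)  # screen-quality export
to_pixels(c(width_cm, height_cm), ppi = 300) # print-quality export
```

The same physical figure needs roughly four times as many pixels along each side at 300 ppi, which is why a 72 ppi export looks blurry once it is printed at full size.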

Best practice is to create a script file that begins with a call to the device driver (usually pdf() or png()), runs the graphics commands, and then finishes with a call to dev.off(). This way, you’ll get better-looking results AND you’ll be able to recreate the graphic file months later when you’ve definitely forgotten how you did it manually. To export a high-resolution figure, I first combine the two panels into one object with plot_grid() from cowplot, and then draw that object inside the device using grid.arrange() from gridExtra.

# Load required packages: cowplot for plot_grid(), gridExtra for grid.arrange()
library(cowplot)
library(gridExtra)

Figure4 <- plot_grid(plot.daily, plot.sibs, labels = "AUTO", ncol = 1, align = 'v')

Figure4

This figure looks pretty squished in the html preview, but when I save this figure to my desktop, all the things are proportional and pretty!

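png("~/Desktop/Figure4.png", width = 18, height = 21, units = "cm", res = 300)
grid.arrange(Figure4) # Draw the combined figure into the PNG device

# Font embedding applies only to PDF/PostScript output, and must be run after dev.off():
#embed_fonts("plotname.pdf", outfile = "plotname_embed.pdf")
dev.off() # Close the device so the file is written to disk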
## quartz_off_screen 
##                 2

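For TIFF or PDF output, swap in the relevant function (tiff() or pdf()) and file extension. Note that pdf() takes its width and height in inches and has no units or res arguments.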

I recommend using a vector-based graphic format. This means that the graphic is stored in a scale-independent form and can be redrawn at any size, small or large, without jagged lines or pixelated text. When you print it or zoom way in, lines will appear smooth and text will be clear, even if the graphic has been enlarged or reduced and regardless of the DPI (dots-per-inch) rating of the printer. In practice, Microsoft products sometimes don’t handle vector graphics reliably. Sometimes if I embed a graphic into MS Word and save, it’ll come out pixelated the next time I open the file. BUT I’ve found that this problem doesn’t happen with PDF graphics created via the pdf() driver, so I think that’s the best choice.
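
As a rough sketch of the pdf() route, here is the same open-device, draw, close-device pattern writing a vector file. The plot is a placeholder and the file goes to a temporary folder, so swap in your own figure and path. The 18 cm x 21 cm size is converted to inches because that is what pdf() expects.

```r
# pdf() sizes the page in inches, so convert from centimeters.
out <- file.path(tempdir(), "Figure_demo.pdf")

pdf(out, width = 18 / 2.54, height = 21 / 2.54)
plot(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "x squared") # placeholder plot
dev.off() # close the device so the file is written

file.exists(out)
```

To use a registered font such as Arial, pdf() also accepts a family argument (for example family = "Arial" after loadfonts() has been run), and the resulting file is the one you would then pass to embed_fonts().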