R is a statistical programming language and one of the most popular languages for data analysis, statistics and plotting in academia and industry. Learning a new language can be daunting, particularly if you have no experience of scripting and are used to Graphical User Interfaces (GUIs) where you point and click to perform your statistical analysis.
Fear not though: there are a lot of resources and very friendly, enthusiastic and helpful R users out there who can help you on your journey learning R. This post details some of them, and I’d welcome additions.
Most of these resources are links to websites that are free and openly available. Where books are linked they are very often freely available online, but there will often also be the option of purchasing a hard copy, which you may want to consider if you find the resource useful, as it helps support the authors.
R has a number of bodies, organisations and companies associated with it.
- The R Foundation: a not-for-profit organisation working in the public interest.
- The R Consortium: a group organised under an open source governance and foundation model to support the worldwide community of users.
- rOpenSci: open data, software and reproducibility.
Software
R is software and will need installing on your computer before you can use it. Because it is Free Open Source Software (FOSS) you can download and install it on your computer for free.
Git Version Control
It is good practice to version control the code you write. It provides an electronic paper trail of how your code has evolved over time and allows you to keep track not just of the code itself but of why it has changed or been written.
These days the most popular version control system is Git and projects are often hosted/backed up on popular “forges” such as GitHub or GitLab. Sign up with an academic email address (@<institute>.ac.uk or @<institute>.edu) and you will have a few extra benefits.
Learning Git is a whole, vast topic in and of itself, but to get started with R and Git see the recommendation below. If you are a student or researcher at The University of Sheffield you may want to consider taking the Research Software Engineering (RSE) Team’s popular Git, GitHub and GitKraken: Zero to Hero course, which runs regularly throughout the year. Sign up to their mailing list and you’ll be notified when the course runs. Alternatively, email them to find out when the next course is scheduled.
IDEs
Integrated Development Environments (IDEs) are software tools that help you write code faster and more consistently, courtesy of features such as syntax highlighting, automatic bracket and quote pairing, automatic indentation and a suite of functions for performing common tasks such as version controlling files or rendering documents.
The most popular IDE for R is RStudio Desktop which has excellent support for R, RMarkdown/Quarto and basic Git support. If you are new to version control with Git you may want to consider using GitKraken which provides an intuitive point and click interface for version controlling your files and working with GitHub/GitLab.
My personal preference is to use Emacs and the package Emacs Speaks Statistics (ESS). This is a robust solution (ESS has been around for decades) and you get the convenience of using Emacs and its many packages, such as the amazing Magit for carrying out all Git-related tasks. It has a steeper learning curve than RStudio but in my opinion is completely worth the effort.
Books
If you’re using R the chances are you want to perform some sort of Statistical Analysis on your data. This often involves cleaning data that has been received, writing code to summarise, tabulate and plot your data, often in a literate manner (which means reports are open and can be reproduced easily). If you read nothing else to get you started using R for this work then you should read R for Data Science by Hadley Wickham and Garrett Grolemund. This is an excellent book that is available for free online.
- teacheR - Teach Yourself or Others R by Adam Rawles
- Cookbook for R by Winston Chang is a useful reference for many common tasks.
Quarto/RMarkdown
R has its own Markdown language for writing literate documents, and a comprehensive resource covering all aspects is R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund. By writing your work in R Markdown you are performing literate programming, which means your report can be updated automatically if the underlying data changes. Documents can be output to HTML, PDF, LibreOffice, Microsoft Office and many other formats. The underlying source can be version controlled using Git so that it is documented and backed up (e.g. on GitHub or GitLab), and it is easy to collaborate with colleagues.
More recently Posit (née RStudio) have developed Quarto, the next iteration of RMarkdown. It supports more document types (e.g. blogs and RevealJS slides) and has excellent documentation and a growing number of extensions. If you are just starting out I would recommend using Quarto over RMarkdown.
Git Book
It is good practice to version control your code and literate documents as you develop them. This can be achieved using the version control system Git. Get yourself an account on GitHub and/or GitLab and settle down to read Jenny Bryan’s excellent Happy Git and GitHub for the useR.
Tidyverse
You will hear a lot about the Tidyverse, which is an opinionated collection of R packages designed for data science. They are well worth learning as they make writing code considerably easier than with the base R packages. You won’t need all of the packages immediately, but key ones to learn are:
- dplyr (or
- tidyr for tidying your data.
- forcats for working with categorical variables.
- lubridate for working with date variables.
- stringr for working with string variables.
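To give a flavour of what these packages look like in use, here is a minimal sketch using dplyr and tidyr on the `mtcars` dataset that ships with R. It assumes both packages are installed and that you are running R 4.1 or later (for the native `|>` pipe); the object names `summary_by_cyl` and `long_cars` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Summarise fuel efficiency by number of cylinders,
# keeping only cars with more than 100 horsepower.
summary_by_cyl <- mtcars |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop")

print(summary_by_cyl)

# Reshape two columns from "wide" to "long" format,
# giving one row per car per measurement.
long_cars <- mtcars |>
  tibble::rownames_to_column("model") |>
  select(model, mpg, hp) |>
  pivot_longer(cols = c(mpg, hp), names_to = "measure", values_to = "value")

head(long_cars)
```

Each verb (`filter()`, `group_by()`, `summarise()`, `pivot_longer()`) does one thing, and the pipe chains them so the code reads top to bottom in the order the steps happen.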
If you have large datasets, consider dtplyr, which uses the data.table package in the background but lets you write dplyr code. data.table is considerably faster than dplyr for many operations, and this is particularly noticeable when you have large datasets.
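As a rough sketch of how this works, assuming the dplyr, dtplyr and data.table packages are installed and R is version 4.1 or later, you wrap a data frame with `lazy_dt()` and then write ordinary dplyr code. The tiny `mtcars` dataset is used purely for illustration (you would not see a speed benefit on data this small) and the object names are hypothetical.

```r
library(dplyr)
library(dtplyr)

# Wrap the data frame so that dplyr verbs are translated to data.table code.
lazy_cars <- lazy_dt(mtcars)

pipeline <- lazy_cars |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))

# Inspect the data.table code that dtplyr generated.
show_query(pipeline)

# Nothing is computed until the result is collected, e.g. with as_tibble().
result <- as_tibble(pipeline)
print(result)
```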
Statistics
There is a wealth of resources out there for learning and using R for different topics. The following are those I’m aware of; if there is an omission please open an issue on my blog.
- Regression Modeling Strategies by Frank E. Harrell, Jr.
- An Introduction to Statistical Learning with Applications in R/Python, an excellent book on modern “machine learning” techniques.
- Tidy Modeling with R by Max Kuhn and Julia Silge
- Applied Predictive Modeling by Max Kuhn and Kjell Johnson (site to accompany physical book)
- Hands-On Machine Learning with R by Bradley Boehmke and Brandon Greenwell
- Interpretable Machine Learning by Christoph Molnar
- The Hitchhiker’s Guide to Responsible Machine Learning by Przemysław Biecek, Anna Kozak and Aleksander Zawada
- Introduction to Data Science and Advanced Data Science by Rafael A. Irizarry
- Telling Stories with Data: With Applications in R by Rohan Alexander
- Introduction to Modern Statistics (1st Ed)
- The Epidemiologist R Handbook: R for applied epidemiology and public health
- R for Health Data Science by Ewen Harrison and Riinu Pius
- Forecasting: Principles and Practice (3rd ed)
Bayesian Statistics
There are some excellent resources for learning Bayesian Analyses with R. Perhaps the most comprehensive and in-depth is Statistical Rethinking by Richard McElreath. He runs regular free courses teaching the material in the book (Statistical Rethinking 2023) and the book content has been translated to other R frameworks and Python. Another very good book is Bayes Rules! An Introduction to Applied Bayesian Modeling. These are both covered in the Bayesian Statistics - Syllabus course by Andrew Heiss.
- Statistical Rethinking by Richard McElreath
- Bayes Rules! An Introduction to Applied Bayesian Modeling by Alicia A. Johnson, Miles Q. Ott and Mine Dogucu
- Gaussian Processes for Machine Learning by Carl Edward Rasmussen and Christopher K. I. Williams
Plotting
R has excellent support for producing graphs, figures and data visualisations. There are the base graphics, which have been around since the beginning, but more recently the ggplot2 framework introduced by Hadley Wickham, which implements Leland Wilkinson’s Grammar of Graphics, has become very popular.
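To illustrate the difference between the two approaches, here is a minimal sketch that draws the same scatter plot with base graphics and with ggplot2. It assumes ggplot2 is installed and uses the built-in `mtcars` dataset; the axis labels and the object name `p` are my own choices for the example.

```r
library(ggplot2)

# Base graphics: a single function call draws straight to the graphics device.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# ggplot2: the plot is built up in layers, following the Grammar of Graphics.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                            # one point per car
  geom_smooth(method = "lm", se = FALSE) +  # a linear fit per cylinder group
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")

# Printing the object renders the plot.
print(p)
```

Because a ggplot2 plot is an ordinary R object, you can keep adding layers to `p` or save it to a file with `ggsave()`.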
Advanced Topics
There is a lot more to R than just statistical analysis, and one day you may want to investigate these areas in greater detail. The links below are to more advanced topics such as writing and maintaining packages, or specific tasks such as text mining.
- R Packages by Hadley Wickham and Jenny Bryan
- Advanced R by Hadley Wickham
- Advanced Statistical Computing by Roger D. Peng
- Mastering Shiny by Hadley Wickham
- Outstanding User Interfaces with Shiny by David Granjon
- Reproducible Analytical Pipelines by Bruno Rodrigues
- Text Mining with R by Julia Silge and David Robinson
CRAN
The Comprehensive R Archive Network (CRAN) is the primary place to look for R packages. It also contains a number of subject specific Task Views which are pages that summarise the packages and resources associated with a particular topic. There are also links to the official manuals, FAQs and user contributed documentation.
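Packages on CRAN are installed from within R using `install.packages()`. The sketch below is one cautious way of going about it: it checks which of a list of packages are already available and only tells you what is missing, rather than installing anything itself. The package names in `wanted` are just examples taken from elsewhere in this post.

```r
# Packages we would like to have available (example names only).
wanted <- c("dplyr", "tidyr", "ggplot2")

# requireNamespace() returns TRUE if a package is installed and loadable.
installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
to_install <- wanted[!installed]

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to download and install them from CRAN:
  # install.packages(to_install)
} else {
  message("All packages are already installed.")
}
```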
The R Journal
The R Journal is the peer-reviewed, open-access scientific journal published by the R Foundation. It includes articles on packages, reviews and proposals, comparisons and benchmarking, applications of existing techniques and special issue articles to accompany conferences or particular topics.
Cheatsheets
Cheatsheets come in handy as a reference to packages and commands. A central repository of cheatsheets is maintained by Posit.
Community
The R community is incredibly supportive, welcoming and helpful. There are over 600 User Groups around the world where R users meet up and share their experience and knowledge and support each other. Sheffield has its own SheffieldR User Group.
R Ladies
R-Ladies is a worldwide organisation whose mission is to promote gender diversity in the R Community. Groups around the world have their own meetups and activities.
R4DS
There is also the R4DS Online Learning Community, which helps you work through the R for Data Science book. They have an active Slack channel for coordinating the courses and run Tidy Tuesday, a weekly podcast and community activity which is a great way of learning new tasks in R.
NHS R Community
The NHS R Community is focused on applications of R in the NHS Research community. They have blogs, a Slack channel and conferences.
Blogs
The R Bloggers site aggregates blogs from people who write about R and is a brilliant resource. A few highlights are noted below as well, but R Bloggers is probably the best resource. If you want to subscribe to these, most have RSS feeds.
- Tidyverse blog
- Notes from a data witch, a blog by Danielle Navarro
Mastodon
Find posts and resources on Mastodon by searching for the #rstats hashtag. Here are some people I follow and find useful information from.
Bots
- @rpodcast@podcastindex.social
- @rweekly@fosstodon.org toots weekly updates from the R Community
- @CRANberriesFeed@mas.to a bot that toots about packages released or updated on CRAN.
People
- @e3mma@mastodon.social Emma is a bioinformatician and lecturer at York University who teaches R and reproducibility.
- @f2harrel@mastodon.online Frank is the author of the Regression Modeling Strategies book.
- @Cmastication@mastodon.social JD Long, an avid R user.
- @hadleywickham@fosstodon.org doyen of R packages and the Tidyverse.
- @HeathrTurnr Heather Turner is a Research Software Engineering Fellow in Statistics at the University of Warwick and an R Foundation board member.
- @eddelbuettel@mastodon.social ESS and R developer.
- @sje@fosstodon.org ESS developer.
- @robjhyndman@aus.social co-author of Forecasting: Principles and Practice.
- @topepo@fosstodon.org author of Applied Predictive Modeling and co-author of other R books; R package developer (tidymodels).
- @annakrystalli@fosstodon.org Research Software Engineer, ReproHack founder and editor at rOpenSci.
Organisations
- @ropensci@fosstodon.org
- @Posit@fosstodon.org the company that develops RStudio, Quarto and Shiny
Podcasts
Miscellaneous
Generative Art
Many people enjoy playing with R and ggplot2 to create Generative Art. One example is aRtsy, an R package that implements algorithms for making generative art in a straightforward and standardized manner using ‘ggplot2’ (see archive of posts on @aRtsy_package@mastodon). Another package is aRt by Nicola Rennie.
- aRtsy
- aRt by Nicola Rennie
- Generative art resources in R
Citation
@online{shephard2023,
author = {Shephard, Neil},
title = {R {Resources}},
date = {2023-10-06},
url = {https://blog.nshephard.dev/posts/r-resources/},
langid = {en}
}