Summary and Setup
Learning Objectives
- Describe the importance of efficient and reproducible data quality assurance and quality control (QA/QC)
- Identify common data errors and quality issues
- Develop a QA/QC strategy for a tabular data set
- Import data into R and perform QA/QC checks using R's default data types
- Implement an R script to perform data QA/QC on a tabular data set
- Document and communicate data QA/QC steps for data reporting
Prerequisite
This lesson assumes you have R and RStudio installed on your computer. R and RStudio are two separate pieces of software:
- R is a programming language and software used to run code written in R.
- RStudio is an integrated development environment (IDE) that makes using R easier. In this course we use RStudio to interact with R.
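To check whether R is already installed, you can run a few lines in the RStudio Console or any R console. This is a minimal sketch. The check for RStudio assumes that RStudio sets the `RSTUDIO` environment variable to `"1"` in the sessions it starts; if that assumption does not hold on your system, the last line simply prints `FALSE`.

```r
# Run these lines in the RStudio Console (or any R console).
# If R is installed, this prints something like "R version 4.x.y (YYYY-MM-DD)".
print(R.version.string)

# Platform details: the operating system and architecture R was built for.
print(R.version$platform)

# TRUE when this session appears to run inside RStudio.
# Assumption: RStudio sets the RSTUDIO environment variable to "1".
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
print(in_rstudio)
```

If the console does not open or R reports an error, install R and RStudio as described below.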
If you don’t already have R and RStudio installed, follow the instructions for your operating system below. Install R before you install RStudio, because RStudio needs an existing R installation to run.
Data Sets
Download the data zip file and unzip it to your Desktop.
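If you prefer to unzip the data from within R, the sketch below uses `utils::unzip()`. The helper name `unzip_lesson_data` and the file name `data.zip` are hypothetical placeholders, not part of the lesson; replace `data.zip` with the name of the file you actually downloaded. Locating the Desktop through `USERPROFILE` (Windows) or `HOME` (macOS/Linux) is also an assumption and may need adjusting, for example when the Desktop is redirected to OneDrive.

```r
# Hypothetical helper: unzip the lesson data archive into a destination folder
# and return the paths of the extracted files.
unzip_lesson_data <- function(zip_path, dest_dir) {
  if (!file.exists(zip_path)) {
    stop("Zip file not found: ", zip_path)
  }
  # Create the destination folder if it does not exist yet
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  # utils::unzip() extracts the archive and returns the extracted file paths
  extracted <- utils::unzip(zip_path, exdir = dest_dir)
  extracted
}

# Desktop folder: USERPROFILE on Windows, HOME on macOS/Linux.
# Assumption: the Desktop is not redirected elsewhere; adjust the path if it is.
home_dir <- if (.Platform$OS.type == "windows") {
  Sys.getenv("USERPROFILE")
} else {
  Sys.getenv("HOME")
}
desktop_dir <- file.path(home_dir, "Desktop")

# Example usage with a placeholder file name ("data.zip"):
# files <- unzip_lesson_data(file.path(desktop_dir, "data.zip"), desktop_dir)
# print(files)
```

Listing the extracted paths gives a quick confirmation that the archive unpacked where you expected before you start the lesson.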
Software Setup
- Windows: Use PuTTY
- macOS: Use Terminal.app
- Linux: Use Terminal