# Introduction

## Aims

You will learn how to choose between correlation and regression, and how to apply, interpret and report them.

## Learning Outcomes

By actively following the lecture and practical, and carrying out the independent study, the successful student will be able to:

• Explain the principles of correlation and of regression (MLO 1)
• Apply both appropriately in R, interpret the results and evaluate their legitimacy (MLO 2, 3 and 4)
• Summarise test results scientifically and illustrate them with appropriate R figures (MLO 3 and 4)

## Philosophy

Workshops are not a test. It is expected that you will often not know how to start, will make a lot of mistakes and will need help. Do not be put off, and don’t let what you cannot do interfere with what you can do. You will benefit from collaborating with others and/or discussing your results.

The lectures and the workshops are closely integrated, and you are expected to be familiar with the lecture content before the workshop. You need not understand every detail, as the workshop should build and consolidate your understanding. You may wish to refer to the slides as you work through the workshop schedule.

## Slides

Correlation and Regression: pdf (recommended) / pptx

# Exercises

## Getting started

Start RStudio from the Start menu.

Make a new project with File | New Project, then choose New Directory and then New Project. Be purposeful about where you create it by using the Browse button. I suggest using your 17C folder. Give the project (directory) a name, perhaps “regress_correl”.

Make a new folder ‘raw_data’ where you will later save data files.

Make a new folder ‘figures’ where you will later save your figures.
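If you prefer to work from code, the two folders can also be created from the R console. The sketch below uses base R's `dir.create()` and assumes R's working directory is the project directory. This is the case when the RStudio project is open.

```r
# Create the project folders for data and figures.
# showWarnings = FALSE stops R complaining if a folder already exists.
dir.create("raw_data", showWarnings = FALSE)
dir.create("figures", showWarnings = FALSE)

# Check that both folders now exist (should print TRUE TRUE)
dir.exists(c("raw_data", "figures"))
```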

Make a new script file called analysis.R or similar to carry out the rest of the work.

You probably want to load the `tidyverse` with `library(tidyverse)`.

## Pearson’s Correlation

The data given in height.txt are the heights of eleven brother and sister pairs.

Save a copy of height.txt to your `raw_data` folder and import it.
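A minimal import sketch follows, using `read_table()` from readr (part of the tidyverse). It assumes that height.txt is whitespace-separated and has a header row naming the columns `brother` and `sister`. Check the file's format before relying on this. The `file.exists()` guard is only there so the script reports a clear message if the file has not been saved yet.

```r
library(readr)

# Path to the data file inside the project's raw_data folder
height_file <- file.path("raw_data", "height.txt")

# Assumption: whitespace-separated columns with a header row
if (file.exists(height_file)) {
  height <- read_table(height_file)
  print(height)  # there should be one row per brother-sister pair
} else {
  message("height.txt not found: save it to raw_data/ first")
}
```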

### Exploring

What type of variables are ‘brother’ and ‘sister’? What are the implications for the test?

Do a quick plot of the data. We don’t have a causal relationship here, so either variable can go on the x-axis.

```r
ggplot(height, aes(x = sister, y = brother)) +
  geom_point()  # one point per brother-sister pair
```