Quick links: Module synopsis | VLE Site | List of staff | Description of the data | Report Guide


News: Updated yeast data

load(url("https://www-users.york.ac.uk/~dj757/BIO00047I/data/yeast_data.28-02-2020.Rda"))

I have fixed the essential gene column. Now protein-coding genes have values 1 or 0 (zero) and ncRNAs have NA values.
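After reloading, it may be worth checking that the essential gene column looks as described. The sketch below uses a small made-up data frame rather than the real data, because the object and column names inside the .Rda file are not stated here; `toy`, `gene.type` and `essential` are hypothetical names chosen for illustration only.

```r
# Toy stand-in for the yeast data; every name and value here is hypothetical.
toy <- data.frame(
  gene      = c("geneA", "geneB", "geneC", "geneD"),
  gene.type = c("protein_coding", "protein_coding", "ncRNA", "ncRNA"),
  essential = c(1, 0, NA, NA)
)

# Count each value per gene type, keeping NA as its own column.
essential.counts <- table(toy$gene.type, toy$essential, useNA = "ifany")
print(essential.counts)

# Proportion of essential genes among genes that have a value;
# na.rm = TRUE skips the ncRNAs, which are NA.
prop.essential <- mean(toy$essential, na.rm = TRUE)
print(prop.essential)
```

With the real data, running `ls()` after `load()` lists the objects that were created, and `names()` on a data frame shows its columns, so the hypothetical names above can be swapped for the real ones.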


The assessment

The assessment for this module is a report describing an analysis of data (maximum 1500 words). A description of what we want to see, and how we will mark the report, is here.

The assessment deadline is: Thursday 16th of April 2020 at 11am


The Google Hangout

Join the Big Data Biology 2020 Hangout

Link to your Google Hangouts on a browser

This module will use a Google Hangout chatroom to share questions, plots, R code (and sometimes to announce any errors in the website that we find). Some people find the R code useful as a resource. Use of the Hangout is optional.

PS: As your posts are personal data, we comply with the General Data Protection Regulation (GDPR). You can read about this, and some rules for good conduct, here.


The Seven Principles of Big Data Biology

Big data sets are an important part of modern biology, but they can be complicated. To help you through this path, we have distilled (some of) our wisdom into some principles that are important to learn. Here they are:

  1. What is the biological question?
  2. Know your data.
  3. Filter out the bad stuff.
  4. Statistical tests are powerful.
  5. Be careful with statistics.
  6. If you conduct multiple tests, you need multiple test corrections.
  7. Present data honestly and clearly.

Download the details here: The Seven Principles of Big Data Biology
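As a rough illustration of principle 6, the sketch below applies two common corrections to a set of made-up p-values using R's built-in `p.adjust` function. The p-values and the 0.05 threshold are assumptions for the example, not results from the module data, and the choice of Bonferroni and Benjamini-Hochberg is ours rather than a method prescribed by the module.

```r
# Made-up p-values from five hypothetical tests.
p.values <- c(0.001, 0.012, 0.02, 0.045, 0.3)

# Bonferroni: multiply each p-value by the number of tests (capped at 1).
p.bonferroni <- p.adjust(p.values, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate and is less conservative.
p.bh <- p.adjust(p.values, method = "BH")

# Count how many tests fall below 0.05 before and after correction.
n.raw        <- sum(p.values < 0.05)
n.bonferroni <- sum(p.bonferroni < 0.05)
n.bh         <- sum(p.bh < 0.05)

print(c(raw = n.raw, bonferroni = n.bonferroni, BH = n.bh))
```

The number of tests that pass the threshold drops after correction, which is the point of the principle: the more tests you run, the more likely some will look significant by chance.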


Image: A protein interaction network from this article


Workshop material