Introduction

Aims

In this part of the workshop, we will take you through the steps you need to get Associative Transcriptomics (AT) working.

Learning Outcomes

By doing this practical, the successful participant will be able to:

  • connect to the RStudio server
  • work with the server terminal
  • prepare a trait file
  • perform Associative Transcriptomics (AT)
  • download AT results

Getting started

This practical runs on a server dedicated to the PORI project. Pre-configured R scripts are provided for running AT at the workshop. We use this server to simplify setup, to speed up computation, and to familiarise you with working in the cloud (a common requirement for big data). Please keep your credentials safe and DO NOT share them with others. If anybody outside the workshop would like access, please contact Zhesi (zhesi.he@york.ac.uk) for a new account.

Currently, every user can run an unlimited number of AT jobs at once, because there is no job scheduling system. This lets you run AT jobs without first learning to write scheduling scripts, but it also puts the server at risk of being overloaded. To share the computing resources fairly, GAPIT (SNP association) analyses must be booked with the instructor (Zhesi) for a scheduled time slot. Please don’t run GAPIT on your own without notice; we might have to stop your job to keep the server running smoothly.

Connect to Server

RStudio Server

Open a web browser and enter the IP address of your instance, followed by :8787. For this workshop, the IP address is 145.239.69.63 and your URL should be

http://145.239.69.63:8787

Tip: Make sure there are no spaces in your URL or your web browser may interpret it as a search query.

You should now see a page that lets you log in to the RStudio server:

Enter your user credentials and click Sign In. The credentials for this workshop were provided by email. If you have trouble logging in, please contact the instructor.

If the login is successful, you should now see the RStudio Server interface:

Interface and layout

Here are the major windows (or panes) of the RStudio (server) environment:

  • Source: This pane is where you can view/write files.
  • Console/Terminal: This is where you see the execution of commands. The “Terminal” tab gives you access to the BASH terminal of the server, and it is the tab you will use most in this workshop.
  • Environment/History: Here, RStudio will show you what datasets and objects (variables) you have created and which are defined in memory. You can ignore this window as we are not using it in this workshop.
  • Files/Plots/Packages/Help: This multipurpose pane shows, among other things, the contents of directories on the server. The most useful tab is “Files”, where you can navigate directories and upload or download files. Details are given in later sections.

All of the panes in RStudio server have configuration options. For example, you can minimize or maximize a pane, or drag the border between two panes to resize them. Please note again that the most important panes for this workshop are Terminal and Files.

Work with Terminal

All of the AT scripts are run in the Terminal window, at the bottom left of the RStudio server interface. Click on the “Terminal” tab in that pane to access the terminal of the server.

What is Terminal

Terminals, also known as command lines or consoles, allow us to accomplish and automate tasks on a computer or remote server without the use of a graphical user interface. Using a terminal allows us to send simple text commands to our computer to do things like navigate through a directory or copy a file, and perform many more complex automations and programs.

Useful commands

You can use the Terminal to navigate directories. File and directory commands work exactly as they do on any Linux/Unix system.

  • The tilde (~) symbol stands for your home directory. If your username is zhesi, then the tilde (~) stands for /home/zhesi

  • pwd: The pwd command will allow you to know in which directory you’re located (pwd stands for “print working directory”).

  • ls: The ls command will show you (‘list’) the files in your current directory. Used with certain options, you can see sizes of files, when files were made, and permissions of files. Example: “ls ~” will show you the files that are in your home directory.

  • cd: The cd command will allow you to change directories. When you open a terminal you will be in your home directory. To move around the file system you will use cd. Examples:

    • To navigate into the root directory, use “cd /”

    • To navigate to your home directory, use “cd” or “cd ~”

    • To navigate up one directory level, use “cd ..”

    • To navigate to the previous directory (or back), use “cd -”

    • To navigate through multiple levels of directory at once, specify the full directory path that you want to go to. For example, use “cd /groups/workshop” to go directly to the workshop subdirectory of groups. As another example, “cd ~/AT” will move you to the AT subdirectory inside your home directory. If your username is zhesi, then ~/AT is exactly the same as /home/zhesi/AT.

  • cp: The cp command will make a copy of a file for you. Example: “cp file foo” will make an exact copy of “file” and name it “foo”, and the file “file” will still be there. If you are copying a directory, you must use “cp -r directory foo” (copy recursively). “Recursively” means that the directory is copied together with all its files and subdirectories, all the files and subdirectories inside those, and so on.

  • mv: The mv command will move a file to a different location or will rename a file. Examples are as follows: “mv file foo” will rename the file “file” to “foo”. “mv foo ~/AT” will move the file “foo” to your AT directory, but it will not rename it. You must specify a new file name to rename a file.

    • To save on typing, you can substitute ‘~’ in place of the home directory.
  • rm: Use this command to remove or delete a file in your directory. To delete a directory and all of its contents recursively, use rm -r.

  • mkdir: The mkdir command will allow you to create directories. Example: “mkdir AT_GEM” will create a directory called “AT_GEM”.

Preparing trait data

File Format

Before running the AT script, we need to prepare the input data.

As described by Andrea, the input trait data is a tab-delimited text file containing a table of x rows and y columns, where each row is an accession (named in the Taxa column) and each remaining column is a trait.

If you have your own data, please use the “Template_for_PORI_trait_data.xlsx (download)” file to match the accession names, then leave only the Taxa column and the trait columns and save the file as "Tab-delimited text (.txt)" in Excel. Alternatively, you can download an example trait file.

Note: Please leave no empty cells in the Taxa column. For trait columns, missing data can be represented by NA or by an empty cell.
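As a rough illustration of the format, the sketch below writes a tiny tab-delimited trait file and runs two simple checks on it. The accession names, trait names and values are made up for the example, and the checks are only a suggestion, not part of the workshop scripts.

```shell
# Hypothetical accessions, traits and values, for illustration only.
trait_file=$(mktemp)
printf 'Taxa\tSeed_weight\tOil_content\n'  > "$trait_file"
printf 'Accession_1\t4.2\t41.5\n'         >> "$trait_file"
printf 'Accession_2\tNA\t39.8\n'          >> "$trait_file"   # missing value written as NA
printf 'Accession_3\t3.9\t\n'             >> "$trait_file"   # missing value left empty

# Count the rows whose first (Taxa) column is empty; this should be 0.
empty_taxa=$(awk -F'\t' '$1 == "" { n++ } END { print n + 0 }' "$trait_file")
echo "Rows with an empty Taxa cell: $empty_taxa"

# Count how many different column counts occur; 1 means every row has the
# same number of tab-separated columns as the header.
column_counts=$(awk -F'\t' '{ print NF }' "$trait_file" | sort -u | wc -l)
echo "Distinct column counts: $column_counts"
```

You could run the same two awk commands on your own file by replacing "$trait_file" with its path.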

Note: Please do not use spaces or any of the following special characters in column names or file names:

(  )  *  ?  \ | "" ' ~ ` $ # & [ ] { } ; < > / ! % ^ @ 

because the Terminal cannot handle them easily. An underscore “_” is a very good substitute for a space. A dot “.” is also good.
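One way to check a name before using it is sketched below. The function name is_safe_name and the example names are hypothetical. The rule it applies (letters, digits, underscores and dots only) is deliberately stricter than the list above, so it may flag some names that would in fact work.

```shell
# Succeeds only if the name is non-empty and contains nothing but
# letters, digits, underscores and dots.
is_safe_name() {
  case "$1" in
    ''|*[!A-Za-z0-9_.]*) return 1 ;;   # empty, or contains any other character
    *) return 0 ;;
  esac
}

# Hypothetical column names: the first two pass, the last two do not.
for name in "Seed_weight" "Oil.content" "Seed weight" "Yield(%)"; do
  if is_safe_name "$name"; then
    echo "OK:     $name"
  else
    echo "RENAME: $name"
  fi
done
```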

Upload trait data

Here you can download an example trait text file. When you open the trait text file in Excel on your laptop, it looks like the following:

You can upload your own trait file using the Files pane. After you log in, the Files pane shows your home directory by default. Click “New Folder” to create a folder and “Upload” to upload your trait file. The “More” button also lets you copy and move files, like “cp” and “mv” in the Terminal.

Running AT

Where to run

In the Terminal window, navigate to the directory where you want to store your AT results.

For example, you can create a nested directory by:

mkdir -p ~/AT/Seed_SW/GEM

Then go into that directory by:

cd ~/AT/Seed_SW/GEM

Command to run

Finally, run the AT script using the following format:

Rscript path_to_AT_script path_to_trait_file > optional_log_file &

Tip: The > sign redirects standard output to a log file. The & sign puts the job in the background: the command keeps running, but you are returned to your shell so that you can run other commands in the meantime.
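To see what > and & do without starting a real analysis, you can try them on a harmless stand-in command. In the sketch below, the echo and sleep commands stand in for a long-running job, and the log file is a temporary file created just for the example.

```shell
log_file=$(mktemp)

# Stand-in for a long-running analysis: prints two progress lines.
( echo "step 1 started"; sleep 1; echo "step 2 finished" ) > "$log_file" &

echo "The shell is free again while the job runs in the background."
jobs                # list the background jobs started from this shell
wait                # pause here until all background jobs have finished
cat "$log_file"     # the job's output went to the log, not to the screen
```

Note that > captures standard output only. Warnings and error messages are usually written to standard error and will still appear in the Terminal; adding 2>&1 after the log file name sends them to the same log file.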

If you prefer to watch the full progress in the Terminal, you can omit both signs and run without a log file by entering:

Rscript path_to_AT_script path_to_trait_file

Note: Please keep a SPACE between each part of the command. A path can be a full path (e.g. “~/AT/GEM/mytrait.txt”) or a relative path (e.g. “../mytrait.txt”).
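The difference between the two kinds of path can be tried out in a throwaway directory. The sketch below assumes a made-up layout (a trait file one level above the directory you are working in) and checks that the full path and the relative path refer to the same file.

```shell
# Build a small, temporary directory tree to experiment in.
base=$(mktemp -d)
mkdir -p "$base/AT/Seed_SW/GEM"
echo "Taxa" > "$base/AT/Seed_SW/mytrait.txt"

cd "$base/AT/Seed_SW/GEM"
full_path="$base/AT/Seed_SW/mytrait.txt"   # works from any directory
relative_path="../mytrait.txt"             # only works from inside GEM

# -ef is true when both paths refer to the same file.
if [ "$full_path" -ef "$relative_path" ]; then
  same_file="yes"
else
  same_file="no"
fi
echo "Same file: $same_file"
```

A relative path is shorter to type, but it depends on the directory you are in when you run the command, so check with pwd first if you are unsure.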

Exercise

Have a go at running the Regress AT script with the example trait file or your own trait file.

  • STEP 1. Download the example trait file or create your own trait file if you have new data

  • STEP 2. Upload the trait file to your home folder on the server. Feel free to organise it into folders.

  • STEP 3. Run the AT Regress command on it. The path to the script is /groups/workshop/scripts/Regress_PORI_server.R
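Put together, the steps above might look like the following sketch. The script path is the one given in STEP 3; the trait file location, the results directory and the log file name are assumptions that you should replace with your own. The command is only started if Rscript, the script and the trait file can all be found.

```shell
script=/groups/workshop/scripts/Regress_PORI_server.R   # path given in STEP 3
trait=~/AT/Seed_SW/mytrait.txt                          # assumed trait file location
results_dir=~/AT/Seed_SW/GEM                            # assumed results directory
log=regress.log                                         # assumed log file name

if command -v Rscript >/dev/null 2>&1 && [ -f "$script" ] && [ -f "$trait" ]; then
  mkdir -p "$results_dir"
  cd "$results_dir"
  Rscript "$script" "$trait" > "$log" &    # run in the background, output to the log
  status="started"
else
  status="skipped"
fi
echo "AT job $status"
```

If the job is reported as skipped, check the trait file path first, since that is the part most likely to differ on your account.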

If you have put your job in the background, you can check whether your AT is still running with the jobs command:

jobs

Or you can view your log file by the command

less xxxx.log

Use SHIFT+G to jump to the end of the file and q to quit the viewer.
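If you only want a quick look at the most recent lines of a log without opening the viewer, tail is a common alternative. The sketch below uses a temporary, made-up log file written by a stand-in background job.

```shell
log_file=$(mktemp)

# Stand-in job that writes five progress lines to the log.
( for i in 1 2 3 4 5; do echo "progress line $i"; done ) > "$log_file" &
wait                       # wait for the stand-in job to finish

tail -n 2 "$log_file"      # show only the last two lines of the log
last_line=$(tail -n 1 "$log_file")
```

While a job is still running, "tail -f" followed by the log file name keeps printing new lines as they are written; press CTRL+C to stop following (this does not stop the job itself).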

If you have not specified a log file, you will see the standard output in the Terminal, like the following:

Finishing AT

Download result files

In the Files window, you can view and download the AT result files.

For example, if you want to navigate to the result directory “~/CGAT_SW/GEM”, you can click the CGAT_SW folder in your home directory and then click the GEM folder.

Or you can click the “…” icon in the top right corner and enter the full directory path (e.g. “~/AT/CGAT_SW/GEM”).

Then, you can download files by selecting one or more of them and clicking Export, then Download:

Download raw files

Raw genotype and expression data, along with some other useful files for interpreting AT results, are available to download. In the Files window, navigate to “/groups/workshop/Download” using the “…” icon.