Link back to main course page

Part 1: Working with a server remotely.

Introduction

A server is a computer or device on a network that manages network resources. The server we will use is actually a cluster: a collection of linked computers, each with processors (CPUs or GPUs), memory (RAM) and disk storage space, such as hard drives.

Servers are used for bioinformatics because data sets can be very large, and can take a long time and/or large amounts of RAM to process.

Servers generally run a Linux operating system, rather than Microsoft Windows or macOS. This is because Linux is stable and secure, and has tools to distribute large numbers of tasks across different processors.

We will use a server based in York, called the York Advanced Research Computing Cluster, or YARCC for short.


Using the York VPN

To work remotely on YARCC, you need to log in.

Whether you are logging in from a PC or a Mac, you will need to use the York VPN application, PulseSecure. The download and installation instructions are provided here: https://www.york.ac.uk/it-services/services/vpn/

When you open PulseSecure, you should have no connections added, as shown below:


Click the plus (+) button to add a connection. To connect to York, fill in the VPN settings in the Name and Server URL fields, as below:


Then click Connect, and enter your username and password.

We will use this temporary account to log into the YARCC server as well.


If you connect and the information is correct, you should see the following screen, with a green tick and a Disconnect option.


If this worked, you are now connected to York via the Virtual Private Network (VPN).
Check that you are still connected each morning; you may need to reconnect.

Logging into the server

Now that you are connected to the York VPN, we can connect to the YARCC compute cluster. This cluster contains software and data that we will use for the remainder of the course.

Logging in from a PC is slightly different from logging in from a Mac.


Logging into the server from a PC

If you are using a Windows computer, you will need the login software, configured to connect through the PulseSecure VPN you just set up. It should already be installed on your computer. When you open it, you should see the following screen.


Click Open to bring up the following screen. Fill in the host name (login.yarcc.york.ac.uk) and make sure the connection type is SSH, as shown beneath:


Because this is the first time you are connecting to the server with this setup, you will see an authentication screen. Select Yes to accept it and continue.


Once this has connected, a terminal window will open and ask you for your login details again. If you log in successfully, you should see the following two windows:



Logging into the server from a Mac

Logging in from a Mac is simpler and does not require the login software, because macOS includes the ssh command. You still need to be connected to the VPN.

Once the VPN is connected, open a window of the Terminal app. Then log in using the ssh (secure shell) command as follows:

ssh username@login.yarcc.york.ac.uk 

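Replace username with your own username. The first time you connect, ssh may ask whether you trust the server; type yes. When you press Enter you will be asked for your password (nothing appears on screen as you type it). If both are correct, you will be logged in!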
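Once you are in, two standard commands let you confirm who and where you are. This is a minimal check rather than an official YARCC login step, and the machine name printed will depend on how YARCC is set up.

```shell
# Print the username you are logged in as; on YARCC this should be your
# course username (something like tmpq0001)
whoami

# Print the name of the machine your commands are running on; after a
# successful ssh login this should be a YARCC machine, not your own computer
# (the hostname command does the same on most systems)
uname -n
```

When you have finished working on the server, type exit to log out.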


Part 2: Linux

YARCC runs a Linux operating system. To do the bioinformatics processing, you will need to use Linux a little.

If you are familiar with Linux and just want a reminder, get a Linux cheat sheet here. If not, read on.


Directories and your working directory

  • In the same way that Windows PCs have folders where you store your files, Linux systems have directories. They are essentially the same thing: a place to organise files and programs.

  • Linux servers are usually used through a command-line (text-based) interface rather than a graphical interface like Windows, so directories are referred to by typing their names.

When you first log into a Linux server, you are placed in your home directory. On YARCC your home directory will be:

/home/userfs/t/username

Note that username is replaced with your own username (such as tmpq0001), so everyone has their own unique home directory.


In Linux you are always ‘working from’ a specific directory, called your working directory. You can think of this as where you are in the file system. You are always somewhere!

To find out what your working directory is at any point, use this command:

pwd
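The home directory and pwd fit together as follows. This small sketch can be typed at any point; cd ~, the ~ shorthand and the HOME variable are standard shell features, and on YARCC the path printed should follow the /home/userfs/t/username pattern above.

```shell
# cd ~ (or plain cd) always takes you back to your home directory
cd ~

# Print the working directory: on YARCC something like /home/userfs/t/username
pwd

# The HOME variable stores the same path; ~ is shorthand for it
echo "$HOME"
```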

Directories can contain files and other directories, just as houses contain rooms, rooms contain items and boxes, and boxes contain tins and jars.

Directories are nested.


Remember where you are

As you work in Linux it is important to keep track of where you are in relation to your directories. The image below shows how directories might be organised. The something directory isn’t a very useful name, but the name of the data directory tells you what is in there. (You’ll find out about bam files later.)

Directories and files have a path: the list of directories you need to go through to get to that location, separated by / characters. The path of your home directory on the YARCC Linux system is something like /home/userfs/t/tmpq1234.


To change which directory you are ‘in’ use this command:

cd data

This will take you to the data directory inside your current working directory (if it exists).
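To see the difference between an absolute path (starting with /, like /home/userfs/t/tmpq1234) and a relative one (like data, which is looked up from wherever you are), the sketch below works in a throwaway directory created with mktemp -d, so nothing in your home directory changes. The name nothere is made up to show what happens when cd fails.

```shell
# Throwaway practice directory so your real files are not touched
scratch=$(mktemp -d)
mkdir "$scratch/data"

cd "$scratch"   # absolute path: starts with /, works from anywhere
cd data         # relative path: looked up inside the current directory
pwd             # the practice directory path followed by /data

cd "$scratch"
# cd into a directory that does not exist fails, and you stay where you are
cd nothere 2>/dev/null || echo "no such directory: nothere"
pwd             # still the practice directory
```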


It is important to understand the concepts of directories and paths.

Discuss with your group or a tutor before you move on.


Trying out linux commands

This set of Linux commands will get you started. We suggest that you type each of these commands into your Linux system in order. Do not copy and paste the text from this web page.

  • First, choose a building name, a room name, and a word for a box or container, in any language. Note these down somewhere.

  • Then log into the server (if you haven’t already).

  • Then check where you are, with:

pwd


  • Now make a new directory named after your building (use your own word in place of building):
mkdir building


  • Now change your location to the new building directory, using the cd (change directory) command:
cd building


  • Now make another new directory named after your room (use your own word). Then move into this new directory using cd again.
mkdir room
cd room


  • Then check where you are again with:
pwd


  • Then make three files within the room directory, called box.1, box.2 and smallbox.1 (using your own word for box), using the touch command (which makes an empty file):
touch box.1
touch box.2
touch smallbox.1


  • Now list what files are in your current directory using the ls (list) command:
ls
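For later reference, once you have typed the steps above by hand, the same structure can be built in fewer commands. This sketch uses a throwaway directory from mktemp -d so it does not interfere with the ~/building you made; mkdir -p (create every missing directory along a path) and giving touch several names at once are standard, but were not used in the steps above.

```shell
# Throwaway practice area, so the ~/building you made by hand is left alone
scratch=$(mktemp -d)
cd "$scratch"

# -p creates building and room in one go, and is silent if they already exist
mkdir -p building/room
cd building/room
pwd                          # ends in /building/room

# touch accepts several file names and creates each one empty
touch box.1 box.2 smallbox.1
ls                           # lists box.1, box.2 and smallbox.1
```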



Most commands in Linux have optional ‘flags’, which are added as extra letters after the command, usually starting with a dash (-). Flags allow you to run the command with different variations. Some of these flags can be very useful.

  • To find out about a Linux command and its flags, you can call up the ‘manual’ for that command with:

    man command

    (replacing command with something like ls, touch or mkdir; press q to leave the manual. cd is built into the shell, so use help cd for it instead)

  • For example, the following lists the files in your working directory in long format (-l), sorted by modification time (-t), in reverse order (-r), so the most recently modified files appear at the end:

    ls -lrt


    Give this a try.


  • You have now created the nested directories ~/building/room/ and your box files. To move ‘up’ one level, use this command:
cd ../



This is how cd ../ changes your working directory.
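The sketch below shows cd ../ in action, printing the working directory after each move. It rebuilds the building/room structure inside a throwaway directory from mktemp -d, and also shows that ../.. goes up two levels in one step.

```shell
# Throwaway practice area containing building/room
scratch=$(mktemp -d)
mkdir -p "$scratch/building/room"

cd "$scratch/building/room"
pwd          # ends in /building/room
cd ../       # up one level (cd .. does the same)
pwd          # ends in /building
cd room      # back down into room
cd ../..     # up two levels in one step
pwd          # the practice directory itself
```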


NOTE

Files can have almost any name in Linux.
File names and commands are case sensitive.
So the command to change directory cd will not work if you type Cd.
Be careful with dots (such as ..) and spaces in Linux: they matter!


More Linux commands

Make a copy of a file called myfile. The new copy is called myfile2.

cp myfile myfile2

If your working directory is the kitchen (~/), you can copy a file called myfile from your working directory into a subdirectory called fridge like so:

cp myfile fridge/
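A minimal practice run of both cp forms, in a throwaway directory from mktemp -d. The empty file myfile and the directory fridge are made up for the exercise, matching the names in the text.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch myfile        # an empty file to practise on
mkdir fridge        # the destination directory must already exist

cp myfile myfile2   # copy under a new name in the same directory
cp myfile fridge/   # copy into fridge, keeping the name myfile
ls                  # lists fridge, myfile and myfile2
ls fridge           # lists myfile
```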

Remove a file called this. rm deletes files permanently: there is no recycle bin to recover them from.

rm this

Show (or print out to the screen) all of a file called this.

cat this

Warning!: some of the files we will work with are very large, and printing them with cat can take a long time. To stop a command that is running, press Ctrl+C.

Show the first ten lines of a file called this.

head this

Show the last ten lines of a file called this.

tail this
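To see head and tail working, you need a file with more than ten lines. This sketch makes one in a throwaway directory with seq (which prints a sequence of numbers) and a > redirect, which is explained under Pipes and input/output redirection below. It also uses the -n flag, which sets how many lines head or tail show.

```shell
scratch=$(mktemp -d)
cd "$scratch"
# seq prints the numbers 1 to 100, one per line; > saves them in a file called this
seq 1 100 > this

head this        # the first ten lines: 1 to 10
tail this        # the last ten lines: 91 to 100
head -n 3 this   # -n chooses the number of lines: 1, 2, 3
tail -n 2 this   # 99 and 100
```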

Wild cards

One of the most powerful parts of file handling in Linux is its use of wildcards. These allow you to specify groups of files to move or copy, and are useful in many other situations.

There are three main wildcards in Linux:

  • An asterisk (*) – matches any number of any characters, including none.

  • Question mark (?) – represents or matches a single occurrence of any character.

  • Bracketed characters ([ ]) – matches any one of the characters enclosed in the square brackets.

For example, to list only the files in your room directory that start with box:

Go back to your home directory:

cd ~/

List files that start with box:

ls ~/building/room/box.*

To list files that end with 1:

ls ~/building/room/*.1

To list files that start with b and end with a dot followed by a single character:

ls ~/building/room/b*.?

To list files that contain the word box anywhere in their name:

ls ~/building/room/*box*
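The sketch below runs the wildcard examples above inside a throwaway copy of the room directory (made with mktemp -d), and adds two bracket examples, since [ ] is not shown above. It also uses echo to print what a wildcard expands to: the shell replaces the pattern with the matching file names before the command runs.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/building/room"
cd "$scratch/building/room"
touch box.1 box.2 smallbox.1

ls box.*        # lists box.1 and box.2
ls *.1          # lists box.1 and smallbox.1
ls b*.?         # lists box.1 and box.2 (smallbox.1 starts with s)
ls *box*        # lists all three files
ls box.[12]     # [12] matches one character, 1 or 2: box.1 and box.2
ls box.[2-9]    # a range inside brackets: only box.2 matches here
echo *box*      # prints the expanded names: box.1 box.2 smallbox.1
```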

Pipes and input/output redirection

Another powerful part of Linux systems is the ability to ‘pipe’ the output of one program directly into another program (using the | symbol), or to redirect it into a file (using the > symbol). Pipes work like an assembly line.

  • Here is how you pipe the output of a list command into the sort command:
ls -latr ~/building/room/*box* | sort

Note that you finish the ls command, add a pipe symbol ( | ) then use the sort command.


  • Here is how you redirect the output of a list command into a file called list-output. You can call the file anything you like: it will be created by the redirect, and overwritten if it already exists.
ls -latr ~/building/room/*box* > list-output

Note that you finish the ls command, add a redirect symbol ( > ) then specify a file name.
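A short sketch combining both ideas, run on a throwaway copy of the room directory made with mktemp -d. wc -l (count lines) and the >> symbol (append to a file instead of overwriting it) are standard, but are not covered above.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/building/room"
cd "$scratch/building/room"
touch box.1 box.2 smallbox.1

ls *box* | wc -l         # pipe: count the matching files (3)
ls *box* > list-output   # redirect: create (or overwrite) list-output
cat list-output          # box.1, box.2 and smallbox.1, one per line
ls *.1 >> list-output    # >> adds to the end instead of overwriting
tail -n 2 list-output    # box.1 and smallbox.1
```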


DISCUSS WITH YOUR GROUP

Quiz each other about what these commands mean:

cp this here/
rm *.vcf
mkdir something
rm ~/building/*/*.?
cp ~/building/room/*ox* ~/building/

And one command you should not run!

rm *.*

Why not?
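Whatever your group concludes, one habit makes wildcards much safer: check what a pattern matches before giving it to rm. This sketch uses made-up files in a throwaway directory; putting echo in front of a command prints it with the wildcards already expanded, without running it.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch keep.txt data.vcf notes

# List what the wildcard matches before handing it to rm
ls *.*            # data.vcf and keep.txt (notes has no dot, so it is not matched)

# echo shows the command the shell would run, and deletes nothing
echo rm *.*       # rm data.vcf keep.txt

# As a last check, rm -i asks y/n before deleting each file, e.g. rm -i *.vcf
```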


The End

This should be all you need for Linux at the moment.

Examples of all the commands you will need (and more) are in this cheat sheet.

The File Commands and Shortcuts sections will be the most useful for you now.

