A server is a computer or device on a network that manages network resources. Essentially, a server is a collection of linked computers that have processors (CPUs or GPUs), memory (RAM) and disk storage space, such as hard drives.
Servers are used for bioinformatics because data sets can be very large, and can take a long time and/or large amounts of RAM to process.
Servers generally run a Linux operating system, rather than Microsoft Windows or macOS. This is because Linux is more stable, more secure and has methods to process and distribute large numbers of tasks to different processors.
We will use a server based in York, called York Advanced Research Computing Cluster, or YARCC for short.
To work remotely on YARCC, you need to log in.
Regardless of whether you are logging in from a PC or a Mac, you will need to use the York VPN application PulseSecure. The download and installation instructions are provided [here](https://www.york.ac.uk/it-services/services/vpn/).
When you open PulseSecure, you should have no connections added, as shown below:
Click on the plus to add a connection. To connect to York you will need to fill in the VPN settings for the Name: and Server URL:, as below:
Then click connect, and add your username and your password.
We will use this temporary account to log into the YARCC server as well.
If you connect and the information is correct, you should have the following screen, with a green tick, and a disconnect option available.
Now that you are connected to the York VPN, we can connect to the YARCC compute cluster. This cluster contains software and data that we will use for the remainder of the course.
Logging in from a PC is slightly different from logging in from a Mac.
If you are using a Windows computer, you will need to install login and configure it to connect through the PulseSecure VPN you just set up. You should already have login installed. If you open it, you should see the following screen.
If you click on open, this will bring up the following screen. You will need to fill in the host name and make sure it is connecting through SSH, as shown beneath:
This will be the first time you connect to the server using your login setup, so it will show an authentication screen. Select yes to authenticate.
Once this has connected you should have a terminal screen open which will then ask you to again fill in your login details. If you log in successfully you should see the following two windows:
If you are logging in from a Mac, this will be much easier and will not require the login software. You will still need to connect through the VPN.
Once the VPN is connected, open a window of the Terminal app. Then log in using the ssh (secure shell) command as follows:
ssh username@login.yarcc.york.ac.uk
When you press enter, this will ask for your password. If both your username and password are entered successfully, you will be logged in!
YARCC runs on a Linux operating system. To do the bioinformatics processing, you will need to use Linux a little.
If you are familiar with Linux and merely want a reminder, get a Linux cheat sheet here. If not, read on.
In the same way that Windows PCs have folders where you store your files, Linux systems have directories. They are essentially the same thing: a place to organise files and programs.
Linux often uses a command line (text-based) interface rather than a graphical interface like Windows, so directories are referred to with text.
When you first log into a Linux server you are directed to your home directory. On YARCC your home directory will be:
/home/userfs/t/username
Note that username is replaced with your username (like tmpq0001, etc). So everyone has their own unique home directory.
In Linux you are always ‘working from’ a specific directory. You can think of this as where you are in the file system. You are always somewhere!
To find out what your working directory is at any point, use this command:
pwd
Directories can contain files and other directories, just as houses contain rooms, rooms contain items and boxes, and boxes contain tins and jars.
Directories are nested.
As you work in Linux it is important to keep track of where you are in relation to your directories. The image below shows how directories might be organised. The name of the something directory isn’t very informative, but the name of the data directory tells you what is in there. (You’ll find out about bam files later.)
Directories and files have a path, which is the list of subdirectories that you need to specify to get to that location. The path of your home directory on the YARCC Linux system is something like: /home/userfs/t/tmpq1234.
To change which directory you are ‘in’ use this command:
cd data
This will take you to the data directory (if it exists).
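The command cd data uses a relative path: data is looked up inside your current working directory. A path that starts with / is absolute and works from anywhere. The following is a minimal sketch of the difference, run in a throwaway directory made with mktemp -d; the variable names demo, relative and absolute are placeholders chosen for this illustration, not anything that exists on YARCC.

```shell
# Make a throwaway directory containing a data directory (illustration only).
demo=$(mktemp -d)
mkdir "$demo/data"

# Relative path: 'data' is found inside the current working directory.
cd "$demo"
cd data
relative=$(pwd)

# Absolute path: starts with / so it works wherever you currently are.
cd /
cd "$demo/data"
absolute=$(pwd)

# Both routes end in the same place, so the two lines printed are identical.
echo "$relative"
echo "$absolute"
```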
It is important to understand the concepts of directories and paths.
Discuss with your group or a tutor before you move on.
This set of Linux commands will get you started. We suggest that you type each of these commands into your Linux system in order. Do not copy and paste the text from this web page.
First, choose a building name, a room name, and a word for a box or container in any language. Note these down somewhere.
Then log into the server (if you haven’t already).
Then check where you are, with:
pwd
mkdir building
cd building
mkdir room
cd room
pwd
touch box.1
touch box.2
touch smallbox.1
ls
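As a recap, the steps above can be replayed in one go. This is a minimal sketch that works inside a throwaway directory created with mktemp -d (an assumption made here so that your real home directory is left alone). It also relies on two things not used above: touch accepts several file names at once, and ls -R lists a directory together with everything nested inside it.

```shell
# Work in a throwaway directory (assumption: keep the home directory untouched).
demo=$(mktemp -d)
cd "$demo"

# Build the nested directories one level at a time, as in the steps above.
mkdir building
cd building
mkdir room
cd room

# touch creates empty files; several names can be given in one command.
touch box.1 box.2 smallbox.1

# Return to the top of the demo and list everything, including nested directories.
cd "$demo"
ls -R building
```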
Most commands in Linux have optional ‘flags’, that are added with extra letters after the command. Flags allow you to run the command with different variations. Some of these flags can be very useful.
To find out about a Linux command, and its flags, you can call up the ‘manual’ for that command with:
man command
(replacing command with something like ls, touch, cd etc)
For example, this command lists the files in your working directory in long format (-l), sorted by time (-t) in reverse (-r), so the most recent files are shown at the end:
ls -lrt
Give this a try.
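If you would like to see the effect of the flags on files whose age you know, here is a small sketch. It assumes a throwaway directory and two made-up file names (first.txt and second.txt), created one second apart so that their times differ.

```shell
# Throwaway directory with two files created a second apart (illustration only).
demo=$(mktemp -d)
cd "$demo"
touch first.txt
sleep 1
touch second.txt

# Long format, sorted by time, reversed: the newest file is listed last.
ls -lrt

# Flags can also be written separately; this is the same command.
ls -l -r -t

# Without -l only the names are shown, still oldest first.
listing=$(ls -rt)
echo "$listing"
```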
cd ../
This is how cd ../ changes your working directory.
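The same idea can be tried out safely with a short sketch. It assumes a throwaway directory holding the building and room directories used earlier, and uses mkdir -p, which creates a directory and any missing parent directories in one step.

```shell
# Throwaway copy of the building/room layout (illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/building/room"
cd "$demo/building/room"

# Up one level: the working directory is now building.
cd ../
one_up=$(pwd)
echo "$one_up"

# Back into room, then up two levels at once.
cd room
cd ../../
two_up=$(pwd)
echo "$two_up"
```

Each ../ means "the directory that contains this one", so they can be chained to climb several levels.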
NOTE: Files can have almost any name in Linux. File names and commands are case sensitive, so the command to change directory, cd, will not work if you type Cd. Be careful with dots (..) and spaces in Linux: they matter!
Make a copy of a file called myfile. The new copy is called myfile2.
cp myfile myfile2
If your working directory is kitchroom (~/) you can copy a file called myfile from your working directory into the fridge directory like so:
cp myfile fridge/
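The two forms of cp can be compared side by side in a short sketch. It assumes a throwaway directory containing an empty file called myfile and a directory called fridge; the name leftovers is made up for this illustration.

```shell
# Throwaway directory with one file and one subdirectory (illustration only).
demo=$(mktemp -d)
cd "$demo"
touch myfile
mkdir fridge

# Copy to a new name in the same directory.
cp myfile myfile2

# Copy into a directory: the copy keeps the name myfile.
cp myfile fridge/

# Copy into a directory and rename the copy at the same time.
cp myfile fridge/leftovers

# The original is still in place, and fridge now holds two copies.
ls
ls fridge
```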
Remove a file called this.
rm this
Show (or print out to the screen) all of a file called this.
cat this
Warning!: some of the files we will work with are very large. Using cat on them can take a long time. To escape from a command that is running, type Ctrl+C. (Ctrl+Z only suspends the command; it does not stop it.)
Show the first ten lines of a file called this.
head this
Show the last ten lines of a file called this.
tail this
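Both head and tail accept a -n flag to choose how many lines to show, which is handy for peeking at large files. The sketch below assumes a throwaway file called numbers.txt, filled with the numbers 1 to 100 by the seq command; the > symbol used to create it sends the output of a command into a file.

```shell
# Throwaway file with 100 numbered lines (illustration only).
demo=$(mktemp -d)
cd "$demo"
seq 1 100 > numbers.txt

# Show only the first three lines.
head -n 3 numbers.txt

# Show only the last two lines.
tail -n 2 numbers.txt
```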
One of the most powerful parts of file handling in Linux is its use of wild cards. These allow you to specify groups of files to move or copy, and in many other situations.
There are three main wildcards in Linux:
An asterisk (*) – matches zero or more occurrences of any character, so it also matches nothing at all.
Question mark (?) – represents or matches a single occurrence of any character.
Bracketed characters ([ ]) – match any single one of the characters enclosed in the square brackets.
For example, to list only the files in your room directory that start with box, do this:
Go back to your home directory:
cd ~/
List files that start with box:
ls ~/building/room/box.*
To list files that end with 1:
ls ~/building/room/*.1
To list files that start with b and end with a dot followed by any single character:
ls ~/building/room/b*.?
To list files that start with anything, end with anything but contain the word box:
ls ~/building/room/*box*
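The examples above use * and ? but not square brackets. Here is a minimal sketch of the bracket wildcard, assuming a throwaway directory that holds the same box files as before plus one extra file, box.3, added only for this illustration.

```shell
# Throwaway directory with a few box files (box.3 is an extra, made-up file).
demo=$(mktemp -d)
cd "$demo"
touch box.1 box.2 box.3 smallbox.1

# Match box.1 or box.2, but not box.3.
ls box.[12]

# A range inside the brackets: matches box.1, box.2 and box.3.
ls box.[1-3]

# Files that start with b or s and end with .1
ls [bs]*.1
```

Brackets always stand for exactly one character, so box.[12] does not match a file called box.12.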
Another powerful part of Linux systems is the ability to ‘pipe’ or redirect the output of one program directly into another program (using the | symbol), or into a file (using the > symbol). Pipes work like an assembly line.
ls -latr ~/building/room/*box* | sort
Note that you finish the ls command, add a pipe symbol ( | ) then use the sort command.
ls -latr ~/building/room/*box* > list-output
Note that you finish the ls command, add a redirect symbol ( > ) then specify a file name.
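To see pipes and redirects in action on a small scale, here is a sketch that assumes a throwaway directory with the three box files from earlier. It uses two things beyond the commands above: wc -l, which counts lines, and >>, which appends to a file instead of overwriting it.

```shell
# Throwaway directory with the three box files (illustration only).
demo=$(mktemp -d)
cd "$demo"
touch box.1 box.2 smallbox.1

# Pipe: send the list of names into wc -l to count them (3 files here).
count=$(ls *box* | wc -l)
echo "$count"

# Redirect: write the list into a file, replacing anything already in it.
ls *box* > list-output
cat list-output

# Append: add more lines to the end of the same file.
ls *.1 >> list-output
cat list-output
```

Note that > silently replaces an existing file, so check the file name before you press enter.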
DISCUSS WITH YOUR GROUP
Quiz each other about what these commands mean:
cp this here/
rm *.vcf
mkdir something
rm ~/building/*/*.?
cp ~/building/room/*ox* ~/building/
And one command you should not do!
rm *.*
Why not?
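While you discuss the answer, one cautious habit is worth sketching: preview what a wildcard matches with ls before giving the same pattern to rm. The example below is only an illustration in a throwaway directory; the file names keep.txt, old.vcf and older.vcf are made up.

```shell
# Throwaway directory with one file to keep and two to delete (illustration only).
demo=$(mktemp -d)
cd "$demo"
touch keep.txt old.vcf older.vcf

# Preview: see exactly which files the pattern matches.
ls *.vcf

# Only then remove them, using the same pattern.
rm *.vcf

# Check what is left.
ls
```

The rm command also has an -i flag, which asks for confirmation before each file is removed.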
This should be all you need for Linux at the moment.
Examples of all the commands you will need (and more) are in this cheat sheet.
The File Commands and Shortcuts will be the most useful for you now.