---
# Cluster Introduction (psycl, pirol)
This markdown document contains an introduction to the MPCDF clusters available for MPI-P users.

[1. Disclaimer](#Disclaimer)
[2. Overview of the cluster structure](#Overview)
[3. How to get your mpcdf account](#How)
[4. First login to MPCDF](#First)
[5. Create an alias for pirol/psycl login (and use FileZilla for easy up- and downloads)](#Create)
[6. Transfer data from MPI-P cluster to any of the MPCDF clusters](#Transfer)
[7. Working with the cluster](#Working)
[8. Copy data from/to cluster](#Copy)
[9. Useful resources](#Useful)

---
## 1. Disclaimer
1. Most of the code is written for Mac (Unix-like) users and may need some adjustments for Windows users.
2. This document suggests ways of working with the cluster. There are, of course, other ways to use and interact with it.
## 2. Overview of the cluster structure
<img src="https://github.molgen.mpg.de/mira/ClusterIntro/blob/main/Clusterstructure.png" alt="Description" width="500" height="350">
Psycl will be discontinued in December 2024. After that, all users have to use pirol. The nexus storage is accessible from both clusters, so it is advisable to store your data on nexus until the folder structure on pirol is ready.
## 3. How to get your mpcdf account
You can apply for access to any of the MPCDF clusters via [this link](https://selfservice.mpcdf.mpg.de/index.php?r=registration). Apply for an account there and, in addition, e-mail Benno Pütz to get assigned to one or more groups. For instance, you may wish to have access to `/g/mpsnmr` (neuroimaging core unit). Additional permissions can be granted later, so you don't have to know all potentially relevant folder structures when applying for an account.
Once your request has been approved you will receive two e-mails, one with your user name and one with a password. You have to change the password and set up two-factor authentication (e.g. with Google Authenticator). To do this, follow the instructions in the e-mails.
## 4. First login to MPCDF
1. open a terminal on your local machine
2. type `ssh -XA yourUserName@gate1.mpcdf.mpg.de`; don't forget to replace `yourUserName` with your actual user name
3. you will be asked for your password
4. you will be asked for your OTP (the one-time password from your two-factor authentication app)
5. once both are provided you will arrive at gate1. From here you can connect to either psycl or pirol. Enter one of the following:
   - `ssh -XA yourUserName@psycl01.bc.rzg.mpg.de` OR
   - `ssh -XA yourUserName@pirol01.hpccloud.mpcdf.mpg.de`
   - don't forget to replace `yourUserName` with your actual user name
6. upon your first login your home directory will be created automatically (`/u/yourUserName`)
7. a much faster and easier way to connect to the cluster is to create an alias for the login. The next section deals with this.
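
The two hops above can also be combined into a single command with OpenSSH's `-J` (jump host) option. The following is only a sketch, assuming your local OpenSSH is version 7.3 or newer; the function name `mpcdf_login` is hypothetical and not part of any MPCDF tooling. You will still be asked for your password and OTP.

```shell
# Hypothetical helper: log in to psycl or pirol through gate1 in one step.
# Usage: mpcdf_login yourUserName pirol
mpcdf_login() {
    local user="$1" cluster="$2" host
    case "$cluster" in
        psycl) host="psycl01.bc.rzg.mpg.de" ;;
        pirol) host="pirol01.hpccloud.mpcdf.mpg.de" ;;
        *) echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac
    # -J: connect to the gateway first, then continue to the cluster
    ssh -XA -J "${user}@gate1.mpcdf.mpg.de" "${user}@${host}"
}
```

Paste the function into your terminal (or your shell startup file) and call it with, for example, `mpcdf_login yourUserName pirol`.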
## 5. Create an alias for pirol/psycl login (and use FileZilla for easy up- and downloads)
On your local machine you need to modify the `config` file in your ssh folder (`~/.ssh/config`).
1. The ssh folder is hidden. Show hidden files by typing: `find /path/to/folder -name ".*"` (Unix) OR `dir /A:H /B /S "C:\path\to\folder"` (Windows; command suggested by ChatGPT)
2. Open the `config` file (e.g. by typing `nano config` inside the ssh folder) and add the lines below for psycl:
```{shell}
# This is ~/.ssh/config
Host *
    ServerAliveInterval 10
    ServerAliveCountMax 5
Host psycl01.bc.rzg.mpg.de
    User yourUserName
    ProxyCommand ssh -W %h:%p gate1.mpcdf.mpg.de 2>/dev/null
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
    ForwardAgent yes
    ForwardX11 yes
Host gate1.mpcdf.mpg.de
    User yourUserName
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
    ControlMaster auto
    ControlPath ~/.ssh/control:%h:%p:%r
    ForwardAgent yes
    ForwardX11 yes
```
- or for pirol:
```{shell}
# This is ~/.ssh/config
Host *
    ServerAliveInterval 10
    ServerAliveCountMax 5
Host pirol01.hpccloud.mpcdf.mpg.de
    User yourUserName
    ProxyCommand ssh -W %h:%p gate1.mpcdf.mpg.de 2>/dev/null
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
    ForwardAgent yes
    ForwardX11 yes
Host gate1.mpcdf.mpg.de
    User yourUserName
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
    ControlMaster auto
    ControlPath ~/.ssh/control:%h:%p:%r
```
- don't forget to replace `yourUserName` with your actual user name
3. If you wish to add both clusters (pirol and psycl), copy one of the full blocks above once and then add only the middle block of the other cluster (e.g. the block starting with `Host pirol...`) below it. The `Host *` and `Host gate1.mpcdf.mpg.de` blocks are needed only once.
4. save all changes and close your `config` file
5. Next you have to modify your `~/.bashrc` file (or another shell startup file, depending on the system you are using; on recent macOS the default shell is zsh, which reads `~/.zshrc`) as follows:
```{shell}
alias psycl='ssh psycl01.bc.rzg.mpg.de'
alias pirol='ssh pirol01.hpccloud.mpcdf.mpg.de'
```
6. you can use any alias you like as long as it is unique. Save and close the file.
7. for your next login you simply open a new terminal and type the alias (e.g. `psycl` if you want to connect to psycl). You only need the password and OTP once: while that first connection stays open, further logins share it through the `ControlMaster` setting.
8. Once you have set your alias you can also use FileZilla and most likely PuTTY (not tested by the author) for easy uploads in a drag-and-drop manner. For this you first need the corresponding software, e.g. FileZilla, installed on your local machine. Then open an additional terminal and type `ssh yourUserName@gate1.mpcdf.mpg.de -L 2002:pirol01.hpccloud.mpcdf.mpg.de:22 -N` or `ssh yourUserName@gate1.mpcdf.mpg.de -L 2002:psycl01.bc.rzg.mpg.de:22 -N`, depending on the cluster you wish to connect to. If you are not already logged in in another terminal, you will be asked for your password and OTP. After successfully providing both, the tunnel is open (the command prints nothing and keeps running) and you can switch to FileZilla.
9. Add a new host with the credentials shown below:
<img src="https://github.molgen.mpg.de/mira/ClusterIntro/blob/main/Bildschirmfoto%202024-10-16%20um%2015.00.40.png" alt="Description">
10. Now you can use FileZilla as usual. Navigate, view, upload or download files with the visual interface.
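
The tunnel command from step 8 can be wrapped in a small function so that the port and target host are easy to change. This is a sketch only; the name `mpcdf_tunnel` is hypothetical, and the default local port `2002` is simply the one used in step 8.

```shell
# Hypothetical helper: open the SSH tunnel that FileZilla connects through.
# Usage: mpcdf_tunnel yourUserName pirol01.hpccloud.mpcdf.mpg.de [localPort]
mpcdf_tunnel() {
    local user="$1" target="$2" port="${3:-2002}"
    if [ -z "$user" ] || [ -z "$target" ]; then
        echo "usage: mpcdf_tunnel USER TARGET_HOST [LOCAL_PORT]" >&2
        return 1
    fi
    # -N: run no remote command, only forward localhost:$port
    #     to port 22 (ssh/sftp) of the target host via gate1
    ssh -N -L "${port}:${target}:22" "${user}@gate1.mpcdf.mpg.de"
}
```

The function keeps running while the tunnel is open; stop it with `Ctrl-C`. To check the tunnel without FileZilla, you could run `sftp -P 2002 yourUserName@localhost` in a second terminal.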
## 6. Transfer data from MPI-P cluster to any of the MPCDF clusters
1. On the MPI-P cluster you first have to create `.tar` files of the data you wish to transfer. Keep each `.tar` file to approx. 500 GB; with larger archives, creating the file takes too long and is prone to errors. Do the following:
   - open a new screen on the cluster by typing `screen -S meaningfulScreenName`
   - start the job with `srun`, e.g. `srun tar -cf NameOfTarFileToBeCreated.tar NameOfFolder1 NameOfFolder2 NameOfFolder3`. `NameOfFolder*` needs to be replaced with the actual folder names you want to pack into the `.tar` file. You can also tar only one folder or more than three. You may want to use a loop if you wish to store many different folders in one `.tar` file.
   - detach from the screen (`Ctrl-a d`) and wait until the folders have been packed
2. When the `.tar` file is ready, check it with: `tar -tf nameOfTarFile.tar > /dev/null; echo $?`
   - the result should be `0` if everything has been tarred properly
3. run an `md5sum` command on your `.tar` file and save the resulting checksum somewhere (e.g. `md5sum nameOfTarFile.tar`)
4. Now transfer the file with the following command (here shown for psycl, but you may replace it with pirol):
```{shell}
scp -o "ProxyJump yourUserName@gate1.mpcdf.mpg.de" nameOfTarFile.tar yourUserName@psycl01.bc.rzg.mpg.de:/your/destination/folder
```
   - make sure to replace all `yourUserName` placeholders and the `nameOfTarFile.tar` placeholder with the actual names.
5. Once the file has arrived at psycl (or pirol), run another `md5sum` command on the file and check whether the checksum is the same as the one you had before.
6. Then unpack your file with `tar -xvf nameOfTarFile.tar`. Once you have verified the unpacked data, delete the `.tar` file from both clusters as well as the original un-tarred folder you moved.
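
Packing, checking and checksumming (steps 1 to 3) and the comparison after the transfer (step 5) can be sketched as two small functions. The names `pack_and_checksum` and `verify_checksum` are hypothetical, and the sketch assumes GNU `tar` and `md5sum` are available, as on a typical Linux cluster.

```shell
# pack_and_checksum ARCHIVE FOLDER...
# Packs the folders, checks that the archive can be listed,
# and stores its md5 checksum in ARCHIVE.md5.
pack_and_checksum() {
    local archive="$1"
    shift
    tar -cf "$archive" "$@" || return 1
    tar -tf "$archive" > /dev/null || return 1
    md5sum "$archive" > "${archive}.md5"
}

# verify_checksum ARCHIVE
# Recomputes the checksum and compares it with the one stored in ARCHIVE.md5.
# Returns non-zero if the two differ.
verify_checksum() {
    md5sum -c "${1}.md5"
}
```

On the MPI-P cluster you would put `pack_and_checksum nameOfTarFile.tar NameOfFolder1 NameOfFolder2` into a script and start that script with `srun` as described in step 1. Transfer the `.md5` file together with the `.tar` file, then run `verify_checksum nameOfTarFile.tar` in the destination folder before unpacking.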
## 7. Working with the cluster
### 7.1 Submitting a job
In general, to submit a job on the cluster you should use the batch system (`.sh` files). Your job will then be allocated to one or several nodes depending on the job's demands and the availability of the nodes. To run a simple script (without parallelization) you can use the following as a template:
```{shell}
#!/bin/bash
# bash is used because `source` below is not guaranteed to exist in plain sh
# memory per CPU in megabytes
#SBATCH --mem-per-cpu=20000
#SBATCH --job-name=YourJobName
# files that receive the job's standard output and error messages
#SBATCH --output=../YourOutfileName.out
#SBATCH --error=../YourErrorFileName.err
#~~~~~ adjust as needed ~~~~~~~~~~~~~
BASEDIR=/Your/Base/Dir
PROJECT=Your/Project/Dir
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
source ~/.bashrc
module load matlab
WORKINGPATH=${BASEDIR}/${PROJECT}
# stop here if the folder does not exist instead of running in the wrong place
cd "${WORKINGPATH}" || exit 1
matlab -r "NameOfYourMatlabScript"
```
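
To hand the template to the batch system you use Slurm's `sbatch`, and `squeue` shows whether the job is waiting or running. A minimal sketch, assuming the template was saved as `myJob.sh` (a hypothetical file name); the function name `submit_and_report` is hypothetical as well.

```shell
# Hypothetical helper: submit a batch script and show the state of the new job.
# Usage: submit_and_report myJob.sh
submit_and_report() {
    local jobid
    # --parsable makes sbatch print only the job ID,
    # optionally followed by ";clustername"
    jobid=$(sbatch --parsable "$1") || return 1
    jobid=${jobid%%;*}
    echo "submitted job ${jobid}"
    squeue -j "$jobid"
}
```

`squeue -u "$USER"` lists all of your jobs, and `scancel` followed by a job ID cancels a job.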
### 7.2 Use parallel processing on the cluster
If you want to run several computations in parallel, e.g. a first-level analysis for a sample of n=500 subjects, you can start all 500 jobs at once as a job array in the batch system. This is much more efficient than submitting 500 jobs individually. For parallel processing you need a scheme that "translates" the task ID (just a number) into your subject's name/folder. There are many different ways to do this. Here I will show my way, but you may prefer another solution.
To keep track of each subject's progress and potential errors I have a table that I keep updating. It looks like this:
<img src="https://github.molgen.mpg.de/mira/ClusterIntro/blob/main/Bildschirmfoto%202024-09-25%20um%2012.20.56.png" alt="Description" width="500" height="300">
When using parallel processing I first create the task IDs, which are basically an array of numbers from 1 to n (the size of my sample). An example shell script for this may look like this:
```{shell}
#!/bin/sh
# submit example.sh as a job array with task IDs 1 to 15
sbatch --array=1-15 example.sh
```
This shell script submits another shell script called `example.sh`, which starts the actual jobs, one for each subject. As output you will get one error file and one outfile per subject. It is advisable to print a message after important steps of your computation, or to print explicit error statements, so that you can follow the progress in your outfile. If some subjects fail to run, these "check points" are useful hints for debugging the code.
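```{shell}
#!/bin/bash
# bash is used because `source` below is not guaranteed to exist in plain sh
#SBATCH --mem-per-cpu=20000
#SBATCH --job-name=exampleScript
# %j is replaced by the job ID, so every task gets its own files
#SBATCH --output=ID-%j.out
#SBATCH --error=ID-%j.err
source ~/.bashrc
module load matlab
cd /psycl/u/yourUserName/examples || exit 1
# SLURM_ARRAY_TASK_ID holds this task's number from the --array range
echo "slurm task id = $SLURM_ARRAY_TASK_ID"
matlab -r "A_function_for_single_sub($SLURM_ARRAY_TASK_ID)"
```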
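
One simple way to translate the task ID into a subject name is to keep a plain text file with one subject per line and pick the line whose number equals the task ID. This is only a sketch of that idea, not the table-based approach shown in the screenshot; the function name `subject_for_task` and the file name `subjects.txt` are hypothetical.

```shell
# Hypothetical helper: print line number TASK_ID (counting from 1) of a subject list.
# Usage: subject_for_task subjects.txt "$SLURM_ARRAY_TASK_ID"
subject_for_task() {
    local list="$1" task_id="$2"
    case "$task_id" in
        ''|0|*[!0-9]*) echo "task ID must be a positive integer" >&2; return 1 ;;
    esac
    # sed -n 'Np' prints only line N; nothing is printed if the file is shorter
    sed -n "${task_id}p" "$list"
}
```

Inside `example.sh` you could then write `SUBJECT=$(subject_for_task subjects.txt "$SLURM_ARRAY_TASK_ID")` and pass `$SUBJECT` on to your analysis script. An empty result means the task ID is larger than the number of subjects in the list.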
### 7.3 Debug in interactive mode
If you have trouble identifying a bug in your code you may want to debug interactively and "see" what you are doing or what a variable looks like before the error occurs. If you have added the two lines `ForwardAgent yes` and `ForwardX11 yes` to your ssh config (section 5), you can open MATLAB with its graphical interface and use it as on a local machine. For this you need an X server on your local machine: XQuartz on Mac, or e.g. Xming on Windows.
1. Log in to one of the clusters
2. type `module load matlab` (or `module load R`)
3. type `matlab` (or `R`)
4. be a bit patient. A window will open with the usual MATLAB interface.
## 8. Copy data from/to cluster
To copy data from the cluster to your local machine use:
`rsync -rauP psycl01.bc.rzg.mpg.de:/psycl/path/to/folder/or/file /destination/on/your/computer`
To copy data from your local machine to the cluster use:
`rsync -rauP /path/to/folder/or/file/to/copy psycl01.bc.rzg.mpg.de:/psycl/destination/folder`
Make sure to replace the source and destination paths with your actual path/file names and, if you use pirol, to replace the `psycl` host name with the one for pirol (`pirol01.hpccloud.mpcdf.mpg.de`). These commands assume that the ssh settings from section 5 (user name and gateway) are in place.
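
Before a large transfer it can help to let `rsync` list what it would copy without transferring anything, using its `--dry-run` option. A minimal sketch; the function name `rsync_preview` is hypothetical.

```shell
# Hypothetical helper: show what rsync would copy, without transferring anything.
# Usage: rsync_preview SOURCE DESTINATION
rsync_preview() {
    rsync -rauP --dry-run "$1" "$2"
}
```

For example, `rsync_preview psycl01.bc.rzg.mpg.de:/psycl/path/to/folder /destination/on/your/computer` lists the files that the first command above would copy. Note that a trailing slash on the source (`folder/`) copies the contents of the folder, while `folder` without a slash copies the folder itself.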
## 9. Useful resources
If you face an obstacle, there are several well-documented resources you can check:
- [The MPCDF overview page](https://docs.mpcdf.mpg.de/doc/computing/overview.html#compute-facilities)
- [The MPCDF overview about gateway machines](https://docs.mpcdf.mpg.de/doc/computing/gateways.html)
- [The MPCDF instruction for psycl and pirol](https://docs.mpcdf.mpg.de/doc/computing/clusters/systems/Psychiatry.html)
- [Our self-made IT-Wiki on github molgen](https://github.molgen.mpg.de/mpip/IT-Wiki)

And if you are really stuck you can e-mail the [MPCDF helpdesk](https://helpdesk.mpcdf.mpg.de/mpcdf/index.html). Login credentials are the same as for the clusters.