This note helps you set up a Singularity environment on NYU HPC (Greene).
Most up-to-date doc: link
Connect to NYU Greene:
ssh [netid]@greene.hpc.nyu.edu
[Type in your password]
# Last login: Tue Mar 1 15:53:16 2022 from xx.xx.xx.xx
# [hl3797@log-2 hl3797]$
Get a compute node with GPU:
srun --nodes=1 --cpus-per-task=4 --mem=32GB --time=2:00:00 --gres=gpu:1 --pty /bin/bash
# wait until you are directed to the node
# [hl3797@gv001 ~]$
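Optionally, confirm that a GPU is visible before continuing:
nvidia-smi
# should list the GPU you requested; an error here means you are not on a GPU node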
Prepare required files:
cd /scratch/$USER
# [hl3797@gv001 hl3797]$
cp /scratch/work/public/singularity/cuda11.4.2-cudnn8.2.4-devel-ubuntu20.04.3.sif .
cp /scratch/work/public/overlay-fs-ext3/overlay-25GB-500K.ext3.gz .
gunzip -vvv overlay-25GB-500K.ext3.gz
# Note: this takes a long time
ls
# cuda11.4.2-cudnn8.2.4-devel-ubuntu20.04.3.sif overlay-25GB-500K.ext3
Launch singularity container (with GPU access):
singularity exec --nv --bind /scratch/$USER --overlay /scratch/$USER/overlay-25GB-500K.ext3:rw /scratch/$USER/cuda11.4.2-cudnn8.2.4-devel-ubuntu20.04.3.sif /bin/bash
# Singularity>
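# Note: :rw is required while installing packages. Once the environment is built,
# you can relaunch with the overlay mounted read-only so multiple jobs can share it
# (a common Greene convention, shown here as an extra example):
# singularity exec --nv --overlay /scratch/$USER/overlay-25GB-500K.ext3:ro /scratch/$USER/cuda11.4.2-cudnn8.2.4-devel-ubuntu20.04.3.sif /bin/bash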
cd /ext3
# Singularity>
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --no-check-certificate
# 2022-03-01 21:54:02 (151 MB/s) - 'Miniconda3-latest-Linux-x86_64.sh' saved [75660608/75660608]
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p /ext3/miniconda3
# PREFIX=/ext3/miniconda3
# Unpacking payload ...
# [...]
# installation finished.
wget https://raw.githubusercontent.com/hmdliu/MLLU-SP22-tmp/main/env.sh --no-check-certificate
# 2022-03-01 21:55:54 (4.76 MB/s) - 'env.sh' saved [143/143]
source /ext3/env.sh
# Singularity>
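For reference, env.sh is a short script along these lines (a sketch; the exact contents of the downloaded file are assumed):
#!/bin/bash
# make the miniconda install the default python and enable 'conda activate'
source /ext3/miniconda3/etc/profile.d/conda.sh
export PATH=/ext3/miniconda3/bin:$PATH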
unset -f which
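# Note: the conda setup may define 'which' as a shell function; unsetting it restores the system binary.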
which python
# /ext3/miniconda3/bin/python
Install packages:
which pip
# /ext3/miniconda3/bin/pip
pip install torch torchvision torchaudio
pip install scikit-learn numpy scipy pandas matplotlib h5py addict tensorboard
# [normal installation info]
# Note: You may install more pkgs as needed.
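# Note: a plain 'pip install torch' may pull wheels built for a different CUDA release
# than the container's 11.4. If you hit CUDA errors, the CUDA 11.3 wheels are an
# assumed-compatible alternative (index URL from PyTorch's install docs):
# pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113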
Create .bashrc and .bash_profile:
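Their contents are not shown here; a minimal sketch, assuming the goal is to load the conda setup automatically in shells inside the container:
# ~/.bashrc (sketch): source the container environment only when the overlay is mounted
if [ -f /ext3/env.sh ]; then source /ext3/env.sh; fi
# ~/.bash_profile (sketch): defer to .bashrc for login shells
if [ -f ~/.bashrc ]; then source ~/.bashrc; fi
Test the setup inside the container: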
Singularity> python
# Python 3.9.7 (default, Sep 16 2021, 13:09:58)
# [...]
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'Tesla T4'
>>> exit()
Singularity> exit
# [hl3797@gv001 hl3797]$
Set the $SCRATCH path variable:
SCRATCH=/scratch/$USER
echo $SCRATCH
# /scratch/[netid]
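If $SCRATCH is not already defined by the system, you can make it persistent (optional; the single quotes keep $USER unexpanded until login):
echo 'export SCRATCH=/scratch/$USER' >> ~/.bashrc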
Test job:
mkdir $SCRATCH/test
cd $SCRATCH/test
# [hl3797@b-8-72 test]$
wget https://raw.githubusercontent.com/TeamOfProfGuo/Codebase-Files/main/test_gpu.py
# 2022-04-03 10:51:05 (86.9 MB/s) - ‘test_gpu.py’ saved [678/678]
wget https://raw.githubusercontent.com/TeamOfProfGuo/Codebase-Files/main/submit_job.slurm
# 2022-04-03 10:51:19 (60.5 MB/s) - ‘submit_job.slurm’ saved [439/439]
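For reference, a submit script along these lines matches the outputs below (a sketch with assumed resource values, not the exact downloaded file):
#!/bin/bash
#SBATCH --job-name=test_gpu
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16GB
#SBATCH --time=00:10:00
#SBATCH --gres=gpu:1
#SBATCH --output=test.out
#SBATCH --error=test.err

# run the test inside the container with the overlay mounted read-only
singularity exec --nv \
    --overlay /scratch/$USER/overlay-25GB-500K.ext3:ro \
    /scratch/$USER/cuda11.4.2-cudnn8.2.4-devel-ubuntu20.04.3.sif \
    /bin/bash -c "source /ext3/env.sh; python test_gpu.py"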
sbatch submit_job.slurm
# Submitted batch job 53617
# Note: The job can be pending for a while.
squeue -u $USER
# JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
# 17135904 rtx8000 test_gpu hl3797 R 0:10 1 gr004
# Note: Wait until the 'test_gpu' job ends.
cat test.out
# Torch cuda available: True
# GPU name: Quadro RTX 8000
#
#
# CPU matmul elapsed: 1.7312142848968506 sec.
# GPU matmul elapsed: 0.15191888809204102 sec.
cat test.err
# /scratch/hl3797/test/test_gpu.py:34: UserWarning: Sample warning message.
# warnings.warn("Sample warning message.")
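For reference, a test_gpu.py consistent with the output above looks roughly like this (a sketch; the matrix size and exact prints are assumptions):
import time
import warnings

import torch

# report GPU visibility
print('Torch cuda available:', torch.cuda.is_available())
print('GPU name:', torch.cuda.get_device_name(0))
print('\n')

# time the same large matmul on CPU, then on GPU
x = torch.randn(4096, 4096)
start = time.time()
y = x @ x
print('CPU matmul elapsed: %s sec.' % (time.time() - start))

x = x.cuda()
torch.cuda.synchronize()  # finish the host-to-device copy before timing
start = time.time()
y = x @ x
torch.cuda.synchronize()  # wait for the kernel to finish
print('GPU matmul elapsed: %s sec.' % (time.time() - start))

warnings.warn("Sample warning message.")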