You can submit MATLAB computations to one of the KAUST clusters (IBEX, Neser, or Shaheen) directly from the MATLAB user interface.
The KAUST HPC add-on for MATLAB (HPC add-on from now on) makes this possible. It lets you connect remotely to the clusters and run parallel jobs from MATLAB.
The HPC add-on is available for MATLAB R2016b and later releases; earlier releases are not supported. The installation procedure depends on your operating system (Windows, Linux, or macOS).
If you are on a Linux workstation, the most recent version of MATLAB and the HPC add-on are automatically loaded with the following module command:
module load matlab
If you want to use an older version of MATLAB on Linux, load a version-specific module, which also loads the HPC add-on. For example:
module load matlab/R2018b
If you are on a Windows or macOS system, you will have to download the HPC add-on then add it to your MATLABPATH.
Download and unzip the HPC add-on package that matches the MATLAB version you are using.
Start MATLAB then call
addpath('<path to HPC add-on>')
Execute the following command in your MATLAB command window:
configCluster
You will get a numbered list to choose which cluster you want to create a profile for:
configCluster
[1] IBEX INTEL
[2] NESER
[3] SHAHEEN XC40
Select a cluster [1-3]:
Choose the cluster you want to run your scripts on.
Note that you need an account on the cluster you want to run on. Consult the cluster administrators on how to get one.
Once you have created the profiles, you will be able to see the profiles with:
Parallel > Manage Cluster Profiles
When you use the HPC add-on, you have a MATLAB client session and one or more MATLAB workers.
The client session may run on your laptop/desktop computer or it may run on a login node of one of the clusters via an interactive session. In the client session you will run MATLAB commands to set up and submit a batch job. If you run your MATLAB client session on your local computer, it is considered a remote client.
The MATLAB workers always run on the cluster as part of a batch job. You will be able to use any toolboxes that you have a license for by submitting batch jobs from your MATLAB client. The batch jobs you submit from your MATLAB client with the HPC add-on can use functions such as parfor and spmd to run your code in parallel, in addition to running normal MATLAB code.
This document describes how to perform a computation on a single worker (one processor) or on multiple workers using a script with a parfor loop.
Parallel MATLAB on a cluster has a lot of overhead. It may not give you the speedup you expect, especially if you're solving small problems. Another consideration is that the MATLAB workers are single-threaded. They don't take advantage of the multi-threading built into many MATLAB functions.
Notwithstanding these caveats, the act of running your MATLAB jobs on the cluster will free up your client for other tasks. This may be enough of an advantage for you.
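As a rough illustration of the kind of script this document has in mind, the sketch below runs the same computation with a plain for loop and with parfor, and times both. The script name, the loop body, and the problem size are assumptions chosen for illustration only; timing both versions is one simple way to check whether the parallel overhead pays off for your own problem.

```matlab
% eigtest_sketch.m (hypothetical script name)
% Largest absolute eigenvalue of several test matrices, computed once with
% a plain for loop and once with parfor, timing both versions.
nMatrices = 8;                          % assumed problem size
serialResult = zeros(1, nMatrices);
parallelResult = zeros(1, nMatrices);

tic
for k = 1:nMatrices
    serialResult(k) = max(abs(eig(magic(k + 2))));
end
serialTime = toc;

tic
parfor k = 1:nMatrices                  % iterations must be independent
    parallelResult(k) = max(abs(eig(magic(k + 2))));
end
parallelTime = toc;

fprintf('for: %.3f s, parfor: %.3f s\n', serialTime, parallelTime);
```

For a problem this small the parfor version will usually be slower, which is exactly the caveat described above. The first parfor timing may also include the time needed to start a pool.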
The primary way to use the HPC add-on is to submit batch jobs from your MATLAB client on your local PC, Mac, or Linux workstation. These instructions assume that you have already configured your MATLAB client as described above.
MATLAB treats the output directory specified with the submit functions as SCRATCH file space, and cleans up this directory after you have successfully retrieved your data. In practice, however, MATLAB has been observed to clean up this directory even when the retrieval commands fail (for example, during a large file transfer). To avoid data loss, make sure your entry function or script copies any important output files to an alternate location for redundancy, or simply saves your data to a directory of your choice.
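One way to follow this advice is to save the results explicitly at the end of the entry script. In the sketch below, the output directory, the file name, and the variable name are all hypothetical; tempdir is used only to keep the example runnable and should be replaced with a directory of your choice on the cluster.

```matlab
% At the end of the entry script: keep a copy of the results outside the
% job's output directory.
result = magic(4);                               % stand-in for real results
outputDir = fullfile(tempdir, 'my_job_output');  % hypothetical location
if ~exist(outputDir, 'dir')
    mkdir(outputDir);
end
outputFile = fullfile(outputDir, 'result.mat');
save(outputFile, 'result');
```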
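Prepare a MATLAB script that uses parfor. If you are using an existing MATLAB script, you need to adapt it as follows:
Remove any call to parpool from the script; the number of workers is requested when you submit the job.
Load input files by name only. Use
load('<name of input file>')
instead of
load('/<name of directory>/<name of input file>')
Use parcluster to create a cluster object in the MATLAB workspace. Called without arguments, this command uses the default profile, i.e. the last one you set up or selected:
c = parcluster;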
If you want to use Neser, use command:
neser = parcluster('neser')
If you want to use Shaheen, use command:
shaheen = parcluster('shaheen')
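If you are unsure which profiles exist on your client, you can list them before calling parcluster. This is a minimal sketch that assumes the Parallel Computing Toolbox is installed; the profile name 'neser' is taken from the examples above and may differ on your system. Recent MATLAB releases offer parallel.listProfiles as a replacement for parallel.clusterProfiles.

```matlab
% List the cluster profiles known to this MATLAB client and pick one.
[profiles, defaultProfile] = parallel.clusterProfiles;
fprintf('Default profile: %s\n', defaultProfile);
disp(profiles);

wanted = 'neser';                 % assumed profile name
if any(strcmp(profiles, wanted))
    c = parcluster(wanted);
else
    c = parcluster;               % fall back to the default profile
end
```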
The way you set scheduler options differs between R2016b and later versions. Follow the instructions below based on the MATLAB version you are using.
On R2016b, use ClusterInfo. This command allows you to set scheduler options. SLURM is the scheduler of IBEX, Neser, and Shaheen. You can list the current state of the scheduler options with ClusterInfo.state.
You can find the scheduler options that you can set by typing ClusterInfo.set and then pressing the Tab key. For example, setting the estimated wall clock time is mandatory. Set the wall clock time to one hour with:
ClusterInfo.setWallTime('60')
These are all the parameters you can set on MATLAB R2016b:
>> ClusterInfo.state
DataParallelism : eth
EmailAddress :
GpusPerNode :
MemUsage :
JobName :
PrivateKeyFile :
PrivateKeyFileHasPassPhrase : 1
ProcsPerNode :
ProjectName :
QueueName : batch
RequireExclusiveNode : 0
SshPort : 22
UseGpu : 0
UserDefinedOptions :
UserNameOnCluster : arenaam
WallTime :
On R2017a and later, use c.AdditionalProperties. This property allows you to set scheduler options. SLURM is the scheduler of IBEX, Neser, and Shaheen. You can list the current state of the scheduler options with c.AdditionalProperties. You can find the scheduler options that you can set by typing c.AdditionalProperties. (including the trailing dot) and then pressing the Tab key. For example, setting the estimated wall clock time is mandatory. Set the wall clock time to one hour with:
c = parcluster; % Default cluster profile
c.AdditionalProperties.WallTime = '60';
c.saveProfile; % Don't forget to save this for later use
These are all the parameters you can set on MATLAB R2017a and higher:
>> c = parcluster;
>> c.AdditionalProperties
ans =
AdditionalProperties with properties:
AdditionalSubmitArgs: ''
ClusterHost: 'ilogin.ibex.kaust.edu.sa'
ClusterName: 'intel'
DataParallelism: 'eth'
DebugMessagesTurnedOn: 0
EmailAddress: ''
IdentityFile: ''
IdentityFileHasPassphrase: 0
JobName: ''
ProcsPerNode: 0
ProjectName: ''
QueueName: 'batch'
RemoteJobStorageLocation: '/ibex/scratch/arenaam/Jobs/R2018a'
RequiresExclusiveNode: 0
SshPort: 22
StraceOn: 0
UseIdentityFile: 1
UserNameOnCluster: 'arenaam'
WallTime: ''
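As a sketch of how several of the properties shown in the listing above might be set together on R2017a or later: the values below (job name, cores per node, exclusive nodes) are assumptions for illustration, not recommended settings, and you should adapt them to the cluster you use.

```matlab
c = parcluster;                                       % default cluster profile
c.AdditionalProperties.WallTime = '60';               % one hour, as in the example above
c.AdditionalProperties.JobName = 'eigtest';           % hypothetical job name
c.AdditionalProperties.ProcsPerNode = 40;             % assumed number of cores per node
c.AdditionalProperties.RequiresExclusiveNode = true;  % assumed: ask for whole nodes
c.saveProfile;                                        % keep the settings for later sessions
```

Displaying c.AdditionalProperties afterwards lets you confirm that the values were stored.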
Some of these properties are:
WallTime, the estimated wall clock time of the job (mandatory)
JobName, a description of what you are running
ProcsPerNode, the number of cores per node
ProjectName (on Shaheen and Neser), the account to charge for the job
RequiresExclusiveNode, whether you want exclusive nodes for your job
Submit the job with batch. This command begins an automated process that connects to the cluster, submits a job to the scheduler, and initializes MDCS. For example, to submit a script to the Neser cluster with one input file:
job = neser.batch('<your script>', 'pool', 39, 'AttachedFiles', '<name of input file>')
Specify the name of your script without the “.m” suffix. For instance, use eigtest and not eigtest.m.
The pool argument specifies the number of workers you need to execute the script. Note that MATLAB adds one worker to the number you specify in pool. For instance, if you specify 39 workers in the pool, MATLAB will use 40 workers, which fit nicely on one Neser node.
If you submit a function to the Shaheen cluster with 1 output and 2 input arguments, use:
job = shaheen.batch(@<function-name>, 1, {arg1, arg2}, 'pool', 95)
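To make the function form concrete, here is a minimal sketch with a built-in function that has one output and two inputs. It uses the default profile and no pool; the choice of plus and its arguments is purely illustrative, and the wait and fetchOutputs calls are the standard Parallel Computing Toolbox way to collect the result.

```matlab
c = parcluster;                      % default profile; use your cluster object instead
job = batch(c, @plus, 1, {2, 3});    % 1 output argument, 2 input arguments
wait(job);                           % block until the job has finished
out = fetchOutputs(job);             % cell array with one entry per output
total = out{1};                      % the value of 2 + 3
```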
If you have many input files, you can store them in one directory, and transfer the directory to the cluster with the following command:
job = neser.batch('<your script>', 'pool', 63, 'AttachedFiles', '<name of directory>')
Note that you still do not need to specify the directory name when you load an input file:
load('<name of input file>')
Use the wait command in your client session to update the job's status, or wait for the status in the Job Monitor to become finished. Linux experts may instead log in to the cluster and monitor the job's status on a login node to know when the job has completed.
If you need the job object again, for example in a new MATLAB session, you can get it back from the cluster object:
% Reinitialize the cluster object using the same profile as before,
% for instance in the case of Neser
neser = parcluster('neser');
% Show all jobs of Neser
jobs = neser.Jobs
% Get the job ID number from the list and
% get the job object with
jobObject = neser.Jobs(<job ID number>);
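Note that indexing into neser.Jobs selects a job by its position in the list, which matches the job ID only as long as no earlier jobs have been deleted. As an alternative sketch, findJob looks a job up by its ID property; the ID value 3 below is hypothetical.

```matlab
neser = parcluster('neser');
jobId = 3;                                   % hypothetical job ID
jobObject = findJob(neser, 'ID', jobId);     % empty if no such job exists
if isempty(jobObject)
    error('No job with ID %d found in this profile.', jobId);
end
disp(jobObject.State);                       % e.g. queued, running, finished
```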
The load and fetchOutputs commands retrieve the results of your computations from your MATLAB working directory after the job has finished. They also report errors if your run failed. load is used for exporting the workspace from an entry script, and fetchOutputs is used for retrieving the specified output arguments of an entry function.
% If you submitted a script to the cluster, use
load(jobObject)
% to load the results into your workspace as variables.
% If you submitted a function to the cluster, use
jobObject.fetchOutputs
% to retrieve the results.
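Putting submission, waiting, and retrieval together for a script might look like the sketch below. It assumes a profile named 'neser' and a script called eigtest.m on the MATLAB path, both taken from the examples in this document; the error check through the Error property of the job's tasks is one possible way to detect a failed run before loading results.

```matlab
neser = parcluster('neser');
job = batch(neser, 'eigtest', 'Pool', 39);   % script eigtest.m, given without ".m"
wait(job);                                   % returns when the job has ended

taskErrors = {job.Tasks.Error};              % one entry per task, empty if no error
failed = ~strcmp(job.State, 'finished') || ...
         any(~cellfun(@isempty, taskErrors));
if failed
    fprintf('Job %d ended in state "%s" or reported errors.\n', job.ID, job.State);
else
    load(job);                               % script variables appear in the client workspace
end
```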
If you create output files in your script, then these output files will be stored in your home directory on the cluster. You then need to retrieve them manually.
MATLAB uses sftp when fetching your files, so large files might take a long time to retrieve. Additionally, if your output arguments or workspace exceed a certain size (around 2 GB), the load or fetchOutputs commands will give an "index out of range" error caused by MDCS failing to save the output to file. This is due to an internal limitation in the default version of MATLAB's .mat files. However, you may work around this limitation by manually saving your workspace or arguments in your entry script, using the -v7.3 switch in the save command.
% Contents of the remote debug log are printed to the console
debugLog(job)
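The workaround for large outputs mentioned above (saving with the -v7.3 switch) can be sketched as follows. The variable name, its size, and the file location are assumptions for illustration; in a real job you would save your actual results to a directory of your choice on the cluster.

```matlab
% In the entry script: save large results yourself in the v7.3 MAT-file format.
bigResult = rand(1000);                              % stand-in for a large result
resultFile = fullfile(tempdir, 'big_result.mat');    % hypothetical location
save(resultFile, 'bigResult', '-v7.3');
```

The file can later be read back with load(resultFile) once you have copied it to your client.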
By default, you will need to enter a username and password every time you submit a job to a cluster. You can avoid this by setting up passwordless SSH from your client to the cluster. The procedure is as follows:
On the cluster, create the following directory (if you don't already have it!):
/home/<username>/.ssh
In that directory, generate a key pair and add the public key to the authorized keys:
ssh-keygen -b 1024 -t rsa -f id_rsa -P ""
cat id_rsa.pub >> authorized_keys
Copy the private key file id_rsa to your client (Windows, Mac or Linux). If you copy it to Windows, make sure you convert the file to Windows format.
scp ~/.ssh/id_rsa <workstation-IP>:~/.ssh/
In MATLAB on your client, register the private key:
ClusterInfo.setPrivateKeyFile('<path to the private key on your client>')
ClusterInfo.setPrivateKeyFileHasPassPhrase(false)
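The ClusterInfo calls above apply to R2016b. On R2017a and later, the AdditionalProperties listing shown earlier contains UseIdentityFile, IdentityFile, and IdentityFileHasPassphrase; the sketch below assumes these properties play the same role as the ClusterInfo setters, which you should verify for your version of the add-on.

```matlab
c = parcluster;                                    % default cluster profile
c.AdditionalProperties.UseIdentityFile = true;     % assumed: switch to key-based login
c.AdditionalProperties.IdentityFile = '<path to the private key on your client>';
c.AdditionalProperties.IdentityFileHasPassphrase = false;
c.saveProfile;                                     % keep the settings for later sessions
```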
This means your HOME directory isn't correctly configured. You have to copy some files from the system to your HOME.
# Go to your HOME
cd
# Copy the missing files
cp /etc/skel/.bash_profile .
cp /etc/skel/.bashrc .
You have to explicitly include the main directory, i.e. the one that contains your script(s), in MATLAB's path (right-click on the folder).
MATLAB automatically uploads all files used by your script. The problem is that MATLAB places folders in a temporary directory and has issues locating them for your script. The suggested way to use folders from your code is through packages. Let's say your code uses a directory named myfolder and calls a file named myInput in that folder. Your first step is to rename myfolder to +myfolder so it becomes a package. Now you have to modify your MATLAB code like this:
myfolder.myInput
Basically, all calls to the code in your package now need the prefix myfolder., even within the package itself.
MATLAB's batch function will now upload your main script and the packages it depends on. You just have to remember to add the main directory to MATLAB's path.
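The package mechanism can be tried out locally with the sketch below. It creates a throwaway +myfolder package containing a function myInput in a temporary directory; the function body and its return value are made up for illustration, and in your own project the package folder would sit next to your main script.

```matlab
% Create a temporary "main directory" that contains the package +myfolder.
mainDir = tempname;
mkdir(fullfile(mainDir, '+myfolder'));

% Write a small package function myInput.m (hypothetical content).
fid = fopen(fullfile(mainDir, '+myfolder', 'myInput.m'), 'w');
fprintf(fid, 'function x = myInput()\nx = 42;\nend\n');
fclose(fid);

% Add the main directory (not the package folder itself) to the path.
addpath(mainDir);

% Call the function through the package prefix.
x = myfolder.myInput();
```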
There are several reasons why you could get this error message. One of the reasons could be that the private key file on your Windows system is in Linux format, and not in Windows format. Solution: Convert your private key file to Windows format.
If you get the following warning message:
Warnings: Unable to change to requested folder: '/Users/<username>/Documents'. Current folder is: '/home/<username>'.
Reason: Cannot CD to /Users/<username>/Documents (Name is nonexistent or not a directory).
please disregard it. You can specify a different working directory by passing 'currentfolder', '<working directory>' in the batch command.
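As a sketch, the working directory option can be added to a submission like this. The profile name, script name, and pool size are taken from other examples in this document; passing '.' is an assumption that asks the workers to stay in their own default folder on the cluster rather than trying to change to the client's folder.

```matlab
neser = parcluster('neser');
% 'CurrentFolder' sets the folder in which the script runs on the workers.
job = batch(neser, 'eigtest', 'Pool', 31, 'CurrentFolder', '.');
```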
This is due to an old pathdef.m file in your home directory. Delete this file, and the warning will go away.
This error could be due to a wrong name in the batch command. When you submit a script with the batch command, remove the “.m” suffix. For example, for the script eigtest.m, use the command:
batch(neser, 'eigtest', 'pool', 31)
and don't use:
batch(neser, 'eigtest.m', 'pool', 31)
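If the script name comes from a file name, for example one selected in a file dialog, fileparts can strip the extension before you pass it to batch. This is a small sketch; the variable names are hypothetical.

```matlab
scriptFile = 'eigtest.m';
[~, scriptName] = fileparts(scriptFile);   % scriptName is 'eigtest'
% scriptName can then be passed to batch in place of the file name.
```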
If you have any errors related to the KAUST HPC add-on, or have any other questions about using this add-on, please contact IT Linux Support with your user ID, any relevant error messages, the job ID(s) if applicable, and the version of MATLAB you are using.