Introduction

You can submit MATLAB computations to one of the KAUST clusters (IBEX, Neser, or Shaheen) directly from the MATLAB user interface. This has the following advantages for you as a MATLAB user:

  • freeing up your local workstation: you can run other applications on your workstation while MATLAB executes on the cluster,
  • partitioning large arrays among multiple MATLAB workers: if an array is too large for your computer's memory, you can partition it among multiple MATLAB workers on the cluster, so that each worker holds only part of the array,
  • shorter turnaround time: MATLAB may well run faster on the cluster, giving you results sooner, and
  • staying within the MATLAB GUI: by submitting jobs to the cluster directly from the MATLAB user interface, you do not need to log in to Linux on the cluster yourself.

The KAUST HPC add-on for MATLAB (HPC add-on from now on) makes this possible. It allows users to connect remotely to the clusters and run parallel jobs from MATLAB.

HPC add-on Installation Procedure

The MATLAB HPC add-on is available for MATLAB R2016b and higher; earlier releases are not supported. The installation procedure depends on your operating system (Windows, Linux, or macOS).

If you are on a Linux workstation, the most recent version of MATLAB and the HPC add-on are automatically loaded with the following module command:

module load matlab

If you want to use an older version of MATLAB on Linux, load MATLAB and the HPC add-on with a versioned module command, for example:

module load matlab/R2018b

If you are on a Windows or macOS system, you will have to download the HPC add-on and then add it to your MATLABPATH.

Download and unzip the HPC add-on package that matches the MATLAB version you are using.

Start MATLAB, then call:

addpath('<path to HPC add-on>')

Configure cluster profiles

Execute the following command in your MATLAB command window:

configCluster

You will get a numbered list to choose which cluster you want to create a profile for:

configCluster
     [1] IBEX INTEL
     [2] NESER
     [3] SHAHEEN XC40
Select a cluster [1-3]:

Choose the cluster you want to run your scripts on.

Note that you need an account on the cluster you want to run on. Consult the cluster administrators on how to get one.

Once you have created the profiles, you will be able to see the profiles with:

Parallel > Manage Cluster Profiles

Write, Submit and Run a Batch Job

Clients and Workers

When you use the HPC add-on, you have a MATLAB client session and one or more MATLAB workers.

The client session may run on your laptop/desktop computer or it may run on a login node of one of the clusters via an interactive session. In the client session you will run MATLAB commands to set up and submit a batch job. If you run your MATLAB client session on your local computer, it is considered a remote client.

The MATLAB workers always run on the cluster as part of a batch job. You will be able to use any toolboxes that you have a license for by submitting batch jobs from your MATLAB client. The batch jobs you submit from your MATLAB client with the HPC add-on will be able to utilize functions such as parfor and spmd to run your code in parallel, in addition to being able to run normal MATLAB code.

This document describes how to perform a computation on a single worker (one processor) or on multiple workers using a script with a parfor loop.
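
As a minimal sketch of such a script (the name eigtest is borrowed from the examples later in this document; the matrix size and iteration count are illustrative assumptions), the following computes the largest eigenvalue magnitude of several random matrices in a parfor loop. It contains no parpool call, since the number of workers is given to batch at submission time.

```matlab
% eigtest.m - illustrative entry script (all sizes are arbitrary assumptions)
nMatrices = 8;                      % number of independent iterations
n = 50;                             % size of each random matrix
maxEig = zeros(1, nMatrices);       % preallocate the sliced output variable

parfor k = 1:nMatrices
    A = rand(n);                    % each iteration builds its own matrix
    maxEig(k) = max(abs(eig(A)));   % largest eigenvalue magnitude of A
end

% Variables left in the workspace, such as maxEig, are what the client
% receives when it later loads the results of a script job.
```

Because the iterations are independent, the same script also runs unchanged as a serial job on a single worker.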

Performance Limitations

Parallel MATLAB on a cluster has a lot of overhead. It may not give you the speedup you expect, especially if you're solving small problems. Another consideration is that the MATLAB workers are single-threaded. They don't take advantage of the multi-threading built into many MATLAB functions.

Notwithstanding these caveats, the act of running your MATLAB jobs on the cluster will free up your client for other tasks. This may be enough of an advantage for you.

Usage

The primary way to utilize the HPC add-on is to submit batch jobs from your MATLAB client, whether it runs on your local PC, Mac, or Linux workstation. These instructions assume that you have already configured your MATLAB client as described above.

MATLAB treats the output directory specified with the submit functions as SCRATCH file space, and will clean up this directory after you have successfully retrieved your data. In practice, however, MATLAB has been observed to clean up this directory even when the commands are unsuccessful (for example, during a large file transfer). To avoid data loss, make sure your entry function or script copies any important output files to an alternate location for redundancy, or simply saves your data to a directory of your choice.
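
One way to follow this advice is to have the entry script copy its important output files to a second location before it finishes. The sketch below is only an illustration: workDir and backupDir are hypothetical locations under tempdir, chosen so that the example runs anywhere, and on the cluster you would substitute directories you own.

```matlab
% Illustrative only: workDir and backupDir are hypothetical locations;
% replace them with directories you own on the cluster file system.
workDir = fullfile(tempdir, 'matlab_job_work');
backupDir = fullfile(tempdir, 'matlab_job_backup');
if ~exist(workDir, 'dir'), mkdir(workDir); end
if ~exist(backupDir, 'dir'), mkdir(backupDir); end

% Stand-in for the real computation and the output file it writes
result = magic(4);
outFile = fullfile(workDir, 'result.mat');
save(outFile, 'result');

% Copy the output to the alternate location before the script ends
backupFile = fullfile(backupDir, 'result.mat');
copyfile(outFile, backupFile);
```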

  • Write your MATLAB script to run on the cluster. This can be a plain serial script, or it may contain parallel commands such as parfor. If you are using an existing MATLAB script, you need to adapt it as follows:
    • Remove any parpool calls.
    • Adapt the way you do I/O. An existing MATLAB script may hard-code a local directory for its input and output files. On the cluster, I/O works differently: when you submit your script, the input files are sent along with it and placed in a temporary directory. The load calls should therefore NOT specify a directory, as the input files are all located in that temporary directory. Use
load('<name of input file>')

instead of

load('/<name of directory>/<name of input file>')
  • Execute the command parcluster to create a cluster object in the MATLAB workspace. Without arguments, this command uses the default profile, i.e. the last one you set up or selected:
c = parcluster;

If you want to use Neser, use command:

neser = parcluster('neser')

If you want to use Shaheen, use command:

shaheen = parcluster('shaheen')
  • Set up connection parameters for the selected cluster. These commands differ between R2016b and later versions. Follow the instructions below for the MATLAB version you are using.
    • On R2016b, use ClusterInfo. This command allows you to set scheduler options. SLURM is the scheduler of IBEX, Neser, and Shaheen. You can list the current state of the scheduler options with ClusterInfo.state. You can find the scheduler options that you can set by typing ClusterInfo.set and then pressing the Tab key. For example, setting the estimated wall clock time is mandatory. Set the wall clock time to one hour with:
ClusterInfo.setWallTime('60')

These are all the parameters you can set on MATLAB R2016b:

>> ClusterInfo.state
 
               DataParallelism : eth
                  EmailAddress : 
                   GpusPerNode : 
                      MemUsage : 
                       JobName : 
                PrivateKeyFile : 
   PrivateKeyFileHasPassPhrase : 1
                  ProcsPerNode : 
                   ProjectName : 
                     QueueName : batch
          RequireExclusiveNode : 0
                       SshPort : 22
                        UseGpu : 0
            UserDefinedOptions : 
             UserNameOnCluster : arenaam
                      WallTime : 
    • On R2017a and higher, use c.AdditionalProperties. This property allows you to set scheduler options. SLURM is the scheduler of IBEX, Neser, and Shaheen. You can list the current state of the scheduler options with c.AdditionalProperties. You can find the scheduler options that you can set by typing c.AdditionalProperties.<name of property> and then pressing the Tab key. For example, setting the estimated wall clock time is mandatory. Set the wall clock time to one hour with:
c = parcluster; % Default cluster profile
c.AdditionalProperties.WallTime = ('60');
c.saveProfile; % Don't forget to save this for later use

These are all the parameters you can set on MATLAB R2017a and higher:

>> c = parcluster;
>> c.AdditionalProperties
 
ans = 
 
  AdditionalProperties with properties:
 
         AdditionalSubmitArgs: ''
                  ClusterHost: 'ilogin.ibex.kaust.edu.sa'
                  ClusterName: 'intel'
              DataParallelism: 'eth'
        DebugMessagesTurnedOn: 0
                 EmailAddress: ''
                 IdentityFile: ''
    IdentityFileHasPassphrase: 0
                      JobName: ''
                 ProcsPerNode: 0
                  ProjectName: ''
                    QueueName: 'batch'
     RemoteJobStorageLocation: '/ibex/scratch/arenaam/Jobs/R2018a'
        RequiresExclusiveNode: 0
                      SshPort: 22
                     StraceOn: 0
              UseIdentityFile: 1
            UserNameOnCluster: 'arenaam'
                     WallTime: ''
  • In both cases you want to set, besides WallTime:
    • JobName, a description of what you're running
    • ProcsPerNode, number of cores per node
    • ProjectName (on Shaheen & Neser), account where to charge job consumption
    • RequiresExclusiveNode, if you want exclusive nodes for your job
  • Execute the command batch. This command begins an automated process that connects to the cluster, submits a job to the scheduler, and initializes MDCS. For example if you submit a script to the Neser cluster with one input file:
job = neser.batch('<your script>', 'pool', 39, 'AttachedFiles', '<name of input file>')

Specify the name of your script without the “.m” suffix. For instance, use eigtest, and not eigtest.m.

The argument pool specifies the number of workers you need for execution of the script. Note that MATLAB adds one worker to the number of workers you specify in pool. For instance, if you specify 39 workers in the pool, MATLAB will use 40 workers, which fit nicely on one Neser node.

If you submit a function to the Shaheen cluster with 1 output and 2 input arguments, use:

job = shaheen.batch(@<function-name>, 1, {arg1, arg2}, 'pool', 95)

If you have many input files, you can store them in one directory, and transfer the directory to the cluster with the following command:

job = neser.batch('<your script>', 'pool', 63, 'AttachedFiles', '<name of directory>')

Note that you still do not need to specify the directory name when you load an input file:

load('<name of input file>')
  • Follow the status of the job with the MATLAB Job Monitor. Your job will be submitted to the system and assigned a job ID. To open the Job Monitor, click on Parallel > Monitor Jobs.

  • Wait for your job to complete. You may use the wait command in your client session to update the job's status, or wait for the status in the Job Monitor to become finished. Linux experts may also log in to the cluster and monitor the job's status from a login node to see when the job has completed.
  • Reconnect if you have closed your MATLAB client session. The procedure is shown below:
% Reinitialize the cluster object using the same profile as before, 
% for instance in the case of Neser
neser = parcluster('neser');
 
% Show all jobs of Neser
jobs = neser.Jobs
 
% Get the job ID number from the list and 
% get the job object with
jobObject = neser.Jobs(<job ID number>);
  • Retrieve your results. The load or fetchOutputs commands retrieve the results of your computations from your MATLAB working directory after the job has finished. They also report errors if your run failed. Use load to bring the workspace of an entry script into your client session, and fetchOutputs to retrieve the output arguments of an entry function.
% if you submitted a script to the cluster, use
load(jobObject)
% to load results into your workspace as variables
 
% if you submitted a function to the cluster, use
jobObject.fetchOutputs
% to retrieve the results

If you create output files in your script, then these output files will be stored in your home directory on the cluster. You then need to retrieve them manually.

  • Special for large data: MATLAB does not utilize sftp when fetching your files, so large files might take a long time to retrieve. Additionally, if your output arguments or workspace exceed a certain size (around 2 GB), the load or fetchOutputs commands will give an index out of range error, caused by MDCS failing to save the output to file. This is due to an internal limitation of MATLAB's default .mat file format. You can work around this limitation by manually saving your workspace or arguments in your entry script, using the -v7.3 switch of the save command.
You can also get debug output from the remote job:
% Contents of remote debug log is printed to console
debugLog(job)
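
The large-data workaround described above can be sketched as follows. The variable name and output location are illustrative assumptions, and the array is deliberately small so that the sketch runs quickly; the same save call applies to variables beyond the size limit.

```matlab
% Illustrative only: bigResult stands in for a large output, and outFile is
% a hypothetical path; replace it with a directory of your choice.
bigResult = rand(1000);
outFile = fullfile(tempdir, 'bigResult_v73.mat');

% The -v7.3 switch selects the MAT-file format that supports
% variables larger than 2 GB
save(outFile, 'bigResult', '-v7.3');

% Later, read the saved file directly instead of relying on the
% job's returned workspace or output arguments
s = load(outFile);
```

Saving explicitly like this also keeps the data independent of the job's scratch directory.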

Passwordless SSH

By default, you will need to enter a username and password every time you submit a job to a cluster. You can avoid this by setting up passwordless SSH from your client to the cluster. The procedure is as follows:

  • Generate a private/public key pair on the cluster (if you don't already have it!):
    • Go to directory /home/<username>/.ssh
      • If you don't have this directory you definitely don't have SSH keys
      • Create the directory if it doesn't exist
    • Execute command
      ssh-keygen -b 1024 -t rsa -f id_rsa -P ""
    • Execute command
      cat id_rsa.pub >> authorized_keys
  • Copy the private key id_rsa to your client (Windows, Mac or Linux). If you copy to Windows, make sure you convert the file to Windows format.
    scp ~/.ssh/id_rsa <workstation-IP>:~/.ssh/
  • Execute the following commands in MATLAB
ClusterInfo.setPrivateKeyFile('<path to the private key on your client>')
ClusterInfo.setPrivateKeyFileHasPassPhrase(false)
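
The ClusterInfo calls above apply to R2016b. On R2017a and higher, the AdditionalProperties listing shown earlier includes IdentityFile, IdentityFileHasPassphrase, and UseIdentityFile. A plausible equivalent is sketched below; it is an assumption based on those property names, not confirmed behavior of the add-on.

```matlab
c = parcluster;  % default cluster profile, created earlier with configCluster

% Assumed equivalents of the ClusterInfo setters, inferred from the
% AdditionalProperties listing for R2017a and higher
c.AdditionalProperties.UseIdentityFile = true;
c.AdditionalProperties.IdentityFile = '<path to the private key on your client>';
c.AdditionalProperties.IdentityFileHasPassphrase = false;

c.saveProfile;   % keep the settings for later sessions
```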

Further Reading

Troubleshooting

Command sbatch not found

This means your HOME directory isn't correctly configured. You have to copy some files from the system to your HOME.

# Go to your HOME
cd
 
# Copy the missing files
cp /etc/skel/.bash_profile .
cp /etc/skel/.bashrc .

MATLAB doesn't find my script

You have to explicitly add the main directory, i.e. the one that contains your script(s), to MATLAB's path (right-click on the folder to do so).

MATLAB can't find my folders on the server

MATLAB automatically uploads all files used by your script. The problem is that MATLAB places folders in a temporary directory and has issues locating them for your script. The suggested way to use folders from your code is through packages. Let's say your code uses a directory named myfolder and calls a file named myInput in that folder. Your first step is to rename myfolder to +myfolder so it becomes a package. Now you have to modify your MATLAB code like this

myfolder.myInput

All calls to code in your package now need the prefix myfolder., even calls made from within the package itself.

MATLAB's batch function will now upload your main script and the packages it depends on. You just have to remember to add the main directory to MATLAB's path.
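
As a sketch of that layout, the following builds a tiny package in a temporary directory and calls it through its prefix. Treating myInput as a function that returns a value, its content, and the temporary location are all assumptions made so that the example is self-contained.

```matlab
% Illustrative only: build +myfolder/myInput.m under a temporary main directory
mainDir = fullfile(tempdir, 'pkg_demo_main');
pkgDir = fullfile(mainDir, '+myfolder');
if ~exist(pkgDir, 'dir')
    mkdir(pkgDir);
end

% Write a small package function (hypothetical content)
fid = fopen(fullfile(pkgDir, 'myInput.m'), 'w');
fprintf(fid, 'function x = myInput()\nx = 42;\nend\n');
fclose(fid);

% Add the main directory (the parent of +myfolder) to the path,
% not the package folder itself
addpath(mainDir);
rehash;

% Call the function through its package prefix
value = myfolder.myInput();
```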

Error message: Job submission failed because the user supplied CommunicatingSubmitFcn errored

There are several reasons why you could get this error message. One of the reasons could be that the private key file on your Windows system is in Linux format, and not in Windows format. Solution: Convert your private key file to Windows format.

Warning message: Unable to change to requested folder: '/Users/<username>/Documents'

If you get the following warning message:

Warnings: Unable to change to requested folder: '/Users/<username>/Documents'. Current folder is: '/home/<username>'.
                       Reason: Cannot CD to /Users/<username>/Documents (Name is nonexistent or not a directory).

please disregard it. You can specify a different working directory by specifying 'currentfolder', '<working directory>' in the batch command.

Message in log file: Warning: Name is nonexistent or not a directory

This is due to an old pathdef.m file in your home directory. Delete this file, and the warning will go away.

Error message: Undefined variable

This error could be due to a wrong name in the batch command. When you submit a script with the batch command, omit the “.m” suffix. For example, for the script eigtest.m, use the command:

batch(neser, 'eigtest', 'pool', 31)

and don't use:

batch(neser, 'eigtest.m', 'pool', 31)
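
If the script name comes from a file listing or a variable, one way to avoid this mistake is to strip the extension with fileparts before passing the name to batch. The variable names here are illustrative.

```matlab
scriptFile = 'eigtest.m';                     % name as it appears on disk
[~, scriptName, ~] = fileparts(scriptFile);   % drops any folder and the .m extension
% scriptName can now be passed to batch in place of the literal script name
```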

Errors

If you encounter errors related to the KAUST HPC add-on, or have any other questions about using it, please contact IT Linux Support with your user ID, any relevant error messages, the job ID(s) if applicable, and the version of MATLAB you are using.