Matlab

Matlab is a numerical computing and programming environment with a broad range of functionality (matrix manipulation, numerical linear algebra, general-purpose graphics, etc.). In addition, special application areas are served by a large number of optional toolboxes. Matlab R2011b (version 7.13) is installed on the cluster and can be accessed at /usr/local/bin/matlab; the module system also provides matlab-8.1 (R2013a).

Running Matlab on the headnode

The Matlab Distributed Computing toolbox/server is installed on psych-o; because of the way it integrates with PBS, using it requires running Matlab on the head node. Apart from this case, users should not run Matlab processing on the head node.

Interactive Matlab Sessions

When using Matlab interactively, users should not run it on the head node; instead, they should request an interactive session through the PBS job scheduler. However, running Matlab directly in a PBS interactive session on psych-o has been painfully sluggish and unusable (apparently because of a bug or incompatibility between the PBS job scheduler and Matlab), so we have a workaround you can follow instead.

Note: users are not permitted to log in to a cluster backend node from the psych-o head node unless they have a PBS job running on that particular node. Otherwise, they get a response similar to this:

[dpane@psych-o ~]$ ssh -Y compute-0-15 
Connection closed by 10.1.1.246

However, a user can first start an interactive PBS session with qsub. (Keep in mind that walltime remains important: once your PBS interactive session ends, your ssh session and any other processes on the node are killed automatically.)


[dpane@psych-o ~]$ qsub -I -X
qsub: waiting for job 2462.psych-o.hpc1.cs.cmu.edu to start
qsub: job 2462.psych-o.hpc1.cs.cmu.edu ready

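You can then open another shell on psych-o and ssh to the node where your interactive session (PBS job) is running. The node name normally appears in the prompt of the qsub -I session, and qstat -n followed by the job ID also lists the node(s) allocated to the job: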

[dpane@psych-o ~]$ ssh -Y compute-0-15 
Last login: Wed Sep 9 15:40:53 2015 from psych-o.local
Rocks Compute Node
Rocks 6.0 (Mamba)
Profile built 18:20 17-Jun-2015

Kickstarted 18:47 17-Jun-2015
[dpane@compute-0-15 ~]$

In that ssh session you can then load a Matlab module and run Matlab interactively:

[dpane@compute-0-15 ~]$ pwd
/home/<username>
[dpane@compute-0-15 ~]$ module avail

----------------------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------------------------
dot                 glx-indirect        matlab-8.1          module-info         python27            rocks-openmpi
freesurfer          matlab-7.13         matlab-8.1-toolbox  modules             qt-4.8.2            use.own
fsl-5               matlab-7.13-toolbox module-cvs          null                qt-4.8.5

----------------------------------------------------- /etc/modulefiles ---------------------------------------------------------------------
openmpi-x86_64

[dpane@compute-0-15 ~]$ module load matlab-7.13
[dpane@compute-0-15 ~]$ matlab &

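When you are done, type exit to leave the ssh session, and then type exit in the original qsub -I session as well; leaving the interactive job shell is what ends the PBS job and releases the node: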

[dpane@compute-0-15 ~]$ exit


Non-interactive Matlab Sessions

Matlab's interactive mode lets you take advantage of the integrated environment, but Matlab can also run in a non-interactive mode for "batch processing" to make use of the cluster's computing resources. In fact, non-interactive use is the preferred mode of operation for Matlab on our cluster.

Running Matlab non-interactively on the cluster combines two steps:

  • redirecting the standard input and output when invoking Matlab, and
  • invoking Matlab from a submission script that is submitted to the queue via the PBS scheduler.
Input and output redirection uses the Linux operators < and >: Matlab reads its commands from a code file and writes the output to a file, e.g. matlab < myScript.m > myOutput.txt . The file fed to Matlab this way (e.g. myScript.m) must be a script rather than a function file, and it should end with the exit command so that Matlab quits after it finishes executing the code.

The following example demonstrates a non-interactive Matlab program. The program file mystats.m contains a main function, mystats, and two local functions, mymean and mymedian:

function [avg, med] = mystats(x)
% MYSTATS Mean and median of the vector x.
n = length(x);
avg = mymean(x,n);
med = mymedian(x,n);
end

function a = mymean(v,n)
% MYMEAN Example of a local function.

a = sum(v)/n;
end

function m = mymedian(v,n)
% MYMEDIAN Another example of a local function.

w = sort(v);
if rem(n,2) == 1
    m = w((n + 1)/2);
else
    m = (w(n/2) + w(n/2 + 1))/2;
end
end

The job is sent to the queue with the command qsub run.sh and executed on a backend node. Because mystats.m defines functions, it cannot be fed to Matlab through input redirection directly; instead, a short driver script, main.m, calls mystats and then exits. The PBS script run.sh contains the following line to run it:

matlab -nodisplay -nosplash < main.m > run.log

The -nodisplay flag instructs Matlab to run without the GUI, while -nosplash prevents the display of the Matlab logo. The < redirection operator makes Matlab execute the script main.m, and the > operator redirects the standard output (normally sent to the terminal) to the file run.log.
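A minimal sketch of such a driver script main.m is shown below. It assumes mystats.m sits in the job's working directory, and the input vector is made up for illustration. The try/catch block is an addition: if anything fails, the error report is written to run.log and Matlab exits with a non-zero status, so the failure shows up in the job's exit status instead of Matlab simply continuing with the remaining input.

```matlab
% main.m: driver script for the mystats example (a sketch).
% Assumes mystats.m is in the current directory ($PBS_O_WORKDIR in the job).
try
    x = [3 1 4 1 5 9 2 6];            % illustrative input data
    [avg, med] = mystats(x);          % call the function defined in mystats.m
    fprintf('mean = %g, median = %g\n', avg, med);
catch err
    disp(getReport(err));             % error text ends up in run.log
    exit(1);                          % non-zero status marks the job as failed
end
exit                                  % quit Matlab once the work is done
```

With this input, run.log should include the text mean = 3.875, median = 3.5 among Matlab's own startup and prompt output.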

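Running a parallel Matlab job in batch mode

In this example, the script main.m times loops executed serially and in parallel through matlabpool. It is run in batch mode with the command qsub run.sh, using the following PBS file: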

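#!/bin/bash
#PBS -V
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:05:00
#PBS -N matlab_test

# -V exports the submitting shell's environment; ppn=8 requests all 8 cores of one node.
# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Send both standard output and error messages to run.log.
matlab -nodisplay -nosplash < main.m > run.log 2>&1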

Note that the -nodisplay and -nosplash flags instruct MATLAB not to open its desktop window or splash screen.

Note: do not turn Java off when launching MATLAB (i.e. do not invoke matlab -nojvm); matlabpool uses the Java Virtual Machine.

After the job finishes, the times spent executing the loops in main.m can be found in timings.dat, showing a clear speed-up of the parallel execution.
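A hypothetical main.m for this kind of timing test might look like the sketch below. It assumes the Parallel Computing Toolbox, the local matlabpool configuration available in R2011b and R2013a (later releases replace matlabpool with parpool), and a job that requests 8 cores, e.g. #PBS -l nodes=1:ppn=8. The workload (sums of singular values of random matrices), the worker count and the iteration count are illustrative choices only.

```matlab
% main.m: hypothetical serial-vs-parfor timing test (a sketch).
% Assumes the Parallel Computing Toolbox and 8 cores allocated to the job.
nworkers = 8;                       % assumption: one worker per allocated core
niter = 16;                         % number of independent loop iterations
work = @(k) sum(svd(rand(400)));    % arbitrary CPU-bound task for illustration

matlabpool('open', 'local', nworkers);

r = zeros(1, niter);
tic;
for k = 1:niter                     % serial loop
    r(k) = work(k);
end
tSerial = toc;

tic;
parfor k = 1:niter                  % same loop, iterations spread over the workers
    r(k) = work(k);
end
tParallel = toc;

matlabpool('close');

% Record the wall-clock timings for inspection after the job ends.
fid = fopen('timings.dat', 'w');
fprintf(fid, 'serial   %8.3f s\n', tSerial);
fprintf(fid, 'parfor   %8.3f s\n', tParallel);
fprintf(fid, 'speed-up %8.2f\n', tSerial / tParallel);
fclose(fid);
exit
```

Because the serial loop may already use several cores through Matlab's implicit multi-threading, the measured speed-up can be smaller than the number of workers.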

Running Matlab on parallel hardware

Matlab can take advantage of parallel hardware in at least two ways.

The first is built into Matlab, which "naturally" exploits multi-core processing through the multi-threaded libraries it relies on, Intel MKL and FFTW. Linear algebra operations (such as the solution of a linear system A\b or the matrix product A*B) and FFT operations (using the function fft) are therefore implicitly multi-threaded: they use all the cores available on a multi-core system without user intervention or extra programming. Some of Matlab's vectorised operations are also multi-threaded. However, such operations are only part of Matlab programming, and most Matlab functionality consists of scripts or functions that can only use a single core.
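On a shared node, the implicit multi-threading should match the cores PBS actually allocated to the job. As a sketch, the snippet below counts the lines of $PBS_NODEFILE (assuming a single-node job such as nodes=1:ppn=8, where the file lists one line per allocated core), caps Matlab's computational threads with maxNumCompThreads, and times a dense solve A\b. The problem size is arbitrary, and some releases may print a deprecation warning when maxNumCompThreads is called with an argument.

```matlab
% Sketch: match Matlab's implicit multi-threading to the cores PBS allocated.
% Assumes a single-node job; $PBS_NODEFILE lists one line per allocated core.
nodefile = getenv('PBS_NODEFILE');
if isempty(nodefile)
    ncores = 1;                                         % not inside a PBS job
else
    txt = fileread(nodefile);
    ncores = max(1, numel(strfind(txt, sprintf('\n'))));  % count the lines
end
maxNumCompThreads(ncores);                              % cap the thread count

n = 2000;                                               % arbitrary problem size
A = rand(n);
b = rand(n, 1);
tic;
x = A \ b;                                              % multi-threaded LAPACK solve
t = toc;
fprintf('%d thread(s): A\\b for n = %d took %.3f s\n', maxNumCompThreads, n, t);
```

When a job is allocated a single core, starting Matlab with the -singleCompThread option is a simpler alternative.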

The second way is to program parallel processing explicitly, using one of the following techniques:

  • using the Matlab toolbox Parallel Computing Toolbox;
  • trivial parallelism exploited through independent Matlab processes;
  • multi-threaded MEX programming.
Matlab has two toolboxes (licensed separately from the main distribution) that enable explicit parallel programming: the Parallel Computing Toolbox and the Distributed Computing Server. The Parallel Computing Toolbox is designed for programming multi-core architectures, while the Distributed Computing Server extends Matlab's functionality to larger resources, such as clusters.

The Distributed Computing Server extends the Parallel Computing Toolbox from processing on a single cluster node to distributed processing across multiple nodes. To learn more about the product, please visit the Distributed Computing Server webpage.

Exploiting trivial parallelism

An easy way to exploit multi-core systems is to split the workflow into parts that can be processed completely independently. The typical example in this category is a parameter sweep, in which the same Matlab script is run many times with different inputs; these runs are independent of each other and can be carried out concurrently. The entire workflow can then be scheduled as jobs that each group 8 independent runs, matching the 8 cores available per compute node. This strategy is best combined with the Matlab compiler, mcc, to avoid using an excessive number of licenses.
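A minimal sketch of one run of such a sweep is shown below. The script name sweep_run.m, the environment variable SWEEP_INDEX and the parameter grid are hypothetical, and mystats from the example above stands in for the real computation. The comments outline how a job script could start eight copies in the background, one per core, and wait for all of them; the -singleCompThread option keeps each copy to one thread so the eight processes do not compete for cores. Seeding the random number generator with the run index matters because every Matlab session starts from the same default seed.

```matlab
% sweep_run.m: hypothetical single run of a parameter sweep (a sketch).
% Intended to be started 8 times from one PBS job (nodes=1:ppn=8), e.g.:
%   for i in 1 2 3 4 5 6 7 8; do
%     SWEEP_INDEX=$i matlab -nodisplay -nosplash -singleCompThread < sweep_run.m > run_$i.log &
%   done
%   wait
idx = str2double(getenv('SWEEP_INDEX'));  % NaN if the variable is unset
if isnan(idx)
    idx = 1;                              % default for a quick manual test
end
params = linspace(0.5, 4, 8);             % hypothetical grid of 8 parameter values
p = params(idx);

rng(idx);                                 % distinct random numbers for each run
x = p * randn(1, 1e5);                    % illustrative data for this parameter
[avg, med] = mystats(x);

save(sprintf('result_%02d.mat', idx), 'p', 'avg', 'med');   % one result file per run
exit
```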

Multi-threaded MEX programming

Yet another way to exploit multi-core systems is multi-threaded Mex programming. Mex (Matlab EXecutable) files are dynamically linked subroutines, compiled from C, C++ or Fortran source code, that can be called from within Matlab in the same way as M-files or built-in functions. Assuming knowledge of serial Mex programming, the computationally intensive loops of a serial Mex file can be multi-threaded with OpenMP. Coupled with OpenMP multi-threading, Mex files become a powerful way to accelerate key parts of a Matlab program.

The main reason to write Mex files in C or Fortran (thus abandoning high-level Matlab programming) is to speed up computationally intensive operations that would otherwise become a bottleneck in an application. Typically, this is done to replace a function that profiling identifies as slow and/or called a large number of times. However, the effort pays off to varying degrees: the greatest relative benefits are normally seen when a Mex file replaces a Matlab script (M-file). At the other extreme, Matlab operations that rely on highly optimised performance libraries such as FFTW (e.g. fftn) or BLAS/LAPACK (e.g. the solution of a dense linear system, A\b) gain little or nothing from Mex programming. The best source for learning Mex programming is the MathWorks web pages.
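Profiling also works in batch mode, so candidates for Mex replacement can be found from a normal PBS job. The sketch below reuses the mystats example; the input size, the repeat count and the report directory name profile_results are arbitrary choices. profsave writes an HTML report that can be opened after the job has finished, and the loop at the end prints the five functions with the largest total time to the job's log. In a batch driver script, finish with exit as usual.

```matlab
% Sketch: profile a batch run to find hot spots worth moving into a Mex file.
x = rand(1, 1e6);                          % arbitrary test input
profile on
for k = 1:20                               % repeat to accumulate measurable times
    [avg, med] = mystats(x);
end
profile off

p = profile('info');                       % profiling results as a struct
profsave(p, 'profile_results');            % HTML report in ./profile_results

% Print the five functions with the largest total time.
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');
for k = order(1:min(5, end))
    fprintf('%-40s %8.3f s\n', p.FunctionTable(k).FunctionName, p.FunctionTable(k).TotalTime);
end
```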

-- David Pane - 2015-06-11

Topic revision: r6 - 2015-09-09 - dpane