Note: This discussion is about an older version of the COMSOL Multiphysics® software. The information provided may be out of date.
Running COMSOL 4.x and COMSOL V4.x with MATLAB on cluster
Posted Feb 10, 2011, 5:19 a.m. EST Version 4.1 4 Replies
Could you please share the PBS job scripts for these two running modes, especially for running COMSOL with MATLAB on a cluster?
Here is how I run COMSOL with MATLAB LiveLink through a shell script that PBS executes as the job. Note that this assumes the request for job resources has already been made.
cat $PBS_NODEFILE | uniq > mpd.hosts #generate host file list in pwd
/comsol41/bin/comsol mpd boot -nn 4 -mpirsh ssh -f mpd.hosts -v -d > mpdstart.txt
/comsol41/bin/comsol server -nn 4 -mpmode owner -port 2222 > server.txt & # start comsol server at port 2222
/matlab_r2010a/bin/matlab -nosplash -nodesktop -r matlab_script > output.txt # launch matlab and run matlab_script.m
/comsol41/bin/comsol mpd allexit # close mpd when matlab job is over
Make sure that you change "-nn 4" above to the number of physical nodes you have requested from PBS. In the file matlab_script.m, write the following commands:
addpath('/comsol41/mli') % this adds comsol libraries to matlab's path
mphstart('2222') % connect to comsol server at port 2222
That is all you need, I think. I generate my MATLAB scripts from the COMSOL GUI, so it takes care of the rest nicely. When the PBS job ends, the COMSOL server is terminated as well, so you don't need to close the port manually.
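The "-nn 4" value above has to be kept in sync with the PBS request by hand. As a rough sketch, it could instead be derived from the same node file the script already reads, assuming PBS writes one line per allocated core to $PBS_NODEFILE. The helper name count_nodes is made up for this example.

```shell
#!/bin/sh
# Count distinct hosts in a PBS-style node file (assumed: one line per allocated core).
# count_nodes is a hypothetical helper, not part of COMSOL or PBS.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a PBS job the scheduler sets $PBS_NODEFILE; outside a job this does nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NN=$(count_nodes "$PBS_NODEFILE")
    echo "use -nn $NN"
fi
```

Inside the job script, NN=$(count_nodes "$PBS_NODEFILE") could then replace the literal 4 in the mpd boot and server lines.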
Executing a comsol batch job is even simpler. Here is how it works:
cat $PBS_NODEFILE | uniq > mpd.hosts #generate host file list in pwd
/comsol41/bin/comsol mpd boot -nn 4 -mpirsh ssh -f mpd.hosts -v -d > mpdstart.txt
/comsol41/bin/comsol batch -nn 4 -inputfile input.mph -outputfile output.mph > batch.log
/comsol41/bin/comsol mpd allexit
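If the script is later changed to stop at the first failing step, the mpd ring should still be shut down. One way to arrange that, sketched here for a POSIX shell, is to register the shutdown with trap. COMSOL and run_batch are hypothetical names introduced for this example; the command-line flags are the ones already used above.

```shell
#!/bin/sh
# COMSOL is a hypothetical variable; the default is the path used in the post.
# Set COMSOL=echo for a dry run that only prints the commands.
COMSOL=${COMSOL:-/comsol41/bin/comsol}

# run_batch NODES INPUT OUTPUT  (hypothetical helper; the body runs in a subshell)
run_batch() (
    # The EXIT trap fires whenever this subshell ends, including the early exits below.
    trap '"$COMSOL" mpd allexit' EXIT
    "$COMSOL" mpd boot -nn "$1" -mpirsh ssh -f mpd.hosts || exit 1
    "$COMSOL" batch -nn "$1" -inputfile "$2" -outputfile "$3" || exit 1
)
```

A call such as run_batch 4 input.mph output.mph > batch.log would then mirror the four lines above, with the shutdown guaranteed to run last.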
Feel free to write if something fails or if I missed something. Wish you all the best!
As for the COMSOL batch jobs, I changed the solver to either MUMPS or PARDISO, following the suggestion you gave in another thread. The mph model I have is a 3D RF model with a frequency sweep.
I followed the job on the cluster: I logged in to one of the nodes the job was running on and monitored the process usage.
CPU usage of the process first ranged between 100% and 799% (I used 8 cores on each node). After about 1 to 2 minutes it settled at 100% and never rose above that, which is not what I expected.
As for memory, the job used ~15 GB on each node (it ran on 3 nodes with 8 cores each). I think that is too much: when I tested the same mph model on my own computer it also used ~15 GB and worked well, only it took too long to compute. In distributed-memory mode on the cluster I would expect less than 15 GB per node. So perhaps COMSOL was not running in distributed-memory mode, or each node was solving a different frequency, since the model does a parametric sweep. In any case, I really have no idea what is going on with the job.
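To make per-node numbers like these easier to compare between runs, the CPU and memory columns of ps can be added up. This is only a small sketch, assuming a Linux node where ps accepts -o pid,pcpu,rss,comm and reports rss in KiB; the helper name summarize_usage is invented here. With procps, pcpu is CPU time divided by the process lifetime, so it will not match the instantaneous figure that top shows.

```shell
#!/bin/sh
# Sum the %CPU and RSS columns of "ps -o pid,pcpu,rss,comm" output read from stdin.
# Prints total %CPU, then total resident memory in GiB (rss assumed to be in KiB).
# summarize_usage is a hypothetical helper; the first input line is treated as the header.
summarize_usage() {
    LC_ALL=C awk 'NR > 1 { cpu += $2; rss += $3 }
                  END { printf "%.1f %.2f\n", cpu, rss / 1048576 }'
}

# On a compute node (assumption: procps ps is installed):
#   ps -u "$USER" -o pid,pcpu,rss,comm | summarize_usage
```

Running it on every node of the job shows whether the load and the memory are actually spread across the nodes or concentrated on one.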
The PBS job script I used follows. It is almost the same as yours, except that I specify the number of cores to use on each node. I really have no idea why it doesn't work correctly in our case; maybe something is wrong with my mph model. Do you have any idea?
#!/bin/sh
#PBS -l nodes=3:ppn=8
#PBS -l walltime=00:30:00
#PBS -o output_comsolmpirun2_0202.file
#PBS -e error_comsolmpirun2_0202.file
cd $PBS_O_WORKDIR
module load COMSOL/4.1.0.112
sort $PBS_NODEFILE |uniq > comsolenodes
NODES=`wc -l comsolenodes | awk '{print $1}'`
TOTAL_TASKS=`wc -l $PBS_NODEFILE | awk '{print $1}'`
CORES=$(($TOTAL_TASKS / $NODES))
comsol -nn $NODES -mpirsh ssh mpd boot -f comsolenodes
comsol -nn $NODES -np $CORES batch -inputfile mpitest.mph
comsol -mpirsh ssh mpd allexit
As for running COMSOL with MATLAB, I will look into it after I solve the batch problem.
I really appreciate your reply.
I have actually not tried to run sweeps yet. Only today did I manage to solve my first ever 3D RF problem in distributed-memory mode, both through batch and through the MATLAB interface. Does that work for you? It might be useful to test first whether you get reliable results without sweeping through parameters. I certainly acknowledge the advantages of COMSOL's parametric features, but in most cases the same thing can be achieved through a MATLAB script, at the cost of computational efficiency.
I would also suggest contacting COMSOL support now rather than waiting until you have exhausted all other avenues. They may occasionally take some time, but they do try to help; at least that has been my experience. Make sure that you attach the detailed mpd output as well. You can get it by starting mpd the way I showed in my earlier post.
Wish you success!
Meanwhile I tested it on my own PC (with 16 GB of RAM) using an iterative (GMRES) solver, and it finished in a few minutes.
Anyway, I will contact support.