Note: This discussion is about an older version of the COMSOL Multiphysics® software. The information provided may be out of date.


No speedup when using distributed memory cluster

Josh Thomas Certified Consultant


I wanted to see if someone could help me determine why I am not seeing any speedup when I submit jobs on multiple nodes of my HPC cluster (using the distributed memory capability).

I understand that speedup is highly model dependent. Per the suggestions of previous discussion threads, I have tried numerous different models with different physics (both linear and non-linear problems). Also, I have tried large memory models and small memory models (also per previous thread recommendations). Below are my results using the COMSOL Model Library Example "Micromixer Cluster Version":

Micromixer_cluster.mph (mesh as given):
1 node; 1 proc.: Run time = 123 sec
1 node; 12 proc.: Run time = 38 sec
4 nodes; 12 ppn: Run time = 99 sec
8 nodes; 12 ppn: Run time = 223 sec

I see speedup when I go from 1 proc. to 12 proc. on 1 node (shared memory). When I distribute the job across multiple nodes (distributed memory), however, I see no speedup; in fact, I see slowdown. The step-by-step instructions said to try refining the mesh for better speedup (this was also COMSOL support's recommendation). Here are the results for 2 different refined meshes:

Micromixer_cluster.mph (refined mesh):
1 node; 1 proc.: Run time = 566 sec
1 node; 12 proc.: Run time = 130 sec
4 nodes; 12 ppn: Run time = 259 sec
8 nodes; 12 ppn: Run time = 501 sec

Micromixer_cluster.mph (super-refined mesh):
1 node; 1 proc.: Run time = 1169 sec
1 node; 12 proc.: Run time = 414 sec
4 nodes; 12 ppn: Run time = 614 sec
8 nodes; 12 ppn: Run time = 896 sec

Still no speedup. Only slowdown.
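
To put numbers on this, here is a minimal Python sketch that converts the run times listed above into speedup (relative to the 1 node, 1 proc. run) and parallel efficiency. It assumes that "12 ppn" means 12 processes per node, so the total core count is nodes × ppn; the dictionary layout and the function name `speedup_table` are just illustrative choices.

```python
# Run times in seconds, copied from the results above.
# Keys are (nodes, processes per node); "12 ppn" is assumed to mean
# 12 processes on each node.
run_times = {
    "mesh as given": {(1, 1): 123, (1, 12): 38, (4, 12): 99, (8, 12): 223},
    "refined mesh": {(1, 1): 566, (1, 12): 130, (4, 12): 259, (8, 12): 501},
    "super-refined mesh": {(1, 1): 1169, (1, 12): 414, (4, 12): 614, (8, 12): 896},
}


def speedup_table(times):
    """Return (nodes, ppn, cores, speedup, efficiency) rows for one mesh.

    Speedup is measured against the 1 node, 1 proc. run; efficiency is
    speedup divided by the total number of cores used.
    """
    baseline = times[(1, 1)]
    rows = []
    for (nodes, ppn), seconds in sorted(times.items()):
        cores = nodes * ppn
        speedup = baseline / seconds
        efficiency = speedup / cores
        rows.append((nodes, ppn, cores, speedup, efficiency))
    return rows


for mesh, times in run_times.items():
    print(mesh)
    for nodes, ppn, cores, speedup, efficiency in speedup_table(times):
        print(f"  {nodes} node(s) x {ppn} ppn = {cores:3d} cores: "
              f"speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

For the mesh as given, this works out to a speedup of about 3.24 on 1 node with 12 proc. (123/38) but only about 1.24 on 4 nodes (123/99), and below 1 on 8 nodes.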

Has anyone seen any speedup on this COMSOL Model Library example? If so, I'd be interested in your results.

One thing I am doing differently from the COMSOL recommendation is submitting jobs through the command line rather than through the Desktop. Does anyone know why COMSOL recommends submitting batch cluster jobs through the Desktop and not through the command line? Could this be my issue?

Any help would be appreciated.

1 Reply Last Post Oct 4, 2012, 12:50 p.m. EDT
COMSOL Moderator

Hello Josh Thomas

Your Discussion has gone 30 days without a reply. If you still need help with COMSOL and have an on-subscription license, please visit our Support Center for help.

If you do not hold an on-subscription license, you may find an answer in another Discussion or in the Knowledge Base.



Posted: Oct 4, 2012, 12:50 p.m. EDT
Dear Josh,

Unfortunately, I did not read your post earlier; I hope it's not too late!
The models that you have tested are provided to verify that the cluster operates properly, not to measure performance. For performance, you would do better to try reproducing the results of the following paper, which is based on models also available in the Model Library, using the same number of degrees of freedom (DOFs): www.comsol.fr/papers/10248/

Best regards,
Stephan

--
www.comsol.fr
