Discussion Closed. This discussion was created more than 6 months ago and has been closed.

Linux cluster distributed computing: no speed-up


Cluster configuration: 1 head node + 8 compute nodes, 32 cores and 64 GB RAM per node.

COMSOL: COMSOL 4.0 floating network license installed on the head node, Intel MPI.

Model file: COMSOL40\models\ACDC_Module\Verification_Models\parallel_wires.mph. The air domain is enlarged to increase the number of DOFs; stationary study, direct solver MUMPS.
Test process:
Step 1. Solve the problem on 1 node using 16 cores.
Step 2. Solve it on 4 nodes using 16 cores per node.
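For reference, a minimal PBS submission script along these lines can reproduce the Step 2 run. This is only a sketch: the resource line and job name are placeholders I've added, and the COMSOL path is taken from the logs below.

```shell
#!/bin/bash
# Hypothetical PBS script for Step 2 (4 nodes x 16 cores per node).
# The #PBS directives are illustrative placeholders, not from the logs.
#PBS -l nodes=4:ppn=16
#PBS -N comsol_test

NN=4    # number of compute nodes (-nn)
NP=16   # cores used on each node (-np)
COMSOL=/home/lubo/comsol/COMSOL40/bin/comsol

# Assemble the batch command as it appears in the PBS logs below.
CMD="$COMSOL -nn $NN -np $NP batch -inputfile test.mph -outputfile out_test.mph -batchlog test.log"
echo "$CMD"
```

In a real job the last line would execute `$CMD` after an `mpd boot` step; here it only prints the command for inspection.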

Test result: the solution times are almost identical (about 100 seconds); no speed-up is obtained. It appears that every node is doing the same work.

Please check the following logs and help me out. Thanks a lot.


PBS log for Step 1
------------------------------------------------------------------------------
--- Starting job at: Wed May 9 10:58:22 CST 2012

--- Current working directory is: /home/lubo/comsol
--- Running on 16 processes (cores) on the following nodes:
16 node5
--- mpd BOOT
/home/lubo/comsol/COMSOL40/bin//comsol -nn 1 mpd boot -f /var/torque/aux//1723.mgmt -mpirsh ssh --verbose
running mpdallexit on node5
LAUNCHED mpd on node5 via
RUNNING: mpd on node5
--- mpd TRACE
--- Parallel COMSOL RUN
/home/lubo/comsol/COMSOL40/bin//comsol -nn 1 -np 16 batch -inputfile test.mph -outputfile out_test.mph -batchlog test.log
--- mpd ALLEXIT

--- Job finished at: Wed May 9 10:59:57 CST 2012
------------------------------------------------------------------------------


PBS log for Step 2

------------------------------------------------------------------------------
--- Starting job at: Wed May 9 11:04:36 CST 2012

--- Current working directory is: /home/lubo/comsol
--- Running on 64 processes (cores) on the following nodes:
16 node5
16 node3
16 node2
16 node1
--- mpd BOOT
/home/lubo/comsol/COMSOL40/bin//comsol -nn 4 mpd boot -f /var/torque/aux//1724.mgmt -mpirsh ssh --verbose
running mpdallexit on node5
LAUNCHED mpd on node5 via
RUNNING: mpd on node5
LAUNCHED mpd on node3 via node5
LAUNCHED mpd on node2 via node5
LAUNCHED mpd on node1 via node5
RUNNING: mpd on node3
RUNNING: mpd on node2
RUNNING: mpd on node1
--- mpd TRACE
--- Parallel COMSOL RUN
/home/lubo/comsol/COMSOL40/bin//comsol -nn 4 -np 16 batch -inputfile test.mph -outputfile out_test.mph -batchlog test.log
--- mpd ALLEXIT

--- Job finished at: Wed May 9 11:06:31 CST 2012
------------------------------------------------------------------------------

License log for Step 2
.....
11:01:05 (lmgrd) LMCOMSOL using TCP-port 46447
11:03:00 (LMCOMSOL) TCP_NODELAY NOT enabled
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node5
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node2
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node3
11:03:00 (LMCOMSOL) OUT: "CLUSTERNODE" lubo@node1
11:03:08 (LMCOMSOL) OUT: "ACDC" lubo@node5
11:03:08 (LMCOMSOL) OUT: "COMSOL" lubo@node5
11:04:49 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node5
11:04:49 (LMCOMSOL) IN: "ACDC" lubo@node5
11:04:49 (LMCOMSOL) IN: "COMSOL" lubo@node5
11:04:49 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node1
11:04:49 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node3
11:04:50 (LMCOMSOL) IN: "CLUSTERNODE" lubo@node2


COMSOL batch output for Step 1

*******************************************
********COMSOL progress output file********
*******************************************
Wed May 09 10:58:25 CST 2012
---------- Current Progress: 100 %
Memory: 348/348 1048/1048
Current Progress: 0 %
Memory: 370/370 1065/1065
---------- Current Progress: 100 %
Memory: 387/387 1082/1082
Current Progress: 0 %
Memory: 527/527 1359/1359
Linear solver
Number of degrees of freedom solved for: 1484195
- Current Progress: 10 %
Memory: 526/527 1379/1379
- Current Progress: 13 %
Memory: 932/932 1885/1885
- Current Progress: 15 %
Memory: 1125/1125 2057/2057
-- Current Progress: 27 %
Memory: 1068/1125 2003/2057
--- Current Progress: 30 %
Memory: 876/1125 1812/2057
Symmetric matrices found.
--- Current Progress: 31 %
Memory: 1230/1230 2958/2958
--- Current Progress: 32 %
Memory: 1254/1254 2958/2958
--- Current Progress: 33 %
Memory: 1279/1279 2958/2958
--- Current Progress: 34 %
Memory: 1298/1298 2958/2958
--- Current Progress: 36 %
Memory: 1318/1318 2958/2958
--- Current Progress: 37 %
Memory: 1359/1359 2958/2958
--- Current Progress: 38 %
Memory: 1383/1383 2958/2958
--- Current Progress: 39 %
Memory: 1399/1399 2958/2958
---- Current Progress: 40 %
Memory: 1421/1421 2958/2958
---- Current Progress: 41 %
Memory: 1448/1448 2958/2958
---- Current Progress: 42 %
Memory: 1460/1460 2958/2958
---- Current Progress: 43 %
Memory: 1481/1481 2958/2958
---- Current Progress: 46 %
Memory: 1497/1497 2958/2958
---- Current Progress: 48 %
Memory: 1514/1514 2958/2958
----- Current Progress: 50 %
Memory: 1533/1533 2958/2958
----- Current Progress: 52 %
Memory: 1549/1549 2958/2958
----- Current Progress: 54 %
Memory: 1562/1562 2958/2958
----- Current Progress: 56 %
Memory: 1580/1580 2958/2958
----- Current Progress: 59 %
Memory: 1597/1597 2958/2958
------ Current Progress: 60 %
Memory: 1622/1622 2958/2958
------ Current Progress: 62 %
Memory: 1628/1628 2958/2958
------ Current Progress: 64 %
Memory: 1647/1647 2958/2958
------ Current Progress: 67 %
Memory: 1662/1662 2958/2958
------ Current Progress: 68 %
Memory: 1682/1682 2958/2958
------- Current Progress: 70 %
Memory: 1693/1693 2958/2958
------- Current Progress: 73 %
Memory: 1710/1710 2958/2958
------- Current Progress: 75 %
Memory: 1726/1726 2958/2958
------- Current Progress: 77 %
Memory: 1753/1753 2958/2958
------- Current Progress: 78 %
Memory: 1762/1762 2958/2958
-------- Current Progress: 81 %
Memory: 1777/1777 2958/2958
-------- Current Progress: 83 %
Memory: 1796/1796 2958/2958
-------- Current Progress: 85 %
Memory: 1811/1811 2958/2958
-------- Current Progress: 87 %
Memory: 1828/1828 2958/2958
--------- Current Progress: 90 %
Memory: 1844/1844 2958/2958
Iter Damping Stepsize #Res #Jac #Sol
---------- Current Progress: 100 %
Memory: 2161/2161 3284/3284
1 1.0000000 0.53 1 1 1
Total time: 91.721 s.


COMSOL batch output for Step 2

*******************************************
********COMSOL progress output file********
*******************************************
Wed May 09 11:04:41 CST 2012
---------- Current Progress: 100 %
Memory: 361/361 1061/1061
Current Progress: 0 %
Memory: 383/383 1082/1082
---------- Current Progress: 100 %
Memory: 395/395 1091/1091
Current Progress: 0 %
Memory: 543/543 1375/1375
Linear solver
Number of degrees of freedom solved for: 1484195
- Current Progress: 10 %
Memory: 657/657 1592/1592
- Current Progress: 13 %
Memory: 1269/1269 2365/2365
-- Current Progress: 20 %
Memory: 1122/1269 2057/2365
-- Current Progress: 27 %
Memory: 1079/1269 2016/2365
--- Current Progress: 30 %
Memory: 904/1269 1842/2365
Symmetric matrices found.
--- Current Progress: 31 %
Memory: 1141/1269 2391/2391
--- Current Progress: 33 %
Memory: 1158/1269 2391/2391
--- Current Progress: 35 %
Memory: 1178/1269 2391/2391
--- Current Progress: 37 %
Memory: 1195/1269 2391/2391
--- Current Progress: 39 %
Memory: 1216/1269 2391/2391
---- Current Progress: 41 %
Memory: 1227/1269 2391/2391
---- Current Progress: 43 %
Memory: 1244/1269 2391/2391
---- Current Progress: 45 %
Memory: 1261/1269 2391/2391
---- Current Progress: 47 %
Memory: 1283/1283 2391/2391
---- Current Progress: 49 %
Memory: 1305/1305 2391/2391
----- Current Progress: 52 %
Memory: 1321/1321 2391/2391
----- Current Progress: 54 %
Memory: 1338/1338 2391/2391
----- Current Progress: 55 %
Memory: 1363/1363 2391/2391
Iter Damping Stepsize #Res #Jac #Sol
--------- Current Progress: 94 %
Memory: 1318/1363 2545/2545
1 1.0000000 0.53 1 1 1
---------- Current Progress: 100 %
Memory: 1853/1853 3262/3262
Node 1:
Linear solver
Number of degrees of freedom solved for: 1484195
Symmetric matrices found.
Iter Damping Stepsize #Res #Jac #Sol
1 1.0000000 0.53 1 1 1
Node 2:
Linear solver
Number of degrees of freedom solved for: 1484195
Symmetric matrices found.
Iter Damping Stepsize #Res #Jac #Sol
1 1.0000000 0.53 1 1 1
Node 3:
Linear solver
Number of degrees of freedom solved for: 1484195
Symmetric matrices found.
Iter Damping Stepsize #Res #Jac #Sol
1 1.0000000 0.53 1 1 1
Total time: 108.638 s.






5 Replies Last Post Feb 4, 2015, 10:29 a.m. EST
Josh Thomas Certified Consultant


Posted: May 16, 2012, 4:44 p.m. EDT
Lu Bo,

I have been experiencing a similar difficulty in seeing any speed-up when running COMSOL on multiple nodes in a cluster.

I have tried your exact test.mph file that you uploaded.

Results:

1 node 12 ppn - Sol'n time: 27 sec
4 nodes 12 ppn - Sol'n time: 62 sec

My only thought, from reading the documentation and other threads, is that the speed-up is very model-dependent. If the problem does not spend a lot of time on highly parallel operations, the speed-up can be minimal (if not an outright slow-down, as I am seeing!). But I'm still working on this issue.

One error, I believe, is in the following line of your script:

/home/lubo/comsol/COMSOL40/bin//comsol -nn 4 -np 16 batch -inputfile test.mph -outputfile out_test.mph -batchlog test.log

Since you have 16 processors per node, I believe the -np flag should be set to nn*ppn = 64.

Hope this helps some. Let me know if you make any progress.

Best regards,
Josh Thomas
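A quick sketch of the flag arithmetic Josh suggests. Note this is illustrative only: whether -np means cores per node or the nn*ppn total is a convention you should verify against your COMSOL version's documentation.

```shell
# Josh's suggested arithmetic: -np = nn * ppn.
# Values mirror the cluster in this thread; verify the flag convention
# against your COMSOL version before relying on it.
nn=4     # nodes requested
ppn=16   # processors per node
np=$((nn * ppn))
echo "comsol -nn $nn -np $np batch ..."   # comsol -nn 4 -np 64 batch ...
```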


Posted: Oct 4, 2012, 1:13 p.m. EDT

Unfortunately I did not read your post earlier; I hope it's not too late!
The models you have tested are provided to verify that the cluster operates correctly, not to measure performance. For performance, you would do better to reproduce the results of the following paper, based on models also available in the Model Library, for the same number of DOFs: www.comsol.fr/papers/10248/


Posted: Feb 3, 2015, 5:44 p.m. EST
Hi

I noticed you are dealing with parallel computing speed-up. Have you found a solution? I am running my model on a cluster (with 1, 2, 4, and 8 nodes, 16 processors per node) but I don't get any speed-up compared to the simulation time on my workstation. Is there any way to get a speed-up? I suspect the solver I use (PARDISO) is not the best option. My model uses the laminar phase field method to simulate air injection into a medium initially filled with water. It is a dynamic process that reaches a steady-state condition after a certain time. Thank you in advance.


--
ALI MORADI

Josh Thomas Certified Consultant


Posted: Feb 4, 2015, 8:58 a.m. EST
Ali-

A couple of things I would check, if you haven't already:

1) Confirm in the log that the application is actually running on multiple nodes.
2) Confirm that the Cluster settings are appropriate in the COMSOL Desktop application or, if you are running through the command line, that your command line call is requesting the desired number of nodes and processors.
3) PARDISO is a shared-memory solver and cannot distribute memory across nodes (it is not cluster-capable). The MUMPS and SPOOLES direct solvers do work on clusters.

--
Best regards,
Josh Thomas
AltaSim Technologies

Walter Frei COMSOL Employee


Posted: Feb 4, 2015, 10:29 a.m. EST
Dear Ali,

If you are solving a transient model, then your problem is almost entirely serial, and you will not see much speedup as you distribute across more nodes of a cluster.
For a gentle introduction to these concepts, please see: www.comsol.com/blogs/understanding-parallel-computing/
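A back-of-the-envelope illustration of Walter's point using Amdahl's law, speedup = 1/((1-p) + p/n), where p is the parallel fraction of the runtime and n the number of nodes. The example fractions are made up for illustration, not measured COMSOL data:

```shell
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
#   p = fraction of the runtime that parallelizes
#   n = number of cluster nodes
amdahl() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f", 1 / ((1 - p) + p / n) }'
}

# A mostly-serial transient run barely benefits from 4 nodes,
# while a highly parallel solve scales much better.
echo "p=0.50, n=4: $(amdahl 0.50 4)x"   # 1.60x
echo "p=0.95, n=4: $(amdahl 0.95 4)x"   # 3.48x
```

Even with p = 0.95, four nodes give well under 4x, which matches the modest cluster gains discussed throughout this thread.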

Best,
Walter

Note that while COMSOL employees may participate in the discussion forum, COMSOL® software users who are on-subscription should submit their questions via the Support Center for a more comprehensive response from the Technical Support team.