Why is COMSOL Slower on a Faster Computer? Part 2 of CFD-Mixer Benchmarks
Posted Oct 29, 2016, 8:03 p.m. EDT 5 Replies
Bottom line: Don’t buy a computer with dual E5-2697v4 processors for COMSOL-CFD !!
On Sept 15 2016 we reported run times for large COMSOL-CFD problems on five different computers (see Transonic-flow validations and benchmarks). We've just evaluated a sixth, which was supposed to be faster, but was MUCH slower !
Computer A, laptop, with an i7-6820HQ and 64 GB DDR4 RAM;
(4 cores, 3.5 GHz turbo, 34 GB/s, 8MB cache); SS HD.
Computer B, with two E5-2640 v3’s and 128 GB DDR4 RAM;
(each: 8 cores, 3.4 GHz turbo, 59 GB/s, 20MB cache); C612; 6GB/s SSD.
Computer C, with two E5-2687 v3’s and 256 GB DDR4 RAM;
(each: 10 cores, 3.5 GHz turbo, 68 GB/s, 25MB cache); C612; 7200 rpm 1TB HD.
Computer D, with an i7-3930K and Windows 7; 7200 rpm HD;
(6 cores, 3.2 GHz turbo, 51 GB/s, 12MB cache, 64 GB DDR3 RAM).
Computer E, with two E5-2687 v3’s and 256 GB DDR4 RAM;
(each: 10 cores, 3.5 GHz turbo, 68 GB/s, 25MB cache); C612;
32 GB/s 800GB SSD (an Intel 750 NVMe)
Our primary test problem had ~800k mesh cells and about 1000 boundaries (with boundary-layer meshes on most of them), and the meshed file size without solutions was ~100MB. (This model included ~20 transonic nozzles.) In most of the comparisons, we were running two instances of these simulations simultaneously. In all of these cases, we used the PARDISO solver, as it had proven best in some earlier CFD problems.
For the large non-rotating-machinery problems using the L-VEL turbulence model:
Computer C was ~20% faster than computer B,
Computer B was ~80% faster than Computer A, and
Computer A was ~20% faster than Computer D.
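As a rough check, the three pairwise comparisons above can be chained to put all four machines on one scale. The sketch below assumes that "X% faster" means a speed ratio of 1 + X/100 and that the ratios multiply cleanly across machines; neither assumption is stated in the tests themselves, so the result is only indicative.

```python
# Relative speeds quoted above, each read as "faster is ~gain faster than slower".
# Speeds are normalized so that Computer D = 1.0 (an arbitrary choice of baseline).
speedups = [
    ("A", "D", 0.20),  # A ~20% faster than D
    ("B", "A", 0.80),  # B ~80% faster than A
    ("C", "B", 0.20),  # C ~20% faster than B
]

relative_speed = {"D": 1.0}
for faster, slower, gain in speedups:
    # Assumption: "X% faster" is a multiplicative speed ratio of (1 + X/100).
    relative_speed[faster] = relative_speed[slower] * (1.0 + gain)

for name in sorted(relative_speed):
    print(f"Computer {name}: {relative_speed[name]:.2f}x the speed of Computer D")
```

Under these assumptions Computer C comes out at roughly 2.6 times the speed of Computer D for the large L-VEL non-rotating problems.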
Computer E differed from Computer C only in that it used an SSD rather than a conventional HD. The SSD made large Mixer problems (>1M mesh cells, k-omega, PARDISO) run ~35% faster, though it made little difference in smaller L-VEL non-rotating cases.
From these (and other) tests, we concluded that there would be little benefit from having more than 6 cores in each processor, no matter how many instances of large COMSOL problems are running, and that memory bandwidth was the best predictor of computer speed for large COMSOL problems. (Earlier tests of a high-end quad-socket board found it to be no faster than Computer A !)
From the above, we surmised that the best current choice for demanding COMSOL problems would be a dual-socket machine with two E5-2697v4’s.
So we ordered new processors and memory, and put together a new computer:
Computer F, with two E5-2697v4’s and 512 GB DDR4/2400;
(each: 16 cores, 3.6 GHz turbo, 77 GB/s, 40MB cache); C612; SSD.
Our CFD-Mixer model had grown since the earlier tests. It was now ~1.1M mesh cells, with a run time of ~4.5 hours on Computer E under COMSOL-CFD-Mixer 5.2a, update 1.
The first successful run on Computer F took an astounding 16 hours !
We had just spent a lot of money trying to get something ~30% faster, not 4 times slower !
Apparently COMSOL had detected 64 cores, rather than the actual 32. Correcting that helped ~30%.
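The 64-versus-32 mismatch is consistent with hyperthreading, where each physical core shows up as two logical processors; that reading is an assumption, and the post does not say how the core count was corrected. A minimal sketch of the arithmetic, using a hypothetical helper `estimate_physical_cores` and Python's standard `os.cpu_count()` (which reports logical CPUs):

```python
import os


def estimate_physical_cores(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Estimate physical cores from a logical CPU count.

    threads_per_core=2 is an assumption (hyperthreading enabled);
    pass 1 if hyperthreading is disabled in the BIOS.
    """
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("counts must be positive")
    return max(1, logical_cpus // threads_per_core)


# Computer F as described: 2 sockets x 16 cores, hyperthreading enabled.
computer_f_logical = 2 * 16 * 2
print(estimate_physical_cores(computer_f_logical))

# On the machine running this script: os.cpu_count() counts logical CPUs
# and may return None if the count cannot be determined.
logical_here = os.cpu_count()
if logical_here is not None:
    print(logical_here, estimate_physical_cores(logical_here))
```

The estimate is only as good as the `threads_per_core` guess; the operating system's own tools are the authoritative source for the physical core count.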
We went to a much smaller model, so we could get results from changes in the hardware and BIOS settings in ~15 minutes rather than 10 hours.
We then spent the better part of a week trying one change after another, in components and BIOS settings, trying to figure out why the new computer, which was ~20% faster according to the standard benchmarks, was so much slower on COMSOL.
We found disabling hyperthreading increased the speed for a single instance by only ~2%; but with hyperthreading enabled, Task Manager was showing only a few percent CPU utilization during about a quarter of the run time, whereas with hyperthreading disabled, Task Manager seldom showed less than 50% CPU utilization. So we suspect disabling hyperthreading would substantially slow run time if several COMSOL instances were running simultaneously (didn’t actually check that).
The various other changes in the BIOS had very little effect.
We found there was no difference in speed between the old 1866 memory and the new 2400 memory.
We found that the new motherboard in Computer F (Supermicro X10DAi, C612 chipset) was slower than the others (but only for COMSOL) by 20-40%, depending on various changes in memory, microprocessors, and program. (The motherboards were supposed to all be the same, but we need to check more carefully!)
We found that in the best case with the new expensive 2697v4 processors on the old (faster) motherboard, COMSOL-CFD-Mixer took 50% to 330% longer than with the old cheap 2640v3’s and the slower memory on that same motherboard, for a single instance of a rather small mixer problem (50K mesh cells, run time ~4.55 minutes on Computer B, 10-15 minutes on Computer F).
We then tried another big Mixer k-omega problem, one that ran in ~4.5 hrs on Computer E. Again, we found Computer F to be slower by about a factor of 4 !
We understand the PARDISO solver is an Intel product that COMSOL licenses. So we tried the MUMPS solver again on Computer B on the above small Mixer problem. The two solvers ran at identical speeds there. On Computer F, MUMPS was more than 30% faster than PARDISO.
We set up 8 simultaneous instances of the above small Mixer problem (using PARDISO) on Computer B, and launched them all essentially simultaneously. They all finished in ~25.3 minutes.
We set up 8 simultaneous instances of the above small Mixer problem (using MUMPS) on Computer F, and launched them all essentially simultaneously (multithreading was enabled). The first one finished in about 16 minutes. The last one, in about 19 minutes. So computer F (even with its slower motherboard) is faster than Computer B for running 8 COMSOL instances simultaneously.
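The multi-instance comparison is easier to read as effective minutes per completed job. The sketch below uses only the wall-clock figures quoted above, taking the time of the last instance to finish as the batch time. Note that the two 8-instance runs used different solvers (PARDISO on B, MUMPS on F), so this is not a like-for-like hardware comparison.

```python
# Wall-clock figures quoted above for the small (~50K-cell) Mixer problem.
# Each entry: (number of simultaneous instances, minutes until the last one finished).
runs = {
    "B, 1 instance (PARDISO)": (1, 4.55),
    "B, 8 instances (PARDISO)": (8, 25.3),
    "F, 8 instances (MUMPS)": (8, 19.0),
}


def minutes_per_job(instances: int, wall_minutes: float) -> float:
    """Effective wall-clock minutes per completed job in a batch."""
    return wall_minutes / instances


per_job = {label: minutes_per_job(n, t) for label, (n, t) in runs.items()}
for label, value in per_job.items():
    print(f"{label}: {value:.2f} min per job")

# Batch throughput of F relative to B with 8 simultaneous instances.
ratio = per_job["B, 8 instances (PARDISO)"] / per_job["F, 8 instances (MUMPS)"]
print(f"F finishes the 8-instance batch {ratio:.2f}x as fast as B")
```

On these numbers, running 8 instances at once cuts the effective time per job on Computer B from ~4.55 minutes to a little over 3 minutes, and Computer F with MUMPS finishes the same batch about a third faster than B.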
We then looked at some other large COMSOL problems on Computer F. A large RF problem (~1M mesh cells) ran only ~20% slower than on Computer E, and perhaps that could be explained by the slow motherboard. However, a large L-VEL CFD problem without rotating machinery (not mixer) was just as bad as the large Mixer problem – under 2% CPU utilization most of the time.
So, until COMSOL figures out how to run on dual E5-2697v4's, if you are doing CFD:
Don’t waste your money on 2697’s;
Don’t waste your money on fast memory;
Don’t try a quad-socket computer;
MUMPS may run faster than PARDISO;
Expect to occasionally get a slow motherboard that will simply have to be scrapped.
The fastest COMSOL computer we have tested is Computer E. About a third of the time during the iterative portion of its run (for a single instance), Task Manager was showing under 10% CPU utilization; about a third of the time, Task Manager was showing 15-30% CPU utilization; and about a third of the time, Task Manager was showing 50% CPU utilization.
Again, for emphasis, the standard publicly available benchmarks showed Computer F to be about 20% faster than Computer E, as we expected. COMSOL found Computer F to be slower by a factor of 1.5 to 4.
We also just put a computer together using the i7-6950X, which we will be evaluating soon. The biggest problem with it is its 128GB RAM limitation, but we’re hopeful that it will be the fastest option available for most of our COMSOL work.
We plan to order several more motherboards for dual E5-2697v4’s and see if we can find one that COMSOL CFD likes. If anyone out there has E5-2697v4’s that are running COMSOL-CFD faster than low-end processors, we’d love to learn more details about your computer! We really need more speed on large CFD-Mixer problems!
Cheers !
F. David Doty, PhD
Doty Scientific Inc
Columbia SC
5 Replies Last Post Nov 24, 2016, 8:16 a.m. EST