Author Topic: Alpha 15 comments (Read 12121 times)

gbrak30 · « **on:** June 22, 2015, 05:40:58 PM »

Does the physics engine run only on CPU? I think I saw in an earlier post that for now it will use only CPU, then later on it will combine cpu/gpu. Just wanted to confirm. I just got a GTX 980 and I've used it with Alpha 12.1 and it was great! Also, I am really missing the gravity effect in the grid sims: in 12.1 the fragment grids would attract each other when gravity was turned on. They no longer do this in 15. Other than that, excellent job and keep up the good hard work!

Angel Armageddon · « **Reply #1 on:** June 22, 2015, 08:00:33 PM »

I think that it used to run on gpu and cpu, but now it runs on cpu.
But I don't know for sure.

Greenleaf · « **Reply #2 on:** June 22, 2015, 10:53:14 PM »

Quote from: gbrak30 on June 22, 2015, 05:40:58 PM

Does the physics engine run only on CPU? I think I saw in an earlier post that for now it will use only CPU, then later on it will combine cpu/gpu. Just wanted to confirm. I just got a GTX 980 and I've used it with Alpha 12.1 and it was great! Also, I am really missing the gravity effect in the grid sims: in 12.1 the fragment grids would attract each other when gravity was turned on. They no longer do this in 15. Other than that, excellent job and keep up the good hard work!

Yes, right now n-body gravity and collision run entirely on cpu in managed mode, which is the slowest option. That was to focus the rewrite on the most common platform first. The plan is to add native mode next, which in testing is 2-4 times faster, and then gpu mode, which this time will initially use only C++ AMP.

tepidbread · « **Reply #3 on:** June 23, 2015, 09:32:05 AM »

I have been thinking about getting an Amd gpu in the near future. Would I see greater performance from an Amd as opposed to an Nvidia gpu? (I have heard that Amd gpus have a massive compute advantage)

Greenleaf · « **Reply #4 on:** June 23, 2015, 02:36:34 PM »

Quote from: tepidbread on June 23, 2015, 09:32:05 AM

I have been thinking about getting an Amd gpu in the near future. Would I see greater performance from an Amd as opposed to an Nvidia gpu? (I have heard that Amd gpus have a massive compute advantage)

Well.. what will matter the most, initially, is the double computation speed vs floats.
Last I looked, nvidia were generally very fast for floats, but much slower for doubles, where some amd gpus were only slightly slower for double. This is relevant, since we currently compute using doubles.
That is something which is on my todo list to optimize later, though.

Angel Armageddon · « **Reply #5 on:** June 23, 2015, 08:35:36 PM »

Quote from: Greenleaf on June 23, 2015, 02:36:34 PM

Quote from: tepidbread on June 23, 2015, 09:32:05 AM
I have been thinking about getting an Amd gpu in the near future. Would I see greater performance from an Amd as opposed to an Nvidia gpu? (I have heard that Amd gpus have a massive compute advantage)

Well.. what will matter the most, initially, is the double computation speed vs floats.
Last I looked, nvidia were generally very fast for floats, but much slower for doubles, where some amd gpus were only slightly slower for double. This is relevant, since we currently compute using doubles.
That is something which is on my todo list to optimize later, though.

So how optimized can the stimulator possibly be?

Angel Armageddon · « **Reply #6 on:** June 23, 2015, 08:38:08 PM »

Quote from: Angel Armageddon on June 23, 2015, 08:35:36 PM
Quote from: Greenleaf on June 23, 2015, 02:36:34 PM
Quote from: tepidbread on June 23, 2015, 09:32:05 AM
I have been thinking about getting an Amd gpu in the near future. Would I see greater performance from an Amd as opposed to an Nvidia gpu? (I have heard that Amd gpus have a massive compute advantage)

Well.. what will matter the most, initially, is the double computation speed vs floats.
Last I looked, nvidia were generally very fast for floats, but much slower for doubles, where some amd gpus were only slightly slower for double. This is relevant, since we currently compute using doubles.
That is something which is on my todo list to optimize later, though.
So how optimized can the stimulator possibly be?
I meant simulator
😳

Greenleaf · « **Reply #7 on:** June 24, 2015, 01:12:54 AM »

Quote from: Angel Armageddon on June 23, 2015, 08:35:36 PM

So how optimized can the stimulator possibly be?

Very ;-)
For one, we need to put computations back on gpu, after this rewrite. Then we should get tree based gravity up and running again, then we could consider changing from double representation to fixed point math which would help on platforms which are slow at double.
Yes, there are a lot of speed improvements still on the list.

Angel Armageddon · « **Reply #8 on:** June 24, 2015, 06:59:41 AM »

So how long could it take to improve US2 to the fullest possible extent?
Or close to it?

Greenleaf · « **Reply #9 on:** June 24, 2015, 07:51:12 AM »

Quote from: Angel Armageddon on June 24, 2015, 06:59:41 AM

So how long could it take to improve US2 to the fullest possible extent?
Or close to it?

That is a meaningless question, really. How long will it take to improve an engine to the fullest extent? You can always tune something else or even change out parts, as we have done.
As long as better performance is worth anything for the user, that is how long we will keep improving it.
I just listed a few of the most obvious things.

gbrak30 · « **Reply #10 on:** June 24, 2015, 11:59:31 AM »

"double representation" as in double-precision? I understand that the original geforce titan can do double, and some higher end Tesla cards. AMD can do double i think. So in theory, would US2 run really well in a supercomputer environment?

Retsof · « **Reply #11 on:** June 24, 2015, 04:25:40 PM »

Quote from: Greenleaf on June 24, 2015, 01:12:54 AM

tree based gravity

What does this mean? Is it basically "The moon is primarily affected by Earth's gravity, and the Sun's effect is negligible, so the moon ignores the sun"?

Greenleaf · « **Reply #12 on:** June 24, 2015, 11:47:47 PM »

Quote from: gbrak30 on June 24, 2015, 11:59:31 AM

"double representation" as in double-precision? I understand that the original geforce titan can do double, and some higher end Tesla cards. AMD can do double i think. So in theory, would US2 run really well in a supercomputer environment?

I do not know of any modern cards which cannot handle double, but it is at much lower speeds than "single". Commonly 32 times slower.

The computations done in US² would do well on a supercomputer, when changed to match the architecture, yes. It is a highly parallel computation, and n-body computation is commonly done on supercomputers for that reason.

Greenleaf · « **Reply #13 on:** June 25, 2015, 12:07:40 AM »

Quote from: Retsof on June 24, 2015, 04:25:40 PM

Quote from: Greenleaf on June 24, 2015, 01:12:54 AM
tree based gravity
What does this mean? Is it basically "The moon is primarily affected by Earth's gravity, and the Sun's effect is negligible, so the moon ignores the sun"?

Actually, the Moon is attracted more strongly by the Sun than by Earth.
The Sun accelerates the Moon at about 0.0059 m/s²
Earth accelerates the Moon at about 0.0027 m/s²
... but the Sun's pull on Earth and on the Moon is nearly the same, so they fall around the Sun together, as a system.
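
Those two figures follow from Newton's law of gravitation, a = G·M / r². Here is a minimal sketch for checking them, assuming rounded textbook constants; the names are illustrative and not taken from US². The last value shows how little the Sun's pull changes across the Earth-Moon distance, which is why the pair falls around the Sun as one system.

```python
# Rounded textbook values; assumptions for illustration only.
G = 6.674e-11            # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30         # kg
M_EARTH = 5.972e24       # kg
R_SUN = 1.496e11         # m, about 1 AU (Sun to the Earth-Moon system)
R_EARTH_MOON = 3.844e8   # m, mean Earth-Moon distance


def acceleration(mass_kg, distance_m):
    """Gravitational acceleration toward a point mass: a = G * M / r^2."""
    return G * mass_kg / distance_m ** 2


a_sun_on_moon = acceleration(M_SUN, R_SUN)
a_earth_on_moon = acceleration(M_EARTH, R_EARTH_MOON)

# Largest difference in the Sun's pull between Earth and Moon,
# taken when the Moon sits directly between Earth and the Sun.
a_sun_difference = acceleration(M_SUN, R_SUN - R_EARTH_MOON) - a_sun_on_moon

print(f"Sun on Moon:             {a_sun_on_moon:.4f} m/s^2")
print(f"Earth on Moon:           {a_earth_on_moon:.4f} m/s^2")
print(f"Sun, Earth vs Moon diff: {a_sun_difference:.6f} m/s^2")
```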

By "tree based gravity" I mean Barnes-Hut https://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation
which essentially adds the gravitational contributions of distant bodies together in larger cells, so you may replace an attraction calculation from multiple bodies with a single combined attraction.
This was implemented a while back, but was dropped when OpenCL became the common mode.

I actually show this tree in an old video of mine. This is for SPH, but the gravity is the same.
https://www.youtube.com/watch?v=OXPDHP2rgnI
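
The "single combined attraction" idea can be sketched in a few lines. This is not the US² implementation and not a full Barnes-Hut tree. It only shows the approximation step applied to one cell: every body in a far-away cell is replaced by one body at the cell's center of mass. The cell contents, positions and function names are made up for illustration. A real Barnes-Hut code builds a tree of such cells and uses a size-to-distance threshold to decide when a cell is far enough away to be combined.

```python
import math
import random

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def direct_acceleration(target, bodies):
    """Sum the pull of every body on the target, one pair at a time."""
    ax = ay = 0.0
    for x, y, m in bodies:
        dx, dy = x - target[0], y - target[1]
        r = math.hypot(dx, dy)
        ax += G * m * dx / r ** 3
        ay += G * m * dy / r ** 3
    return ax, ay


def combined_acceleration(target, bodies):
    """Replace the whole cell with one body at its center of mass."""
    total = sum(m for _, _, m in bodies)
    cx = sum(x * m for x, _, m in bodies) / total
    cy = sum(y * m for _, y, m in bodies) / total
    return direct_acceleration(target, [(cx, cy, total)])


# A hypothetical cell: 100 bodies inside a 1 x 1 square, seen from about 50 units away.
rng = random.Random(1)
cell = [(rng.random(), rng.random(), rng.uniform(1.0, 5.0)) for _ in range(100)]
target = (50.0, 0.0)

exact = direct_acceleration(target, cell)     # 100 pair calculations
approx = combined_acceleration(target, cell)  # 1 pair calculation
rel_error = math.hypot(exact[0] - approx[0], exact[1] - approx[1]) / math.hypot(*exact)
print(f"relative error of the combined attraction: {rel_error:.2e}")
```

The error shrinks as the cell gets smaller relative to its distance from the target, which is why the replacement is only used for cells that are far away.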

tepidbread · « **Reply #14 on:** June 25, 2015, 09:44:52 AM »

Quote from: Greenleaf on June 24, 2015, 11:47:47 PM

Quote from: gbrak30 on June 24, 2015, 11:59:31 AM
"double representation" as in double-precision? I understand that the original geforce titan can do double, and some higher end Tesla cards. AMD can do double i think. So in theory, would US2 run really well in a supercomputer environment?

I do not know of any modern cards which cannot handle double, but it is at much lower speeds than "single". Commonly 32 times slower.

The computations done in US² would do well on a supercomputer, when changed to match the architecture, yes. It is a highly parallel computation, and n-body computation is commonly done on supercomputers for that reason.

Will it be possible to use more than one compute device at once in the future? (more than one gpu) I think I have heard of programs in the past that can use more than one compute device. (bitcoin mining) I apologize if this seems like a stupid question. I have never really looked into programming. (especially opencl)

Greenleaf · « **Reply #15 on:** June 26, 2015, 07:08:38 AM »

Quote from: tepidbread on June 25, 2015, 09:44:52 AM

Will it be possible to use more than one compute device at once in the future? (more than one gpu) I think I have heard of programs in the past that can use more than one compute device. (bitcoin mining)

That is the plan. It was actually briefly supported previously, but the benefit was not worth the trouble.
It is the hope that this physics rewrite will make it more performant again, so you can partition calculations out to all cpu's and all gpu's and essentially run computations on anything which can compute.

gbrak30 · « **Reply #16 on:** June 26, 2015, 10:09:20 AM »

Great! Looks like I may build an underground supercomputer lab

This is great news though. Especially the multiple gpu part.

Angel Armageddon · « **Reply #17 on:** June 26, 2015, 11:20:41 AM »

Quote from: Greenleaf on June 26, 2015, 07:08:38 AM

Quote from: tepidbread on June 25, 2015, 09:44:52 AM
Will it be possible to use more than one compute device at once in the future? (more than one gpu) I think I have heard of programs in the past that can use more than one compute device. (bitcoin mining)

That is the plan. It was actually briefly supported previously, but the benefit was not worth the trouble.
It is the hope that this physics rewrite will make it more performant again, so you can partition calculations out to all cpu's and all gpu's and essentially run computations on anything which can compute.

So does that mean that I could have a very crap pc, but US2 will work just fine in the future?

Greenleaf · « **Reply #18 on:** June 26, 2015, 11:24:45 AM »

Quote from: Angel Armageddon on June 26, 2015, 11:20:41 AM

So does that mean that I could have a very crap pc, but US2 will work just fine in the future?

No, it means that everything else being constant, computing performance should go up... but there may be other changes requiring more computer power, so... only thing you can know for sure is that we try to make it as feature full and fast and responsive as we possibly can.

Angel Armageddon · « **Reply #19 on:** June 26, 2015, 09:26:03 PM »

Quote from: Greenleaf on June 26, 2015, 11:24:45 AM

Quote from: Angel Armageddon on June 26, 2015, 11:20:41 AM
So does that mean that I could have a very crap pc, but US2 will work just fine in the future?

No, it means that everything else being constant, computing performance should go up... but there may be other changes requiring more computer power, so... only thing you can know for sure is that we try to make it as feature full and fast and responsive as we possibly can.

Oh ok.

tepidbread · « **Reply #20 on:** June 27, 2015, 06:19:11 PM »

Quote from: gbrak30 on June 26, 2015, 10:09:20 AM

Great! Looks like I may build an underground supercomputer lab
This is great news though. Especially the multiple gpu part.

So... I will make it my goal to populate all my pcie lanes with old graphics cards. My psu can handle it. Will I see diminishing returns as I do so? Will pcie bandwidth cap? Will two extra compute devices actually be worth the trouble? By the way. Thank you Greenleaf for taking the time to answer our questions. It means a great deal to me.

Greenleaf · « **Reply #21 on:** June 28, 2015, 02:28:01 AM »

Quote from: tepidbread on June 27, 2015, 06:19:11 PM

So... I will make it my goal to populate all my pcie lanes with old graphics cards.
Will I see diminishing returns as I do so?

To be clear... if you _already_ have the hardware (multiple gpu's and cpu), I have a todo to make the code use it as well as possible, and the hope is that it will be faster than running on only a single one of those devices. How much faster depends on the combination of devices. It will, however, always be better to run on one fast device than on two slow ones, even if "the slow ones" together have the same nominal performance as "the fast one". There is overhead in partitioning the job and gathering the bits afterwards.
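
A toy model makes the overhead argument concrete. It assumes the work splits perfectly across devices and that each extra device adds a fixed partition/gather cost. The numbers, units and the `run_time` helper are invented for illustration and are not measurements from US².

```python
def run_time(work, device_speeds, overhead_per_extra_device=0.0):
    """Idealized time for one step: a perfect load split, plus a fixed
    partition/gather cost for each device beyond the first."""
    compute = work / sum(device_speeds)
    overhead = overhead_per_extra_device * (len(device_speeds) - 1)
    return compute + overhead


work = 100.0  # arbitrary units of work per step

one_fast = run_time(work, [10.0])
two_slow = run_time(work, [5.0, 5.0], overhead_per_extra_device=1.0)
fast_plus_slow = run_time(work, [10.0, 5.0], overhead_per_extra_device=1.0)

print(f"one fast device:       {one_fast:.2f}")
print(f"two slow devices:      {two_slow:.2f}")
print(f"fast plus slow device: {fast_plus_slow:.2f}")
```

With the same nominal speed, the two slow devices lose to the single fast one by exactly the overhead. Adding a slower device to a fast one you already own can still pay off, as long as the extra compute outweighs the overhead.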

Angel Armageddon · « **Reply #22 on:** June 28, 2015, 01:14:16 PM »

"To be clear... if you _already_ have the hardware (multiple gpu's and cpu), I have a todo to make the code so it uses it best, and the hope is that it will be faster than running on only a single of those devices. How much faster, that depends on the combination of devices. It will, however, always be better to run on one fast device than on two slow ones, even though "the slow ones" together have the same nominal performance as "the fast one". There is overhead in partitioning the job and gathering the bits afterwards."

Yeah, Greenleaf.
Here's what I was talking about on that other forum.
http://m.youtube.com/watch?v=r1_dWqG13oA

gbrak30 · « **Reply #23 on:** June 30, 2015, 01:33:26 PM »

By the way, I ran the sphere of 1000 moons on two different PCs, running 15.1. One PC has an Intel i7 2600k @4.4GHz and the other has an AMD FX9370 Black @4.4GHz as well. I gotta say, it ran much smoother on the AMD cpu than on Intel. My next step is testing older versions that used GPU with a Radeon R290x. I read some articles, and it seems like the newer AMD GPUs have better compute performance than Nvidia (for the price range). I'll get back with more details, possibly in a new thread.

Greenleaf · « **Reply #24 on:** June 30, 2015, 02:58:06 PM »

Quote from: gbrak30 on June 30, 2015, 01:33:26 PM

By the way, I ran the sphere of 1000 moons on two different PCs, running 15.1. One PC has an Intel i7 2600k @4.4GHz and the other has an AMD FX9370 Black @4.4GHz as well. I gotta say, it ran much smoother on the AMD cpu than on Intel. My next step is testing older versions that used GPU with a Radeon R290x. I read some articles, and it seems like the newer AMD GPUs have better compute performance than Nvidia (for the price range). I'll get back with more details, possibly in a new thread.

From what I could quickly look up, the amd cpu has 8 actual cores where the intel one has 4. The intel one supports hyperthreading, but that is really still only 4 cores doing the work.