computeNormals optimization

jerome · April 29, 2015

Hi,

After having monitored in many browsers the CPU profiler, I noticed, when morphing meshes, that the computeNormals() method called in each frame was the bottleneck. There was a previous topic where this performance issue was talked about with JCPalmer.

No real improvement were found.

So I decided to hack this method to check if something could be done.

Remember this method was designed long time ago before dynamic morphing was added to BJS, so it wasn't designed to be used in a render loop 60 times per second.

<tl;dr;>

current computeNormals() :

http://www.babylonjs-playground.com/#ZOSGB

local computeNormals() :

http://www.babylonjs-playground.com/#ZOSGB#1

If this mesh is too heavy (48 000 vertices !) for your computer, please adjust your mesh size line 147 :

 var nbPaths = 80;

</tl;td;>

What did I find ?

I found many inexpensive changes could be done.

For instance, the current computeNormals() method does :

- 3 passes : the first for positions.length times, the second for faces number times and the last vertices number * faces of vertices number times

- allocate 3 intermediate arrays : one will be filled with number of vertices new Vector3 objects, another one will be filled number of vertices arrays (an array of arrays) containing faces indexes.

This makes the code very readable (if I can understand it at first sight not being an expert, I consider it very readable) but may be improved in term of passes and created objects.

This is quite important because I noticed that Chromium, which has for now better general perfs in js execution and rendering than FF, starts to consume more and more in GC after a while on a morphing running for minutes... up to 70% CPU on some examples on long duration. The framerate finally decreases...

Since FF keeps a constant lower framerate and a constant GC usage.

Well, for now, I reduced the method down to 2 passes only (the first for nb of faces times and the second for nb of vertices times) and the memory allocation down to 6 Vector3 objects and one array of Vector3 sized nb of vertices length.

The 6 intermediate Vector3 could be not used if I would re-implement vector3 methods locally (add, subtract, cross product, normalize) but I don't think it's worth it in term of GC gain.

I would like to eliminate the second pass and the intermediate array whose use is only to normalize each normal, but I have no idea for now. The CPU profiler doesn't show this pass has a noticeable impact, so it may not be worth it either.

That said... how much is this improvement ?

Well, it depends on the mesh size and on your computer/browser capabilities.

So please in this very heavy example (done in purpose to stress the method), change the mesh size (nb of vertices) line 147 and you'll probably find a value where the improvement is really really noticeable.

On my own computer, with Chromium, I can have 60 fps (and no GC at all) for minutes with the new function since the legacy one will decrease the framerate downto 28 fps after minutes and the GC will increase up to 70% CPU.

You can also compare both method durations by opening your browser console :

new method : http://www.babylonjs-playground.com/#ZOSGB#2

legacy : http://www.babylonjs-playground.com/#ZOSGB#3

On my computer, with local examples (outside PG and PG editor running scripts), I get between x3 and x5 speed increase !

So please make your own tests and let me know what you think about this improvement before I dare a PR

RaananW · April 29, 2015

Without looking into the implementation - the new variant is 30-40% slower on my computer.

jerome · April 29, 2015

mmhh.. which browser ?

I noticed the gain was quite null, even sometimes negative, in the PG (maybe the PG editor has side effects ?).

I will publish a direct link.

jerome · April 29, 2015

I have a lead to get rid the intermediate Vector3 array also.

RaananW · April 29, 2015

Both chrome and IE,almost same 30% reduction.

jahow · April 29, 2015

Hi Jerome, in chrome v. 42 (windows 7) the new version gives 6 FPS, versus 12 FPS for the legacy one.

It's weird since it looks like you've optimized the method quite well...

I tried copy-pasting the original function into your custom one and then the FPS drops to around 4: http://www.babylonjs-playground.com/#ZOSGB#4

So I'd say your method is definitely an improvement, but there is something weird in how the PG behaves!

jerome · April 29, 2015

OK,

Please use Chrome (I don't have IE) so we can compare for now (I hope IE will react the same then).

Please open the Developer Tools.

In the lower panel, click on the Rendering tab, then opt in for "Show FPS meter".

In the upper tab, opt in for "Collect Javascript CPU Profile". Don't start it for now please.

Now, you can just check the FPS with this example (I tuned for the results are very sensible on my machine = 28800 vertices) :

legacy = http://jerome.bousquie.fr/BJS/computeNormals/legacy.html

optimized = http://jerome.bousquie.fr/BJS/computeNormals/optimized.html

My results here are :

constant 28 FPS with legacy ...then decreasing

constant 60 FPS with optimized

If now you go back into the Developer Tools and you start the CPU profiler for, say, 3 minutes.

You'll see the GC will remain at very low level, quite null, with optimized but will increase drastically with legacy version.

Dad72 · April 29, 2015

For me too, I get 3fps with Chrome and the link of Jahow

edite: => http://jerome.bousqu.../optimized.html is ok: 60 FPS

jerome · April 29, 2015

Please test the direct links and let me know

I will add a way to tune the mesh size so you will find a good value to see the difference

For those wanting to to read the code, here's the last commented version :

Two passes only (nb of faces times then nb of normal vectors times) and no more intermediate array.

  var localComputeNormals = function(positions, indices, normals) {    var index = 0;    // temp Vector3    var p1 = BABYLON.Vector3.Zero();    var p2 = BABYLON.Vector3.Zero();    var p3 = BABYLON.Vector3.Zero();    var p1p2 = BABYLON.Vector3.Zero();    var p3p2 = BABYLON.Vector3.Zero();    var faceNormal = BABYLON.Vector3.Zero();    var vertexNormali1 = BABYLON.Vector3.Zero();;    var vertexNormali2 = BABYLON.Vector3.Zero();;    var vertexNormali3 = BABYLON.Vector3.Zero();;    // indice triplets = 1 face    var nbFaces = indices.length / 3;    for (index = 0; index < nbFaces; index++) {      var i1 = indices[index * 3];      var i2 = indices[index * 3 + 1];      var i3 = indices[index * 3 + 2];      // setting the temp V3      BABYLON.Vector3.FromFloatsToRef(positions[i1 * 3], positions[i1 * 3 + 1], positions[i1 * 3 + 2] , p1);      BABYLON.Vector3.FromFloatsToRef(positions[i2 * 3], positions[i2 * 3 + 1], positions[i2 * 3 + 2] , p2);      BABYLON.Vector3.FromFloatsToRef(positions[i3 * 3], positions[i3 * 3 + 1], positions[i3 * 3 + 2] , p3);      p1.subtractToRef(p2, p1p2);      p3.subtractToRef(p2, p3p2);      BABYLON.Vector3.CrossToRef(p1p2, p3p2, faceNormal);      faceNormal.normalize();      // All intermediate results are stored in the normals array :      // get the normals at i1, i2 and i3 indexes      normals[i1 * 3]     = normals[i1 * 3]     || 0;      normals[i1 * 3 + 1] = normals[i1 * 3 + 1] || 0;      normals[i1 * 3 + 2] = normals[i1 * 3 + 2] || 0;      normals[i2 * 3]     = normals[i2 * 3]     || 0;      normals[i2 * 3 + 1] = normals[i2 * 3 + 1] || 0;      normals[i2 * 3 + 2] = normals[i2 * 3 + 2] || 0;      normals[i3 * 3]     = normals[i3 * 3]     || 0;      normals[i3 * 3 + 1] = normals[i3 * 3 + 1] || 0;      normals[i3 * 3 + 2] = normals[i3 * 3 + 2] || 0;      // make intermediate vectors3 from normals values      BABYLON.Vector3.FromFloatsToRef(normals[i1 * 3], normals[i1 * 3 + 1], normals[i1 * 3 + 2] , vertexNormali1);      BABYLON.Vector3.FromFloatsToRef(normals[i2 * 3], normals[i2 * 3 + 1], normals[i2 * 3 + 2] , vertexNormali2);      BABYLON.Vector3.FromFloatsToRef(normals[i3 * 3], normals[i3 * 3 + 1], normals[i3 * 3 + 2] , vertexNormali3);      // add the current normal to the face to these intermediate vectors3      vertexNormali1 = vertexNormali1.addInPlace(faceNormal);      vertexNormali2 = vertexNormali2.addInPlace(faceNormal);      vertexNormali3 = vertexNormali3.addInPlace(faceNormal);      // store back intermediate vectors3 into the normals array      normals[i1 * 3]     = vertexNormali1.x;      normals[i1 * 3 + 1] = vertexNormali1.y;      normals[i1 * 3 + 2] = vertexNormali1.z;      normals[i2 * 3]     = vertexNormali2.x;      normals[i2 * 3 + 1] = vertexNormali2.y;      normals[i2 * 3 + 2] = vertexNormali2.z;      normals[i3 * 3]     = vertexNormali3.x;      normals[i3 * 3 + 1] = vertexNormali3.y;      normals[i3 * 3 + 2] = vertexNormali3.z;    }        // last normalization    for (index = 0; index < normals.length / 3; index++) {      BABYLON.Vector3.FromFloatsToRef(normals[index * 3], normals[index * 3 + 1], normals[index * 3 + 2] , vertexNormali1);      vertexNormali1.normalize();      normals[index * 3]     = vertexNormali1.x;      normals[index * 3 + 1] = vertexNormali1.y;      normals[index * 3 + 2] = vertexNormali1.z;        }  };

Dad72 · April 29, 2015

You've put double semicolons here:

     var vertexNormali1 = BABYLON.Vector3.Zero();;    var vertexNormali2 = BABYLON.Vector3.Zero();;    var vertexNormali3 = BABYLON.Vector3.Zero();;

I tend to avoid multiplying the key word “var” when i can. that optimizes the code slightly.

    var index = 0,        p1 = BABYLON.Vector3.Zero(),        p2 = BABYLON.Vector3.Zero(),        p3 = BABYLON.Vector3.Zero(),        p1p2 = BABYLON.Vector3.Zero(),        p3p2 = BABYLON.Vector3.Zero(),        faceNormal = BABYLON.Vector3.Zero(),        vertexNormali1 = BABYLON.Vector3.Zero(),        vertexNormali2 = BABYLON.Vector3.Zero(),        vertexNormali3 = BABYLON.Vector3.Zero(),        nbFaces = indices.length / 3;

RaananW · April 29, 2015

works wonderfully! I wonder what the problem is with the playground. Your code is 3 times faster.

jerome · April 29, 2015

Use now this (Q&D) link to tune it to your computer : http://jerome.bousquie.fr/BJS/computeNormals/menu.html

Please guys let me know how it behaves in your browsers (Chrome, IE, FF, etc).

I feel a bit afraid to modify computeNormals() in the main repo :wacko: and I need to know before if it works on your platforms as well as on mine.

jerome · April 29, 2015

@dad72 : thank you for the semi-colons, copy-cut bug

I prefer keep every var declaration apart... personal preference for readibility

jahow · April 29, 2015

Excellent optimization Jerome, nice work!

You could maybe set up a few test cases to prove that your new function replaces perfectly the current one? Something link:

1/ compute normals on a ribbon (your current test case)

2/ compute normal on another base shape, for example : a height map

3/ compute normal on an imported mesh

jerome · April 29, 2015

here in chrome, I get a enormous gap between the two functions for a mesh sized 60 (28 800 vertices) and more : 60 fps / 20 fps

3 times faster !

size 70 : 60 fps / 12 fps => 5 times faster !

size 90 : 50 fps / 9 fps => 5,5 times faster

size 120 : 30 fps / 4 fps => 7,5 times faster !!! with the last acceptable framerate : 30 fps

size 150 (135 200 vertices !) : 25 fps / 3 fps. well 8,33 times faster.. but 25 fps only

jerome · April 29, 2015

thank you Jahow

arf, I just dig in this because I noticed my morphing tests reached 80% CPU in Chrome Garbage collection !

jahow · April 29, 2015

That's the benefit of open source projects

Another test case I just thought of: computing normals after a mesh has been converted to flat shaded.

jerome · April 29, 2015

If someone could test please on a imported mesh (never done this)

I will do it on a height map, ok.

I tested on a ribbon because it is intended to be morphed then with CreateRibbon() and because it is highly parametric.

On a height map, I will probably recompute the same normals each frame... well, it is not visible but a big computation.

jerome · April 29, 2015

Actually the mesh type doesn't really matter.

The computeNormals() method only knows about a positions and a indices array as input.

It only depends upon the size of these two arrays.

This example generates meshes sized dozen of thousands of vertices which is quite big compared to usual meshes having hundreds or thousands of vertices only.

jerome · April 29, 2015

Differences are more difficult to notice in FF because when debugLayer is visible, it doesn't go higher than 30 fps here.

Tests should be done with no debugLayer and FF profiler to register the average fps.

I will test later ... (I'm in Toulouse tomorrow for a conf about systems and networks supervision, will try to make a demo of my 3D weathermap )

JCPalmer · April 29, 2015

If someone could test please on a imported mesh (never done this)

.babylon files contain normals, so this is unlikely to be called by them.

GameMonetize · April 29, 2015

THIS IS INCREDIBLY GOOD JOB!

I really appreciate what you did

Can you just be sure that this optimization works well on every browser? (safari, IE, Chrome and firefox as well)

once this is done, please do a PR to replace the previous one

jerome · April 29, 2015

Re-tested on another old computer.

Well, it seems to work well also.

Improvements are far more noticeable in Chrome than in FF.

I can't test in IE as I'm on Linux.

I'm waiting for IE users feedback...

JCPalmer · April 29, 2015

My OSX / machine is pretty old, so I'll leave that for someone with a newer retina display machine.

On iOS 8.x iPad Air 2, fps is about double of legacy.

Temechon · April 30, 2015

Awesome ! I go from 4 FPS on chrome on legacy version to 35 FPS with the optimized version. Really really awesome !

computeNormals optimization

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation

Recently Browsing 0 members