jerome Posted April 29, 2015 Share Posted April 29, 2015 Hi, After having monitored in many browsers the CPU profiler, I noticed, when morphing meshes, that the computeNormals() method called in each frame was the bottleneck. There was a previous topic where this performance issue was talked about with JCPalmer.No real improvement were found. So I decided to hack this method to check if something could be done.Remember this method was designed long time ago before dynamic morphing was added to BJS, so it wasn't designed to be used in a render loop 60 times per second. <tl;dr;>current computeNormals() :http://www.babylonjs-playground.com/#ZOSGB local computeNormals() :http://www.babylonjs-playground.com/#ZOSGB#1 If this mesh is too heavy (48 000 vertices !) for your computer, please adjust your mesh size line 147 : var nbPaths = 80;</tl;td;> What did I find ?I found many inexpensive changes could be done.For instance, the current computeNormals() method does :- 3 passes : the first for positions.length times, the second for faces number times and the last vertices number * faces of vertices number times- allocate 3 intermediate arrays : one will be filled with number of vertices new Vector3 objects, another one will be filled number of vertices arrays (an array of arrays) containing faces indexes. This makes the code very readable (if I can understand it at first sight not being an expert, I consider it very readable) but may be improved in term of passes and created objects.This is quite important because I noticed that Chromium, which has for now better general perfs in js execution and rendering than FF, starts to consume more and more in GC after a while on a morphing running for minutes... up to 70% CPU on some examples on long duration. The framerate finally decreases...Since FF keeps a constant lower framerate and a constant GC usage. Well, for now, I reduced the method down to 2 passes only (the first for nb of faces times and the second for nb of vertices times) and the memory allocation down to 6 Vector3 objects and one array of Vector3 sized nb of vertices length.The 6 intermediate Vector3 could be not used if I would re-implement vector3 methods locally (add, subtract, cross product, normalize) but I don't think it's worth it in term of GC gain.I would like to eliminate the second pass and the intermediate array whose use is only to normalize each normal, but I have no idea for now. The CPU profiler doesn't show this pass has a noticeable impact, so it may not be worth it either. That said... how much is this improvement ?Well, it depends on the mesh size and on your computer/browser capabilities. So please in this very heavy example (done in purpose to stress the method), change the mesh size (nb of vertices) line 147 and you'll probably find a value where the improvement is really really noticeable.On my own computer, with Chromium, I can have 60 fps (and no GC at all) for minutes with the new function since the legacy one will decrease the framerate downto 28 fps after minutes and the GC will increase up to 70% CPU. You can also compare both method durations by opening your browser console :new method : http://www.babylonjs-playground.com/#ZOSGB#2legacy : http://www.babylonjs-playground.com/#ZOSGB#3On my computer, with local examples (outside PG and PG editor running scripts), I get between x3 and x5 speed increase ! So please make your own tests and let me know what you think about this improvement before I dare a PR Quote Link to comment Share on other sites More sharing options...
RaananW Posted April 29, 2015 Share Posted April 29, 2015 Without looking into the implementation - the new variant is 30-40% slower on my computer. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 mmhh.. which browser ?I noticed the gain was quite null, even sometimes negative, in the PG (maybe the PG editor has side effects ?).I will publish a direct link. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 I have a lead to get rid the intermediate Vector3 array also. Quote Link to comment Share on other sites More sharing options...
RaananW Posted April 29, 2015 Share Posted April 29, 2015 Both chrome and IE,almost same 30% reduction. Quote Link to comment Share on other sites More sharing options...
jahow Posted April 29, 2015 Share Posted April 29, 2015 Hi Jerome, in chrome v. 42 (windows 7) the new version gives 6 FPS, versus 12 FPS for the legacy one. It's weird since it looks like you've optimized the method quite well... I tried copy-pasting the original function into your custom one and then the FPS drops to around 4: http://www.babylonjs-playground.com/#ZOSGB#4 So I'd say your method is definitely an improvement, but there is something weird in how the PG behaves! Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 OK,Please use Chrome (I don't have IE) so we can compare for now (I hope IE will react the same then).Please open the Developer Tools.In the lower panel, click on the Rendering tab, then opt in for "Show FPS meter".In the upper tab, opt in for "Collect Javascript CPU Profile". Don't start it for now please. Now, you can just check the FPS with this example (I tuned for the results are very sensible on my machine = 28800 vertices) :legacy = http://jerome.bousquie.fr/BJS/computeNormals/legacy.htmloptimized = http://jerome.bousquie.fr/BJS/computeNormals/optimized.html My results here are :constant 28 FPS with legacy ...then decreasingconstant 60 FPS with optimized If now you go back into the Developer Tools and you start the CPU profiler for, say, 3 minutes.You'll see the GC will remain at very low level, quite null, with optimized but will increase drastically with legacy version. Quote Link to comment Share on other sites More sharing options...
Dad72 Posted April 29, 2015 Share Posted April 29, 2015 For me too, I get 3fps with Chrome and the link of Jahow edite: => http://jerome.bousqu.../optimized.html is ok: 60 FPS Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 Please test the direct links and let me knowI will add a way to tune the mesh size so you will find a good value to see the difference For those wanting to to read the code, here's the last commented version :Two passes only (nb of faces times then nb of normal vectors times) and no more intermediate array. var localComputeNormals = function(positions, indices, normals) { var index = 0; // temp Vector3 var p1 = BABYLON.Vector3.Zero(); var p2 = BABYLON.Vector3.Zero(); var p3 = BABYLON.Vector3.Zero(); var p1p2 = BABYLON.Vector3.Zero(); var p3p2 = BABYLON.Vector3.Zero(); var faceNormal = BABYLON.Vector3.Zero(); var vertexNormali1 = BABYLON.Vector3.Zero();; var vertexNormali2 = BABYLON.Vector3.Zero();; var vertexNormali3 = BABYLON.Vector3.Zero();; // indice triplets = 1 face var nbFaces = indices.length / 3; for (index = 0; index < nbFaces; index++) { var i1 = indices[index * 3]; var i2 = indices[index * 3 + 1]; var i3 = indices[index * 3 + 2]; // setting the temp V3 BABYLON.Vector3.FromFloatsToRef(positions[i1 * 3], positions[i1 * 3 + 1], positions[i1 * 3 + 2] , p1); BABYLON.Vector3.FromFloatsToRef(positions[i2 * 3], positions[i2 * 3 + 1], positions[i2 * 3 + 2] , p2); BABYLON.Vector3.FromFloatsToRef(positions[i3 * 3], positions[i3 * 3 + 1], positions[i3 * 3 + 2] , p3); p1.subtractToRef(p2, p1p2); p3.subtractToRef(p2, p3p2); BABYLON.Vector3.CrossToRef(p1p2, p3p2, faceNormal); faceNormal.normalize(); // All intermediate results are stored in the normals array : // get the normals at i1, i2 and i3 indexes normals[i1 * 3] = normals[i1 * 3] || 0; normals[i1 * 3 + 1] = normals[i1 * 3 + 1] || 0; normals[i1 * 3 + 2] = normals[i1 * 3 + 2] || 0; normals[i2 * 3] = normals[i2 * 3] || 0; normals[i2 * 3 + 1] = normals[i2 * 3 + 1] || 0; normals[i2 * 3 + 2] = normals[i2 * 3 + 2] || 0; normals[i3 * 3] = normals[i3 * 3] || 0; normals[i3 * 3 + 1] = normals[i3 * 3 + 1] || 0; normals[i3 * 3 + 2] = normals[i3 * 3 + 2] || 0; // make intermediate vectors3 from normals values BABYLON.Vector3.FromFloatsToRef(normals[i1 * 3], normals[i1 * 3 + 1], normals[i1 * 3 + 2] , vertexNormali1); BABYLON.Vector3.FromFloatsToRef(normals[i2 * 3], normals[i2 * 3 + 1], normals[i2 * 3 + 2] , vertexNormali2); BABYLON.Vector3.FromFloatsToRef(normals[i3 * 3], normals[i3 * 3 + 1], normals[i3 * 3 + 2] , vertexNormali3); // add the current normal to the face to these intermediate vectors3 vertexNormali1 = vertexNormali1.addInPlace(faceNormal); vertexNormali2 = vertexNormali2.addInPlace(faceNormal); vertexNormali3 = vertexNormali3.addInPlace(faceNormal); // store back intermediate vectors3 into the normals array normals[i1 * 3] = vertexNormali1.x; normals[i1 * 3 + 1] = vertexNormali1.y; normals[i1 * 3 + 2] = vertexNormali1.z; normals[i2 * 3] = vertexNormali2.x; normals[i2 * 3 + 1] = vertexNormali2.y; normals[i2 * 3 + 2] = vertexNormali2.z; normals[i3 * 3] = vertexNormali3.x; normals[i3 * 3 + 1] = vertexNormali3.y; normals[i3 * 3 + 2] = vertexNormali3.z; } // last normalization for (index = 0; index < normals.length / 3; index++) { BABYLON.Vector3.FromFloatsToRef(normals[index * 3], normals[index * 3 + 1], normals[index * 3 + 2] , vertexNormali1); vertexNormali1.normalize(); normals[index * 3] = vertexNormali1.x; normals[index * 3 + 1] = vertexNormali1.y; normals[index * 3 + 2] = vertexNormali1.z; } }; Quote Link to comment Share on other sites More sharing options...
Dad72 Posted April 29, 2015 Share Posted April 29, 2015 You've put double semicolons here: var vertexNormali1 = BABYLON.Vector3.Zero();; var vertexNormali2 = BABYLON.Vector3.Zero();; var vertexNormali3 = BABYLON.Vector3.Zero();;I tend to avoid multiplying the key word “var” when i can. that optimizes the code slightly. var index = 0, p1 = BABYLON.Vector3.Zero(), p2 = BABYLON.Vector3.Zero(), p3 = BABYLON.Vector3.Zero(), p1p2 = BABYLON.Vector3.Zero(), p3p2 = BABYLON.Vector3.Zero(), faceNormal = BABYLON.Vector3.Zero(), vertexNormali1 = BABYLON.Vector3.Zero(), vertexNormali2 = BABYLON.Vector3.Zero(), vertexNormali3 = BABYLON.Vector3.Zero(), nbFaces = indices.length / 3; Quote Link to comment Share on other sites More sharing options...
RaananW Posted April 29, 2015 Share Posted April 29, 2015 works wonderfully! I wonder what the problem is with the playground. Your code is 3 times faster. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 Use now this (Q&D) link to tune it to your computer : http://jerome.bousquie.fr/BJS/computeNormals/menu.html Please guys let me know how it behaves in your browsers (Chrome, IE, FF, etc).I feel a bit afraid to modify computeNormals() in the main repo and I need to know before if it works on your platforms as well as on mine. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 @dad72 : thank you for the semi-colons, copy-cut bug I prefer keep every var declaration apart... personal preference for readibility Quote Link to comment Share on other sites More sharing options...
jahow Posted April 29, 2015 Share Posted April 29, 2015 Excellent optimization Jerome, nice work! You could maybe set up a few test cases to prove that your new function replaces perfectly the current one? Something link:1/ compute normals on a ribbon (your current test case)2/ compute normal on another base shape, for example : a height map3/ compute normal on an imported mesh Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 here in chrome, I get a enormous gap between the two functions for a mesh sized 60 (28 800 vertices) and more : 60 fps / 20 fps3 times faster ! size 70 : 60 fps / 12 fps => 5 times faster !size 90 : 50 fps / 9 fps => 5,5 times fastersize 120 : 30 fps / 4 fps => 7,5 times faster !!! with the last acceptable framerate : 30 fpssize 150 (135 200 vertices !) : 25 fps / 3 fps. well 8,33 times faster.. but 25 fps only Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 thank you Jahow arf, I just dig in this because I noticed my morphing tests reached 80% CPU in Chrome Garbage collection ! Quote Link to comment Share on other sites More sharing options...
jahow Posted April 29, 2015 Share Posted April 29, 2015 That's the benefit of open source projects Another test case I just thought of: computing normals after a mesh has been converted to flat shaded. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 If someone could test please on a imported mesh (never done this) I will do it on a height map, ok.I tested on a ribbon because it is intended to be morphed then with CreateRibbon() and because it is highly parametric. On a height map, I will probably recompute the same normals each frame... well, it is not visible but a big computation. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 Actually the mesh type doesn't really matter.The computeNormals() method only knows about a positions and a indices array as input.It only depends upon the size of these two arrays. This example generates meshes sized dozen of thousands of vertices which is quite big compared to usual meshes having hundreds or thousands of vertices only. Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 Differences are more difficult to notice in FF because when debugLayer is visible, it doesn't go higher than 30 fps here.Tests should be done with no debugLayer and FF profiler to register the average fps. I will test later ... (I'm in Toulouse tomorrow for a conf about systems and networks supervision, will try to make a demo of my 3D weathermap ) Wingnut 1 Quote Link to comment Share on other sites More sharing options...
JCPalmer Posted April 29, 2015 Share Posted April 29, 2015 If someone could test please on a imported mesh (never done this) .babylon files contain normals, so this is unlikely to be called by them. Quote Link to comment Share on other sites More sharing options...
GameMonetize Posted April 29, 2015 Share Posted April 29, 2015 THIS IS INCREDIBLY GOOD JOB! I really appreciate what you did Can you just be sure that this optimization works well on every browser? (safari, IE, Chrome and firefox as well)once this is done, please do a PR to replace the previous one Quote Link to comment Share on other sites More sharing options...
jerome Posted April 29, 2015 Author Share Posted April 29, 2015 Re-tested on another old computer. Well, it seems to work well also. Improvements are far more noticeable in Chrome than in FF. I can't test in IE as I'm on Linux.I'm waiting for IE users feedback... Quote Link to comment Share on other sites More sharing options...
JCPalmer Posted April 29, 2015 Share Posted April 29, 2015 My OSX / machine is pretty old, so I'll leave that for someone with a newer retina display machine. On iOS 8.x iPad Air 2, fps is about double of legacy. jerome 1 Quote Link to comment Share on other sites More sharing options...
Temechon Posted April 30, 2015 Share Posted April 30, 2015 Awesome ! I go from 4 FPS on chrome on legacy version to 35 FPS with the optimized version. Really really awesome ! jerome 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.