dbawel Posted December 2, 2014

Hi Jeff,

I would avoid using .wav files if your medium is HTML 5. Many users will have to install plugins depending on their OS and browser, such as Microsoft Media Player or QuickTime, and unless you have a script to check for plugins conditionally - which gets a bit messy - your .wav files won't play. I've read that many people are using the .mp4 format, however .mp4 is really an H.264 codec which is slow and often problematic depending on OS, browser, and browser version. I found a chart created a few months ago which provides a decent picture of what is supported in which OS and browser. This is not a complete list, so if you don't find a format you want to use, I might be able to provide additional info.

Audio:
- Firefox supports Ogg Vorbis and WAV
- Opera supports Ogg Vorbis and WAV
- Safari supports MP3, AAC, and MP4
- Chrome supports Ogg Vorbis, MP3, WAV, AAC, and MP4
- Internet Explorer 9+ supports MP3, AAC, and MP4
- iOS supports MP3, AAC, and MP4
- Android supports AAC and MP3

I prefer to use .mp3 files, as I've found they are the most widely supported - although I decode them myself, as many decoding applications are inefficient and .mp3 files often won't work using the plethora of decoders out there. .mp3 files are slow to load compared to .ogg files, since there are several profiles that exist to load these - but they seem to load more efficiently than .mp4 files. .ogg files work under most circumstances and load quickly, however they aren't as universally supported on both desktops and mobile devices.

I would be interested in hearing the opinion of the integrator of web audio into the BabylonJS framework, and seeing what they have experienced. Regardless, for web audio in HTML 5, my preference is .mp3 for its wide support. But it all depends on your user base, which will conform to whatever specs you provide, as your character face tool is a valuable one. I wish I could offer more, but we're still in our infancy developing in WebGL (I certainly am) - and my hat's off to those who are writing the framework.

Cheers,

David B.
JCPalmer Posted December 3, 2014

David,

Thanks again. Looking at your table & source reference, WAV & MP3 also hit them all. It is missing one "browser" I consider important, CocoonJS. The values I see for it are based on whether it is running on iOS or Android:
- CocoonJS on Android: WAV, X-WAV, Ogg Vorbis
- CocoonJS on iOS: MPEG, MP4, MP3, X-WAV, Ogg Vorbis

Also of crucial importance is Web Audio support. Right now I think Firefox & Chrome are the only production players. If anyone knows up-to-date info, please share. I wish a list of mandatory formats were part of that standard. That would be the real controller of format. The data we are listing here is for the <audio> tag. I think <audio> might be too latent for this application.

FYI, CocoonJS Web Audio Canvas+ thread here: http://support.ludei.com/hc/communities/public/questions/200566779-Accelerated-Canvas-Web-Audio-API-?locale=en-us

I am not as interested in their Webview+ container, but it looks like it works on iOS: http://support.ludei.com/hc/communities/public/questions/201335185-When-will-Web-Audio-API-be-supported-?locale=en-us

I have seen libmp3lame.js, which does .mp3, but the audio is encoded as it is captured in a single step & the library is also huge. That means you cannot do one recording, then encode/save to multiple formats later. There are external converter programs as a backup, though. I'll think about this a little more.

Jeff
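For anyone wanting to check Web Audio availability at runtime, a minimal detection sketch looks something like the following - an illustration only, not code from the thread; the prefixed webkitAudioContext fallback covers the older Chrome/Safari builds of that era:

// Web Audio feature detection; webkitAudioContext covers older Chrome/Safari builds
var contextClass = (<any>window).AudioContext || (<any>window).webkitAudioContext;
var webAudioSupported = typeof contextClass !== "undefined";
var audioCtx = webAudioSupported ? new contextClass() : null;
if (!webAudioSupported) {
    console.log("Web Audio API not available; fall back to the <audio> tag.");
}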
JCPalmer Posted December 3, 2014

Nope, I saw libmp3lame.js could be used with an in-memory audio/wav Blob made as a first step, but you need a worker thread. WAV is so fast for the short files coming out of this that I was not using a worker. The working example of this was also very slow. MP3 encoding involves patenting / licensing issues. With the many different converters out there, including an online one ( http://media.io/ ), there is no reason that I should clog this system up or expend my resources on multiple formats. WAV is easy to make relative to lossy formats, so it Rules!!

FYI, I do have both stereo & mono working.

Jeff
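For reference, writing a 16-bit PCM .wav in the browser really is just a matter of packing a 44-byte header in front of the samples. A sketch along those lines, assuming one mono Float32Array of captured samples (the function name and layout are illustrative, not taken from the MORPH source); a stereo version only changes the channel count, block align, and the interleaving of samples:

// Sketch: pack one mono Float32Array of samples into a 16-bit PCM .wav Blob.
function encodeWav(samples: Float32Array, sampleRate: number): Blob {
    var buffer = new ArrayBuffer(44 + samples.length * 2);
    var view = new DataView(buffer);
    var writeString = (offset: number, s: string) => {
        for (var i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
    };
    writeString(0, "RIFF");
    view.setUint32(4, 36 + samples.length * 2, true);   // RIFF chunk size
    writeString(8, "WAVE");
    writeString(12, "fmt ");
    view.setUint32(16, 16, true);                       // fmt chunk size
    view.setUint16(20, 1, true);                        // PCM format
    view.setUint16(22, 1, true);                        // mono
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, sampleRate * 2, true);           // byte rate
    view.setUint16(32, 2, true);                        // block align
    view.setUint16(34, 16, true);                       // bits per sample
    writeString(36, "data");
    view.setUint32(40, samples.length * 2, true);
    for (var i = 0, offset = 44; i < samples.length; i++, offset += 2) {
        var s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    }
    return new Blob([view], { type: "audio/wav" });
}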
dbawel Posted December 3, 2014

Hi Jeff,

I'm learning a great deal from you in this post. As for my preferences, I use .mp3 files as they are compressed and WAV files are not. But for small files, .wav is the most widely supported format, due to the fact it is not compressed audio. You can always implement .mp3 and perhaps next-gen formats at a later date, when we hope most browsers adopt a standard - they almost have to in order to survive. MP3 files do not require licensing if for personal use; and even if you license your facial animation application, you're not the one doing the encoding - that is done by the user of your application. And for the user, if what they produce is not for profit, no one will ask for licensing in my experience. However, if someone does produce a game for profit with .mp3 audio tags, they are required to pay a $2500 licensing fee - and if requested they must pay or face prosecution.

If anyone wants to check for audio support in a browser, here is a simple HTML 5 audio checker in Javascript:

// returns a boolean
var audioTagSupport = !!(document.createElement('audio').canPlayType);

To check file compatibility:

// Need to check the canPlayType first or an exception
// will be thrown for those browsers that don't support it
var myAudio = document.createElement('audio');
if (myAudio.canPlayType) {
    // Currently canPlayType(type) returns: "", "maybe" or "probably"
    var canPlayMp3 = !!myAudio.canPlayType && "" != myAudio.canPlayType('audio/mpeg');
    var canPlayOgg = !!myAudio.canPlayType && "" != myAudio.canPlayType('audio/ogg; codecs="vorbis"');
}

And if you want to cover all bases, I'm sure you're familiar with implementing Flash as a fallback to audio support in browsers, but for those who are not, it's quite simple:

<audio controls preload="auto">
    <source src="elvis.mp3" />
    <source src="elvis.ogg" />
    <!-- now include flash fall back -->
</audio>

I look forward to your next web application update.

Cheers,

David B.
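Building on those canPlayType checks, a small helper can pick whichever format the running browser reports it can play. This is a sketch only - the file names and fallback order are just examples:

function pickSource(baseName: string): string {
    var probe = document.createElement("audio");
    if (!probe.canPlayType) return baseName + ".wav";
    // canPlayType returns "", "maybe" or "probably"
    if (probe.canPlayType('audio/ogg; codecs="vorbis"') !== "") return baseName + ".ogg";
    if (probe.canPlayType("audio/mpeg") !== "") return baseName + ".mp3";
    return baseName + ".wav";
}

var clip = new Audio(pickSource("elvis"));
clip.play();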
JCPalmer Posted December 10, 2014

Well, I just cannot seem to resolve the length of the recording. It seems that the recording often stops before the sentence is finished. Adding 500 ms of padding just does not seem like the answer.

Am also starting to look at the Audio classes. I was wondering why the need to pass scene in the Sound constructor? It just gets its engine member. Why not just make audioEngine a public static member of Engine? Even if there could be multiple Engines in a "VM", could they not share an instance of audioEngine?

Seeing errors related to things missing from mixins: AudioBufferSourceNode, PannerNode, AudioContext, GainNode, webkitAudioContext. The mixins.ts I am using is (parts I am not using are commented out / steal as desired):

interface Navigator {
    isCocoonJS: boolean; // delete once using babylon.2.0.d.ts
    getUserMedia(
        options: { video?: boolean; audio?: boolean; },
        success: (stream: any) => void,
        error?: (error: string) => void
    ) : void;
    webkitGetUserMedia(
        options: { video?: boolean; audio?: boolean; },
        success: (stream: any) => void,
        error?: (error: string) => void
    ) : void;
    mozGetUserMedia(
        options: { video?: boolean; audio?: boolean; },
        success: (stream: any) => void,
        error?: (error: string) => void
    ) : void;
}

interface Window {
    AudioContext : AudioContext;
    webkitAudioContext : AudioContext;
}

interface HTMLAnchorElement {
    download : string;
}

interface HTMLURL {
    revokeObjectURL : (string) => void;
}

interface AudioContext {
    new (): any;
    destination : AudioDestinationNode;
    sampleRate : number;
    currentTime : number;
//  listener : AudioListener;
    state : AudioContextState;
    suspend ();
    resume ();
    close ();
    onstatechange : () => void;
    createBuffer (numberOfChannels : number, length : number, sampleRate : number) : AudioBuffer;
//  decodeAudioData (audioData : ArrayBuffer, successCallback? : (decodedData : AudioBuffer) => void, errorCallback? : (DOMException : any) => void) : AudioBuffer;
    createBufferSource () : AudioBufferSourceNode;
//  createMediaElementSource (mediaElement : HTMLMediaElement) : MediaElementAudioSourceNode;
    createMediaStreamSource (mediaStream : any) : MediaStreamAudioSourceNode;
//  createMediaStreamDestination () : MediaStreamAudioDestinationNode;
//  createAudioWorker (scriptURL : string, numberOfInputChannels? : number, numberOfOutputChannels? : number) : AudioWorkerNode;
    createScriptProcessor (bufferSize? : number, numberOfInputChannels? : number, numberOfOutputChannels? : number) : ScriptProcessorNode;
//  createAnalyser () : AnalyserNode;
    createGain () : GainNode;
//  createDelay (maxDelayTime? : number) : DelayNode;
//  createBiquadFilter () : BiquadFilterNode;
//  createWaveShaper () : WaveShaperNode;
//  createPanner () : PannerNode;
//  createStereoPanner () : StereoPannerNode;
//  createConvolver () : ConvolverNode;
//  createChannelSplitter (numberOfOutputs? : number) : ChannelSplitterNode;
//  createChannelMerger (numberOfInputs? : number) : ChannelMergerNode;
//  createDynamicsCompressor () : DynamicsCompressorNode;
//  createOscillator () : OscillatorNode;
//  createPeriodicWave (real : Float32Array, imag : Float32Array) : PeriodicWave;
}

interface AudioBuffer {
    sampleRate : number;
    length : number;
    duration : number;
    numberOfChannels : number;
    getChannelData (channel : number) : Float32Array;
    copyFromChannel (destination : Float32Array, channelNumber : number, startInChannel? : number) : void;
    copyToChannel (source : Float32Array, channelNumber : number, startInChannel? : number) : void;
}

enum AudioContextState { "suspended", "running", "closed" }

interface AudioNode {
    connect (destination : AudioNode, output? : number, input? : number) : void;
    connect (destination : AudioParam, output? : number) : void;
    disconnect (output? : number) : void;
    context : AudioContext;
    numberOfInputs : number;
    numberOfOutputs : number;
    channelCount : number;
    channelCountMode : number;
    channelInterpretation : any;
}

interface GainNode extends AudioNode {
    gain : AudioParam;
}

interface AudioDestinationNode extends AudioNode {
    maxChannelCount : number;
}

interface ScriptProcessorNode extends AudioNode {
    onaudioprocess: (any) => void;
    bufferSize : number;
}

interface AudioBufferSourceNode extends AudioNode {
    buffer : AudioBuffer;
    playbackRate : AudioParam;
    detune : AudioParam;
    loop : boolean;
    loopStart : number;
    loopEnd : number;
    start (when? : number, offset? : number, duration? : number) : void;
    stop (when? : number) : void;
    onended : any;
}

interface MediaStreamAudioSourceNode extends AudioNode {
}

interface AudioParam {
    value : number;
    defaultValue : number;
    setValueAtTime (value : number, startTime : number) : void;
    linearRampToValueAtTime (value : number, endTime : number) : void;
    exponentialRampToValueAtTime (value : number, endTime : number) : void;
    setTargetAtTime (target : number, startTime : number, timeConstant : number) : void;
    setValueCurveAtTime (values : Float32Array, startTime : number, duration : number) : void;
    cancelScheduledValues (startTime : number) : void;
}

/*
interface AudioWorkerNode extends AudioNode {
    terminate () : void;
    postMessage (message : string, transfer? : any) : void;
    attribute EventHandler onmessage;
    addParameter (DOMString name, optional float defaultValue) : AudioParam;
    void removeParameter (DOMString name);
}
*/

Also, unrelated mixins are missing for DebugLayer in scene and HTMLElement.src in videoTexture, plus there are compile errors in Mesh.applyDisplacementMap, Mesh.CreateGroundFromHeightMap, & BabylonFileLoader.parseMesh. Errors in tools & database too. Gulp seems to be quite slack if it is letting this stuff through.
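As a side note, the members left uncommented above (getUserMedia, createMediaStreamSource, createScriptProcessor, createGain) correspond to the usual prefixed-era microphone capture path. A rough sketch of that wiring, with illustrative names only and not taken from the MORPH recorder:

navigator.getUserMedia({ audio: true }, (stream) => {
    var ctx = new ((<any>window).AudioContext || (<any>window).webkitAudioContext)();
    var source = ctx.createMediaStreamSource(stream);
    var processor = ctx.createScriptProcessor(4096, 1, 1);   // mono in, mono out
    var chunks: Float32Array[] = [];
    processor.onaudioprocess = (e: any) => {
        // copy the samples; the browser reuses the underlying buffer
        chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
    };
    source.connect(processor);
    processor.connect(ctx.destination);   // keeps the processor running in some browsers
}, (err) => console.log("getUserMedia failed: " + err));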
GameMonetize Posted December 10, 2014

I added the missing waa.d.ts to support the WebAudio interfaces. I'll let Davrous answer your questions about the WebAudio architecture. About the compile errors: are they still there even with the waa.d.ts file?
davrous Posted December 11, 2014

Hello,

My idea for Web Audio is to add sounds to a scene; this scene will be able to have multiple tracks. In this way, you will be able to apply effects and volume on a dedicated track. I then need to keep references to the sounds being added to a scene. I will also let the user instantiate a sound without assigning it to a scene if needed.

For codec support, we will only offer what the browser is offering. For instance, let's imagine that .wav & .mp3 are supported by IE, Firefox & Chrome but .ogg is not supported by IE: it will be up to the developer of the game to use .mp3 or .wav assets to ensure cross-browser compatibility. Supporting codecs via JavaScript libraries is not in our scope at all, for several reasons: performance first, and potential royalties/patent problems. We will then stick to pure browser support.

I'm just at the very beginning of the Web Audio Babylon.js architecture. If you have ideas on how it should be implemented, I'm open to suggestions, even if I already have a pretty good idea of how I'd like to implement it.

David
JCPalmer Posted December 11, 2014

First, yes, the waa.d.ts did solve all the syntax issues (it also means I can ditch all my non-recording-related mixins). I do not know why it did, since some of the stuff had nothing to do with audio. Theory: the audio errors caused failures later, so new stuff did not compile. Anyway, fixed.

- - - - - - - - - -

As far as the API goes, having sounds be members of (catalogued by) the scene is fine, but grabbing the BABYLON.AudioEngine using the BABYLON.Scene means it cannot be an optional arg.

I have restructured my proof of concept into an implementation such that I have a MORPH.Sentence class whose constructor parses my "Arpabet+" string into a MORPH.EventSeries of facial MORPH.Deformations. After looking at your BABYLON.Sound class, I was thinking of making MORPH.Sentence a subclass of it. This would give a very tight integration of the sound with the vertex-repositioning directions, and simplify initiating them, by overriding the play() method. These MORPH.Sentences could be constructed independent of knowledge of the actual MORPH.Mesh (or maybe a MORPH.MakeHuman subclass). I have not figured out what to do after that yet, but that mandatory BABYLON.Scene arg is making me nervous. I want to have an architecture where everything is OO, oriented around a mesh instance, not standalone sub-systems.

I do not see a need for more than one instance of BABYLON.AudioEngine. Using the Engine would make my starting of navigator.getUserMedia for recording much less complicated. I would like to ditch my code for instancing an AudioEngine. I could call getAudioEngine() on my engine instance, but would prefer to call a static Engine.getAudioEngine().
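Sketched out, the subclass idea reads roughly as below. This is purely an illustration of what is being described: the constructor arguments follow the thread's proposal of an optional url (not the shipping BABYLON.Sound signature), and the parser body is omitted.

module MORPH {
    export class Sentence extends BABYLON.Sound {
        private _series: EventSeries;   // the facial Deformations for this sentence

        constructor(name: string, arpabetPlus: string, scene: BABYLON.Scene) {
            super(name, null, scene);               // no url; an AudioBuffer is supplied after recording
            this._series = this._parse(arpabetPlus);
        }

        public play(time?: number): void {
            // queue the EventSeries on the target mesh here, then start the audio
            super.play(time);
        }

        private _parse(arpabetPlus: string): EventSeries {
            // split on '_' and map each phoneme / expression code to a Deformation
            return null;                            // omitted in this sketch
        }
    }
}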
JCPalmer Posted December 11, 2014

One other thing too: could url be optional in the constructor as well? You could just do nothing in that case, leaving _isReadyToPlay = false. In order to use BABYLON.Sound for playback in the recording app, I would have a method in the sub-class which had an _audioBuffer passed in as an arg. During the recording process, the constructor would have already been run. Also, the .wav file is not written on every recording. I want to give the operator the chance to play back both the movement & the sound, to decide if it is right.
davrous Posted December 11, 2014

Yes, it is planned to let you build a sound directly from an audioBuffer, in order to use it with the Assets Manager for instance. I will have a look at your architecture request. If it can help you build your speech solution, I will do it.
JCPalmer Posted December 11, 2014

Thanks. I just pushed version 1.1 up to the repository, then switched out my mixins.ts stuff for waa.d.ts. Not really perfect, but I wanted a snapshot with 1.14 before trying to integrate with 2.0. Needed to get commit points anyway, since it had been about 2 months. Will put out an updated link once it is looking good.
dbawel Posted December 15, 2014

Hi Jeff,

Can you post your updated mixins.ts file? I'd like to better understand what was happening, and why some of the compile errors you mentioned were apparently solved by waa.d.ts.

Thanks,

David B.
JCPalmer Posted December 15, 2014

David,

The entire MORPH 1.1 is published, so you can look at almost anything you want: https://github.com/BabylonJS/Extensions/tree/master/Morph . The one thing not in the repository is cmudict.0.7a.js (but the .java & .class files that build it are). Too big, and I also do not want to give the impression you need it for anything other than recording.

The compile issues were actually in the Babylon.JS repository, not in my code in the Extensions repository. There the problem was they were not defining the web audio functions anywhere. The Typescript plug-in I use was erroring due to this. It was also not recognizing other new stuff. When I pulled the Babylon.js repository with waa.d.ts, all the web audio references were found. The plug-in then was also ok with the unrelated files. Think this is a plug-in phenomenon. I now use waa.d.ts too, and ripped those definitions out of mixins.ts. No need to re-invent the wheel.

I am now waiting on the Babylon audio implementation & Make Human 1.2, so I am stopping this for now. I have updated the page behind the link with the voice recorder. (For some reason on my Linux system, playback does not show any Voice, but it did save hearable .wav files.) Here it is again: https://googledrive.com/host/0B6-s6ZjHyEwUSDVBUGpHOXdtbHc

These are the changes since the last publish:
- writes mono .wav files, if asked
- the arpabet no longer shows vowel stresses (still in cmudict.0.7a.js though)
- switched the arpabet separator character from ' ' to '_'
- the arpabet which actually runs is in an editable text control, so now it can be copied to / pasted from the clipboard
- the keyboard direction keys no longer move the mesh, which was annoying

I have also implemented what I call Arpabet+, which incorporates the data from the sliders into the string. This means expressions can be encoded with speech. The CENSORED expression you have to add manually, like F^2!1+3_AH#_K#_._Y_UW_._AE_S_HH#_OW_L_.# . I am not sure how important censoring is, but it does show the problem of syncing the voice with the mesh deformations. It is fine when played ignoring censoring, but somehow when silencing certain sections of audio, things run late. I think this is also related to having to pad the recording with 500 milliseconds to keep it from being cut off (the important problem).

Internally, I no longer have a monolithic Voice class. I have modularized into a Sentence class, which will be a subclass of BABYLON.Sound when it is done. There is still a small Voice class, but that is likely to be integrated into a future MORPH.Mesh subclass.

Jeff
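The tokenizing side of an Arpabet+ string is straightforward given the '_' separator described above; what each decorated token ('#', '^', '!', '+', '.') means is defined by MORPH itself, so the sketch below only shows the splitting step, not the MORPH parser:

function tokenizeArpabetPlus(sentence: string): string[] {
    // "F^2!1+3_AH#_K#_._Y_UW_._AE_S_HH#_OW_L_.#" -> ["F^2!1+3", "AH#", "K#", ".", "Y", "UW", ...]
    return sentence.split("_").filter((token) => token.length > 0);
}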
JCPalmer Posted December 15, 2014

As a separate note, Make Human 1.1 is to have facial bones to implement expressions. They say this was done for consistency. I think the 1.2 release will have the Blender MHX plugin turning this back into shapekeys, but that is not totally clear, based on scant info. If it stays as bones, I am going to have to learn how to build shapekeys in Python, not just export shapekeys to Babylon.JS. FYI, I saw a fixed bug that had an interesting picture which shows the bones: http://bugtracker.makehuman.org/issues/637
dbawel Posted December 15, 2014

Hi Jeff,

Thanks for the link. I ran a couple of searches yesterday on GitHub, but I found no results under your GDF name - or the other indicators I tried. I'm still not sure why you weren't able to reference Web Audio correctly with your use of mixins - but as you stated, 'no need to re-invent the wheel.' However, I'd still like to understand more specifically where the mixins failed, as this might be an issue in referencing yet-unknown components in the future.

Do you think that your audio recording issue might be hardware / OS related? In reviewing your scripts and structure, I cannot see why a delay is required for recording - especially such a long delay. I might only expect such a problem if you weren't recording locally - but if this were the case, I doubt 500 ms would solve the resulting truncated audio files you're finding, as this would be too inconsistent to manage effectively. So it would be valuable to understand causality in this case.

Everyone has their preferences, but most people would agree with you that using morph targets provides far better results for your application. As you know, bones introduce rigidity into muscular animation, and the results are far less pliable in performance - as well as limiting vertex ROM (range of motion) and polymorphic influence. The time required in rigging and enveloping bones is also comparatively prohibitive, and ultimately introduces not only transform limitations, but compatibility issues with other functions and attributes. Bones are also less efficient in computation.

Will you be introducing an envelope or similar gait to match the duration of the sum (or a subset) of an animation to the length of an audio waveform? If you are familiar with how Flash syncs audio and facial animation, they implemented the most basic fit function possible, yet the result is somewhat acceptable. You might also implement a function to identify basic waveform peaks in the voice recording, which is a simple way to set pointers for key morph targets and their specific duration and attenuation.

Thanks for your continued development on this. I'm not yet certain how your "censored" controls will ultimately contribute to overall quality and usability, but it's certainly an interesting attribute. It's great to see someone thinking outside of the box.

Cheers,

David B.
JCPalmer Posted December 15, 2014

Mixin-wise, I have never had a problem. I was / am using a version of Babylon 2.0 that is pre-audio. My mixins.ts worked perfectly. I just stopped using my own when I found they are officially published. I was only commenting on the state of the BabylonJS repository right after they first started adding audio.

Using shape keys / morph targets is more than a preference in the case of BJS. There is no bone interpolator, just the playing of pre-built animations developed externally, like in Blender, then exported. That means you would be stuck with lip-syncing, not voice-syncing.

I am afraid I do not even know what "basic waveform peak analysis" is. I have never used Flash. I am mostly just making this up as I go along. Sometimes this produces unexpected results.

My thought on censored, if useful at all, is to come up with something that might be a requirement of the iOS or Google Play stores. Those developer agreements are massive.
dbawel Posted December 16, 2014

Hi Jeff,

I had assumed you were already using AnalyserNodes to identify peaks in your audio file, and hopefully would eventually run limited spectrum analysis as well - until I reviewed your code - which is why I asked if you planned on implementing this in the near future when I didn't find any reference to these in a doc search. Access to these is available in the Web Audio API, and they are essential to lip-sync and animation quality. I'm sure once you implement AnalyserNodes in your application, it will solve most sync issues and dramatically improve performance and quality of animation - well beyond practically anything else you might implement. I cannot imagine developing an application such as what you're building without making AnalyserNodes a key element in many functions. Instead of me explaining this (and more) in detail, here is a very good reference on the use of AnalyserNodes, their function, and how to implement them in your Javascript: http://chimera.labs.oreilly.com/books/1234000001552/ch05.html

Your work is about to become a lot easier, and these will provide a much better result. However, as few people have experience in audio engineering, let me know if there is anything I might explain if it's not clear conceptually or otherwise.

A quick note on developer rights - my company is in the middle of trying to fully understand the licensing requirements of using H.265, as this format is being adopted by nearly all device manufacturers and application developers. Within one year, H.265 will replace H.264 and is expected to be the most widely used format for streaming video and audio worldwide. So as difficult as it is to understand the ramifications of following the guidelines when using other people's IP, everyone should follow your lead and fully understand licensing and usage for any external IP they implement; otherwise, they might find themselves with a cease and desist order against them - having spent years building their application, game, etc. - just to find themselves with a product they cannot legally release or afford to release.

And for anyone considering supporting H.265, be aware that anything published with it, for profit or not, is subject to giving up rights to the property (content) once it is displayed (published) using the H.265 format. However, it will become the standard, and it is by far the highest quality and most efficient compression algorithm for streaming video (4K) and audio with interactive events natively. YouTube has yet to announce any support for H.265, yet the rest of the industry is moving quickly towards this format. Unfortunately, since Google owns the VP9 format, they are forcing YouTube away from H.265. But regardless of which format is supported, most people publishing content to YouTube will never read its licensing agreement, which generally and broadly gives away rights to the content they are publishing. I don't hear much noise about licensing rights these days, however I'm guessing we all will be discussing this extensively in the near future.

Let me know if I can assist or advise in any way to help you get your lip-syncing and facial application to release. You've laid a very good foundation, and I'm guessing your development and quality of output will advance dramatically moving forward.

Cheers,

David B
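For anyone following along, the basic AnalyserNode wiring the chapter linked above covers looks roughly like this: an analyser sits between a source node and the destination and is polled each frame. This is a sketch; someSource stands in for whatever node feeds the audio (a buffer source, a microphone source, etc.):

declare var someSource: AudioNode;       // whatever node feeds the audio

var ctx = new ((<any>window).AudioContext || (<any>window).webkitAudioContext)();
var analyser = ctx.createAnalyser();
analyser.fftSize = 2048;
someSource.connect(analyser);
analyser.connect(ctx.destination);

var bins = new Uint8Array(analyser.frequencyBinCount);
function poll() {
    analyser.getByteFrequencyData(bins); // 0..255 magnitude per frequency bin
    // inspect bins here to find peaks / drive morph targets
    requestAnimationFrame(poll);
}
poll();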
JCPalmer Posted January 24, 2015

Was wondering if a public setter for an audioBuffer could be added to Sound for sub-classing. It will not be known at construction, and it just came from a mic, so no decoding is required. Can always pass some dummy arg to avoid an error in the call to super in the constructor. Something like:

public setAudioBuffer(audioBuffer : AudioBuffer) : void {
    this._audioBuffer = audioBuffer;
    this._isReadyToPlay = true;
}
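For context, a sketch of how a recorder sub-class might use such a setter once capture stops - assembling the Float32Array chunks a ScriptProcessorNode produced into one AudioBuffer. All names here are illustrative, not from the MORPH or Babylon.js source:

declare var ctx: AudioContext;           // the recording AudioContext
declare var chunks: Float32Array[];      // mono samples captured by onaudioprocess
declare var sentence: { setAudioBuffer(b: AudioBuffer): void; play(): void; };

var total = chunks.reduce((n, c) => n + c.length, 0);
var recorded = ctx.createBuffer(1, total, ctx.sampleRate);
var channel = recorded.getChannelData(0);
for (var i = 0, offset = 0; i < chunks.length; i++) {
    channel.set(chunks[i], offset);
    offset += chunks[i].length;
}
sentence.setAudioBuffer(recorded);
sentence.play();   // preview movement + sound before deciding to write a .wav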
davrous Posted January 24, 2015

This is indeed a good idea. I'll work on adding that.
pathogen Posted July 2, 2015

Wow! Fantastic project, JCPalmer! And the amount of knowledge in this thread is incredible. It is making my 3:37am brain overflow with ideas.

davrous: have you seen Tone.js? It's written in TypeScript, and has a really ingenious oscillator-based clock which evades JavaScript's poor clock timing entirely.
dbawel Posted July 3, 2015

Hi pathogen,

I hadn't seen Tone.js before. Interesting API. It's early in development, but looks promising. I've been speaking a lot about Intel's XDK lately, as it certainly is developing rapidly, and of course, they have the resources to grow quickly. It's really good for integrating devices into your applications, and has audio and video integration tools such as basic facial recognition, respiratory rates, and audio recognition - although still in its early forms currently. But this opens the door to fully automating facial animation and speech recognition paired together for the first time. There are lots of tools on the horizon, and many are promising. These days, there are almost too many to choose from - but that, I suppose, is a good thing. Thanks for the heads up, and I look forward to Jeff's next version for speech and facial animation.

As for JCPalmer's question regarding waveform peak analysis (I should have responded to this months ago), it's a good method of detecting and discriminating the progression of audio signals by analyzing "signal energy present" and "signal energy absent" in specific frequencies of human speech - and ignoring patterns (cadence), which are inconsistent from person to person and by age. Thus, by measuring frequencies within the range of 1K to 5K, this information can be used to reliably identify phonemes, providing you ignore the patterns of peaks and the duration of signal-on and signal-off intervals. These frequencies are generally consistent in almost all human speech - regardless of language and dialect. So this is what I use for most real-time speech analysis, as it is very quick to provide reasonably accurate results to drive morph targets for a list of around 12 to 15 phoneme targets for a "human" mouth and tongue - the tongue being typically overlooked but a very important element of human speech.

Most recently, I wrote a plugin for Motionbuilder to isolate audio frequencies and to analyze the resulting audio to drive animatronic servos which shaped the mouth of the "Scribe" goblin and other goblins for the film "The Hobbit." If you watch the goblin characters in the first film, the speech was driven in real time using waveform peak analysis, while I puppeteered the non-speech facial emotions using custom "joystick" controllers in real time and on the film set. This allowed Pete J. to direct the actors naturally as they spoke into a microphone, which I then delayed by about 3 milliseconds - the time required to analyze the audio peaks and sync the audio signal to the physical servos driving both the silicone model as well as digital puppet characters. So if you watch the film, the speech appears very natural with just a limited list of phonemes - although a much longer list of "virtual" morph targets does not introduce any additional delays. Compare this to the completely manual puppets from the Star Wars films, where the resulting animation was absolute crap comparatively - not that my work is comparable in any way, as I simply set up a process with tools that weren't readily available to Stan Winston at the time. This process is now completely digital prior to controlling the voltage to the animatronic servos, and could be applied as an extension for Babylon.js to drive speech and facial animation in real time for practically any character's facial morph targets, for almost any application or game.

It just takes a brilliant mind such as JCPalmer's to efficiently write the code to pass the waveform peak analysis ("WPA") from a microphone (and a camera, if desired, for facial emotions such as happy, sad, surprise, etc.) to drive a morph target list, with some simple math I could assist with if requested. I might also be able to provide someone developing an extension for Babylon.js with some Motionbuilder scenes that are completely digital, to drive digitally rendered models instead of animatronic servos. I just have to use some discretion, as all of the models are the property of companies such as Dreamworks and New Line.

So, utilizing an extension such as Web Audio, WPA would be my personal choice for real-time speech analysis, avoiding the many methods I've seen and also personally applied which, in my opinion, over-complicate the process - a process which has been repeatedly proven in formats from video games to 50-foot faces on cinema screens in high resolution. I know this doesn't begin to explain the math driving the process, but the tools appear to be in place to make this process reasonably simple to apply.
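A rough sketch of the "signal energy present / absent" idea described above, assuming the AnalyserNode wiring shown earlier: average the analyser bins that fall between 1 kHz and 5 kHz and use the result to drive a mouth-open morph target. The threshold and normalization are made up for illustration:

function speechEnergy(analyser: AnalyserNode, ctx: AudioContext): number {
    var bins = new Uint8Array(analyser.frequencyBinCount);
    analyser.getByteFrequencyData(bins);
    var hzPerBin = ctx.sampleRate / analyser.fftSize;   // width of one frequency bin
    var lo = Math.floor(1000 / hzPerBin);
    var hi = Math.ceil(5000 / hzPerBin);
    var sum = 0;
    for (var i = lo; i <= hi && i < bins.length; i++) {
        sum += bins[i];
    }
    return sum / (hi - lo + 1) / 255;                   // normalized 0..1
}

// e.g. per frame: mouthOpenInfluence = speechEnergy(analyser, ctx) > 0.15 ? 1 : 0;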