Hello!
I am working on a small project and, in the near future, I'd like to implement some sort of text-to-speech (TTS) system. The caveat is that it should be trainable to some degree.
Basically, I need more than plain TTS: it should be trainable on, say, recordings of my own voice so that the output has a similar tone. For example, if I read the test sentences while mumbling (but still roughly resembling the actual sounds), the TTS should turn text into mumbles that roughly resemble the actual sounds. Is there anything like that out there already?
I found Merlin (https://github.com/CSTR-Edinburgh/merlin/) and Voice Cloning, but both are rough around the edges, especially when it comes to training them on my own data.
Has anyone worked on similar systems?
Thanks!