Will Programming by Voice Be the Next Frontier in Software Development?

Two software engineers with injuries or chronic pain conditions have both started voice-coding platforms, reports IEEE Spectrum. “Programmers utter commands to manipulate code and create custom commands that cater to and automate their workflows.”

The voice-coding app Serenade, for instance, has a speech-to-text engine developed specifically for code, unlike Google’s speech-to-text API, which is designed for conversational speech. Once a software engineer speaks the code, Serenade’s engine feeds that into its natural-language processing layer, whose machine-learning models are trained to identify and translate common programming constructs to syntactically valid code…
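
As a rough illustration of the last step described above (turning a transcript into syntactically valid code), here is a toy rule-based sketch in Python. It is not Serenade's implementation, which the article says relies on machine-learning models. The spoken phrases, the rule table, and the names `transcript_to_code`, `snake_case`, and `literal` are all made up for illustration, and the sketch assumes the speech-to-text step has already produced a plain-text transcript.

```python
import re

# Hypothetical mapping from spoken number words to literals (illustration only).
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4"}


def snake_case(words: str) -> str:
    """Join spoken words into a snake_case identifier."""
    return "_".join(words.split())


def literal(word: str) -> str:
    """Turn a spoken number word into a literal; leave other words unchanged."""
    return NUMBER_WORDS.get(word, word)


# Each rule pairs a spoken-command pattern with a function that renders code.
# These phrases are assumptions, not any real tool's command grammar.
RULES = [
    (re.compile(r"add function (?P<name>.+)"),
     lambda m: f"def {snake_case(m['name'])}():\n    pass"),
    (re.compile(r"set (?P<name>.+) to (?P<value>\w+)"),
     lambda m: f"{snake_case(m['name'])} = {literal(m['value'])}"),
    (re.compile(r"call (?P<name>.+)"),
     lambda m: f"{snake_case(m['name'])}()"),
]


def transcript_to_code(transcript: str) -> str:
    """Translate one recognized utterance into a line (or block) of Python."""
    text = transcript.lower().strip()
    for pattern, render in RULES:
        match = pattern.fullmatch(text)
        if match:
            return render(match)
    raise ValueError(f"unrecognized command: {transcript!r}")


if __name__ == "__main__":
    for spoken in ("add function parse input", "set retry count to three"):
        print(transcript_to_code(spoken))
```

A fixed rule table like this breaks as soon as the speaker phrases something differently, which is presumably one reason a real tool would use trained models for this layer instead.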

Talon has several components to it: speech recognition, eye tracking, and noise recognition. Talon’s speech-recognition engine is based on Facebook’s Wav2letter automatic speech-recognition system, which [founder Ryan] Hileman extended to accommodate commands for voice coding. Meanwhile, Talon’s eye tracking and noise-recognition capabilities simulate navigating with a mouse, moving a cursor around the screen based on eye movements and making clicks based on mouth pops. “That sound is easy to make. It’s low effort and takes low latency to recognize, so it’s a much faster, nonverbal way of clicking the mouse that doesn’t cause vocal strain,” Hileman says…

Open-source voice-coding platforms such as Aenea and Caster are free, but both rely on the Dragon speech-recognition engine, which users will have to purchase themselves. That said, Caster offers support for Kaldi, an open-source speech-recognition tool kit, and Windows Speech Recognition, which comes preinstalled in Windows.


I’m pretty sure it will. The possibilities are huge, as we can already see in many other uses of voice recognition today.
I’ll be paying attention to some of the projects you shared with us in this article.


Yes and no… voice recognition is great and all, but it goes back to the same reason why we still choose typing over “speak to type” solutions. “Speak to type” is sometimes good for things such as closed captioning of live broadcasts, but even those solutions fall short. The issue with voice input mainly comes down to the trained model and to outside variables that can come into play. For example, if I’m in a closed environment, then I know the microphone is more than likely going to pick up what I’m saying correctly and display/record it correctly.
