Dragon NaturallySpeaking 12 Premium review
By Lamont Wood | Computerworld US | Published: 17:47, 29 November 2012
You decide what you want to say. You say it. The words appear on the screen.
Forget the frustrating months it took you to learn typing. In fact, you can forget that writing involves any particular effort. Today's powerful, multi-core computers, combined with the latest speech recognition software and a good microphone, can produce results that are, frankly, startling.
The technology has gotten so good, in fact, that the weak link in the system appears to be the user's ability to dictate. While this may sound like a trivial point, dictation turns out to be a distinct skill that involves factors that are not intuitive. But once the skill is mastered, keyboarding seems painfully primitive.
Dragon NaturallySpeaking corrects a dictated sentence from Shakespeare's Hamlet: The word "town" is changed to "tongue." In this case the correct alternative is second on the list and can be designated by saying "Choose two."
While newer speech recognition mobile apps such as Siri and Google Now have grabbed most of the headlines, one of the longest-running and best-known speech recognition software packages is Dragon NaturallySpeaking from Nuance.
Several versions are available. For this review, I tried out Dragon NaturallySpeaking 12 Premium for Windows PCs, available for $199.99. Other versions include a Home Edition for $99.99, which does not integrate with spreadsheets, does not support offline dictation and has no playback facility; a Professional Edition with enterprise-level administrative, customization and multi-user features for $599.99; and a similar Legal Edition with a law office vocabulary, also for $599.99. There is a version for the Mac called Dragon Dictate ($199.99), along with specialized Mac products for legal and medical workers.
A bit of background: I'm not new to speech recognition. In fact, I've been using PC-based speech recognition on and off for nearly two decades to alleviate the stresses of keyboarding. At first, speech recognition packages were more like frustrating toys with maddening limitations, but they have steadily improved over time.
The crossover point was probably NaturallySpeaking version 8 in 2004, when the utility of speech recognition finally outweighed its limitations. But limitations remained: speech recognition was still more reliable with long words than with short ones (making it popular with doctors); misinterpreted words were often rendered as commands with random and startling results (Bill Gates himself was the victim of this at a live demo in 2006); the software's demand on the hardware was nontrivial (so that switching between documents could be painfully slow); and the software could get confused to the point that it stopped listening.
The skill of dictation
Here are some tips that will make voice recognition software easier and more effective to use:
- Enunciate carefully and speak slowly enough that each word gets its due (although you don't have to go too slowly). Remember, you are controlling a machine, not talking to a person.
- While speaking, envision the text you are seeking to produce. This will help you give equal heed to each word (so the computer can too), keep a steady rhythm and suppress "dysfluencies" like, ah, y'know.
- Watch the results on the screen as you go along. This may slow you down but will enhance your accuracy. To paraphrase Wyatt Earp: It's good to be fast, but it's better to be accurate.
- Even a momentary loss of focus can lead to misrecognition, especially of one-syllable words. But if you can maintain focus, the results can be far more accurate than typing.
- A big issue for novices is that they have learned to "think with their fingers," so suddenly removing the keyboard is a major impediment to composition. I have found it best to speak the text as it comes, without stopping to fix mistakes. You can edit it later.
- Finally, there is the environment. Background silence is best, but droning ventilators hurt recognition more than office chatter does. Meanwhile, if you don't mind being overheard on the phone, then you won't mind being overheard while dictating -- otherwise, find an office. You can use about the same volume for the phone and for speech recognition.
But with version 12, these factors have faded into the background (although they haven't entirely disappeared). For example, you can dictate effectively at about half the speed of an auctioneer -- should you prove able to do so. Assuming that you stay focused while dictating, the error rate is now trivial (see sidebar).
An important part of that new reliability is the noise-canceling headset microphone supplied with the software, which does not react to background noise. It made things a lot easier for me -- I had to turn off my previous microphones every time I stopped speaking to keep them from picking up other sounds. The Home and Premium versions come with a two-speaker analog headset, while the Professional and Legal versions come with a one-speaker USB headset.
Version 12 is outwardly not very different from previous versions, with the same interface and basic command scheme. The vendor claims that accuracy out-of-the-box is 20% better than that of version 11, and in my testing, that did seem to be the case. New features include an interactive tutorial, Bluetooth support, and enhanced support for Gmail and Hotmail.
Dragon installs from a CD; during installation, it asks a number of questions about your age, gender and accent. (It also tests the microphone, and in my case was not happy until I had tried several ports.) It then listens to your voice during a short training session, taking about five minutes. (With early versions, training easily took 45 minutes.) You have the option to let it examine your document folders and outgoing email folders to look for commonly used words.
When invoked, Dragon puts a thin control bar across the top of the screen. You click an icon in this control bar to turn on the microphone. When you start to talk, text appears at the cursor. If you talk quickly, the text may fall as much as a sentence behind, but I found it invariably caught up fairly quickly. Punctuation marks must be pronounced.
If word X is misrecognized, you can adjust the software by saying "Correct X." Word X will then be selected and Dragon will present a list of possible corrections. If none of them match, you can spell the desired word. Thereafter, Dragon is more likely to recognize the word correctly. (With version 12, I found that one correction was always enough.)
On the other hand, if you simply decide you want to change word X, you say "Select X." Dragon assumes you want to change it as an editorial decision (rather than because there was a mistake), and will not alter its later recognition based on your change. You can also select arbitrary phrases, whole sentences or paragraphs to delete, move or reformat them, by saying things like "select next three words," "select previous paragraph" or "select current line."
Dragon (except in the Home edition) automatically records your dictation as you go along, and the Playback feature allows you to listen to what you said. This was useful in previous versions for situations when Dragon committed misrecognitions so bizarre that you had to check back to see what was originally said. I never found this necessary with version 12.
On the other hand, I found the Read Back facility quite useful; its synthetic voice reads selected text aloud from the screen (as opposed to playing back your voice recorded while dictating). The software's most common mistake while you're dictating is to misrecognize or skip one-syllable words, which are then hard to spot when you're proofreading your copy. But they leap out at you when the text is read aloud.
Dragon will also transcribe audio recordings in the WAV, WMA, DSS, DS2 and MP3 formats (again, this isn't available in the Home edition). The software will work with any voice on the recording but, naturally, will get the best results with the voice it has trained with. Possibly as a result, the transcription process takes about twice as long as that of simple one-person dictation.
Dragon also lets you control programs on your computer by speaking aloud shortcuts like "Click close." But this approach requires thorough knowledge of each program's command structure. When switching between multiple programs, I found that things tended to slip out of control and that it was good to have the mouse as backup.
For fine cursor control, you can say "mouse grid" and the screen will be divided into a 3 x 3 grid, each with a number. You can then say the number of the section you want, and that section is divided into a 3 x 3 grid. Then you can do it a third time, zeroing in on a screen section only a few pixels across. After an object is selected (using spoken mouse-click commands), you can move it with spoken mouse-move and mouse-drag commands, or by selecting the destination with another mouse grid operation. This is more tedious than using a physical mouse, of course, but it at least offers the potential for full voice control for those who need it.
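The mouse grid is a repeated 3 x 3 subdivision of the screen, so each spoken number shrinks the target area by a factor of three in each direction: after n picks, a W x H screen is narrowed to a cell of W/3^n by H/3^n. The Python sketch below models only that arithmetic. It assumes the cells are numbered 1 to 9 row by row from the top-left, like a phone keypad, and the `Region`, `pick_cell` and `zoom` names are hypothetical, not part of any Dragon interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float   # in pixels
    height: float  # in pixels


def pick_cell(region: Region, cell: int) -> Region:
    """Return the sub-region for `cell` (1-9) of a 3 x 3 grid laid over `region`.

    Assumption: cells are numbered row by row from the top-left (1 2 3 / 4 5 6 / 7 8 9).
    """
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    row, col = divmod(cell - 1, 3)
    w, h = region.width / 3, region.height / 3
    return Region(region.x + col * w, region.y + row * h, w, h)


def zoom(screen: Region, picks: list[int]) -> Region:
    """Apply successive spoken grid numbers, each one subdividing the previous cell."""
    region = screen
    for cell in picks:
        region = pick_cell(region, cell)
    return region


if __name__ == "__main__":
    screen = Region(0, 0, 2700, 2700)
    print(zoom(screen, [5, 5, 5]))
```

For example, on a hypothetical 2700 x 2700 screen, `zoom(screen, [5, 5, 5])` (saying "5" three times) returns a 100 x 100 region whose top-left corner is at (1300, 1300).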
While I did a lot of work with Dragon NaturallySpeaking as a user, I also wanted to test its performance on a more objective level. I ran the software on a 2.6GHz four-core Athlon II PC with 6GB of RAM and 64-bit Windows 7, using the analog headset microphone supplied with the software. (Nuance recommends at least a 2.2GHz dual-core processor, and 4GB of RAM.)
I first manually typed the 268 or so words of Lincoln's Gettysburg Address without stopping to correct typos. This took 6 minutes 47 seconds, for a throughput of about 40 dictionary words per minute, or 43 five-keystroke words per minute. About 10% of the words had errors.
I then dictated the speech, rattling it off in exactly 2 minutes. (Honest Abe would've been appalled.) There were 43 punctuation marks that, in speech recognition, have to be pronounced as words; as a result, the throughput was 155.5 dictionary words per minute. Dragon made two errors, for a recognition accuracy of 99.4%.
In other words, in my case dictation proved to be nearly four times faster than keyboarding, and the error rate was more than an order of magnitude lower.
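The throughput and accuracy figures in this test are simple ratios. As a sketch, the Python snippet below recomputes them from the counts given in the text, taking the "268 or so" words at face value and assuming, as the text implies, that each of the 43 pronounced punctuation marks counts as one dictated word, including in the accuracy denominator. The function names are illustrative only.

```python
def words_per_minute(words: int, minutes: float) -> float:
    """Throughput in words per minute."""
    return words / minutes


def accuracy(errors: int, words: int) -> float:
    """Fraction of words recognized (or typed) correctly."""
    return 1 - errors / words


# Keyboarding: 268 words in 6 min 47 s, with about 10% of words in error.
typing_minutes = 6 + 47 / 60
typing_wpm = words_per_minute(268, typing_minutes)
typing_error_rate = 0.10

# Dictation: 268 words plus 43 spoken punctuation marks (assumed one word each) in 2 min.
spoken_words = 268 + 43
dictation_wpm = words_per_minute(spoken_words, 2)
dictation_accuracy = accuracy(2, spoken_words)
dictation_error_rate = 1 - dictation_accuracy

speedup = dictation_wpm / typing_wpm
error_ratio = typing_error_rate / dictation_error_rate

print(f"typing:    {typing_wpm:.1f} words/min")
print(f"dictation: {dictation_wpm:.1f} words/min, {dictation_accuracy:.1%} accurate")
print(f"dictation is {speedup:.1f}x faster; typing error rate is {error_ratio:.0f}x higher")
```

Whether the punctuation marks belong in the throughput figure is a judgment call: counting only the 268 words of the speech itself would put dictation at 134 words per minute.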