AI voiced novel misses the mark.

I gritted my teeth and pressed the button, brushing aside my ethical misgivings about depriving a voice actor of a fee. I’d accepted an invitation from my publishing platform to turn one of my novels into an audiobook using their AI voice program. The truth is that my fiction ‘business’ runs at a loss, and I had no plans to invest in a voice actor; I was motivated more by curiosity, especially since the company wasn’t charging me a cent.

Their audiobook building process is dead simple: They put your ebook version up on screen. You choose your voices from a selection of American and British examples (I chose three – one for each character) and hit the narration button.

My baby began to speak! The narration was startlingly realistic; I was in awe of the technical virtuosity of the AI engine under the hood – and I still am despite my later reservations.

My job as author was to tune the engine as it hummed along – principally by repairing pronunciation errors. The toolkit is pretty simple: You respell the incorrect word to achieve the right pronunciation, e.g. respelling row as roe to block it rhyming with how. But a good handful of errors were resistant to my efforts, despite some ingenious tactics based on my expertise in phonetics.

The audio conversion runs in real time, but it took me several hours to stop and fix errors. While the voices in the audio version sounded sort of authentic, something was slightly off. All the major phonetic elements were well executed: Properly pronounced vowels and consonants, word stresses on the right syllables, sentence stresses mostly correct, longer segments like phrases, clauses and sentences overlaid with intonation contours to signal when they began and ended.

So what was slightly off? I think the problem lay in the way that the AI engine tackles intonation, a speech mechanism that conveys all kinds of meanings. You can test the magic of intonation by reading aloud “I love you” in ten different ways: You might find yourself conveying passion, regret, sadness, anger, desperation, or even sarcasm – each rendition depending on the context. And it’s likely that the intonation pattern you use is motivated by clues that go back over several sentences or even paragraphs. But while the AI gizmo uses intonation patterns that sound human in isolation, they don’t seem to reflect emotional cues beyond the current sentence. I didn’t see evidence that it ‘remembers’ elements of the text that would motivate subtle intonation patterns.

Don’t get me wrong: We’ve come an unbelievably long way since the first Dalek croaked EXTERMINATE, EXTERMINATE. There are vast numbers of applications for AI voicing where authentic human affect is irrelevant. But my novel sounded emotionally insincere and – dare I say it? – robotic, despite the dazzling technical feat behind its production.

Midway through writing this piece, I jumped onto the audiobook catalogue to listen to the free sample of the book. This time it wasn’t so dazzling. The rendition had an odd, jumpy singsong quality that I attribute mostly to intonation problems. And one sentence slipped disastrously from fruity Midsomer Murders British into a variety of American.

The customers evidently didn’t like it. The ebook has racked up about 2000 sales over the years, was an online bestseller for a day in 2016, and has a rating of 3.6 stars and 73 genuine reviews. I still get a dribble of ebook sales without doing any serious promotion.

How did the audiobook go? Zero sales.

EXTERMINATE, EXTERMINATE.

Check out my books here.

2 thoughts on “AI voiced novel misses the mark.”

  1. Very interesting examination of your own work through a different medium. I learned quite a bit from it. And I loved your closing phrasing!
