Specialists in oral translation services

Sunday, June 18, 2023

AI in Conference Interpreting: Transcending The Tower of Babel, One Algorithm at a Time

By Patrícia Lluch, AIB

If there's one thing I've learned in my 29 orbits around the sun as a conference interpreter, it's that language is just as complex, idiosyncratic, and unpredictable as the humans who speak it. However, it seems we've entered an era where artificial intelligence (AI) is starting to insist, "Hold my beer."

AI technologies are now a significant part of the conference interpreting landscape, and my interpretation booth is increasingly becoming more like an episode of Star Trek, minus the funky Spock ears, of course. From Automated Speech Recognition (ASR) to Text-To-Speech (TTS) systems, end-to-end speech translation to AI-based sentence segmentation, the gadgets and gizmos that interpret for us have taken center stage. But as with any good drama, there are plot twists galore!

Starting with ASR, these platforms have achieved a feat that many thought impossible: listening to humans ramble on and making sense of it. Well, at least to an extent. These systems can transcribe spoken language into written text with surprising accuracy. Now, if only they could decipher the mysterious language my teenager speaks at home!

Then there's Text-To-Speech (TTS) technology. If you thought robotic voices were limited to Siri's blandly charming instructions or your GPS's emotionless directions, think again. TTS has become sophisticated enough to parrot human speech quite convincingly, though it still falls short of mastering the delicate art of sarcasm. 

But the real showstopper in our AI ensemble cast is the end-to-end speech translation. In theory, it's a seamless process. You input the spoken word in one language, and the system spits out the equivalent in another. Magic, right? However, in reality, it's a bit like trying to juggle flaming torches while riding a unicycle. The system grapples with cascading errors, where one minor mistake can trigger a snowball effect and cause an avalanche of inaccuracies. Remember the game of Telephone we used to play as kids? Exactly.

Latency is another buzzkill. AI might be able to play chess, compose music, and, I don't know, probably make a mean Margarita by now, but speed can still be a sticking point. A delay in interpreting can turn a well-timed joke into an awkward silence, or a critical negotiation into a miscommunication debacle. 

Now let's talk sentence segmentation. Human speech isn't exactly known for its perfect punctuation (we're not writing an English essay here!). AI often struggles with this, awkwardly inserting breaks where none exist and sounding as natural as a kangaroo on a treadmill. 

Despite these hurdles, AI is becoming increasingly adept at spontaneous effects, such as adapting to different accents, dialects, and speech rates. However, it still tends to get thrown off by a sudden influx of regional slang or fast-paced delivery, and can sometimes resemble a flustered tourist in a foreign country. 

On the bright side, these AI systems won't complain about working overtime, demand coffee breaks, or lose their cool when asked to interpret a heated debate. And unlike their human counterparts, they're immune to conference-room fatigue, jet lag, and that one delegate who insists on speaking at the speed of light.

AI in conference interpreting is undoubtedly a game-changer. But for all its dazzle, it's worth remembering that these systems are still "learning" languages, much like a determined student buried under stacks of foreign language books. The beauty of language, with all its nuances, quirks, and unpredictability, continues to keep us human interpreters in business.

Until the day an AI can appropriately interpret a diplomatic faux pas, or seamlessly weave a culturally sensitive joke into a translation, this 54-year-old will keep her headset firmly in place, ready to jump into the linguistic fray. Because, let's face it, even if AI can translate a joke, it won't be able to laugh at it...at least not yet.


You have probably noticed by now that I didn't write this text. It was generated by ChatGPT-4. The starting point was this first prompt:

"write a blog article from the perspective of a 54 year old conference interpreter about the impact of the use of IA in conference interpreting, which systems exist, what are their limitations and their advantages in comparison with human interpreters. Mention automated speech recognition, TTS, ent-to-end speech translation, latency, sentence segmentation, cascading errors and spontaneous effects. Use a journalistic style with a touch of humor."

I changed only two things in the resulting text: "his" to "her" in the last paragraph, and the figure 54 to 29 in the first, to correct the mix-up between my age and the length of my career as an interpreter to date. Not that I have any problem with my age, but cheerfully tacking an extra quarter-century onto my career might leave those who know me a little puzzled. And my sex is still female.

I then gave it a second prompt: "rewrite the same article in a factual, journalistic style". The result was:

"The Changing Landscape of Conference Interpreting: A Look at the Influence of AI Technologies"

In the evolving field of conference interpreting, artificial intelligence (AI) technologies such as Automated Speech Recognition (ASR), Text-To-Speech (TTS), and end-to-end speech translation are becoming increasingly prevalent. These technologies are significantly reshaping the way conference interpreting is performed, offering both new capabilities and challenges.

ASR technology, which transcribes spoken language into written text, has demonstrated remarkable accuracy in recognizing and understanding human speech patterns. This advancement has allowed for a more efficient transcription process, though the technology continues to face challenges, particularly in handling complex human speech filled with colloquial expressions and nuanced meanings.
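ASR accuracy of the kind described above is conventionally measured as word error rate (WER): the word-level edit distance between the reference transcript and the system's hypothesis, divided by the reference length. A minimal sketch, assuming nothing about any particular ASR engine — the function and the example sentences are purely illustrative:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Two substituted words out of five -> WER of 0.4
print(wer("the delegate will speak next", "the delegates will speak text"))  # 0.4
```

Note that WER treats a harmless misrecognition and a meaning-destroying one identically, which is part of why a "surprisingly accurate" transcript can still mislead downstream translation.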

Similarly, TTS systems have progressed, providing a more human-like quality to synthesized speech. However, the expression of emotion, sarcasm, and certain speech nuances remain outside the current scope of TTS technology.

One of the most noteworthy advancements in AI-based conference interpreting is the development of end-to-end speech translation systems. These systems aim to translate spoken language input into another language output directly, theoretically creating a seamless translation process. However, these systems often grapple with cascading errors, where one incorrect translation can cause a domino effect leading to major inaccuracies in the overall translation. 
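The compounding effect is easy to see with back-of-the-envelope probability: if each stage of a cascaded pipeline (ASR, segmentation, machine translation, speech synthesis) independently handles a segment correctly with some probability, the chance the final output is correct is the product of all of them. A quick sketch — the per-stage accuracies below are invented for illustration, not measurements of any real system:

```python
# Hypothetical per-stage accuracies for a cascaded interpreting pipeline.
stages = {"ASR": 0.95, "segmentation": 0.97, "MT": 0.92, "TTS": 0.99}

end_to_end = 1.0
for name, accuracy in stages.items():
    end_to_end *= accuracy  # an error at any stage propagates downstream

print(f"End-to-end accuracy: {end_to_end:.3f}")  # End-to-end accuracy: 0.839
```

Four individually respectable stages yield roughly 84% end to end — one motivation for "end-to-end" models that replace the cascade with a single system, at the cost of the other problems discussed here.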

Latency also presents a significant challenge in AI-based interpreting. Time delays in the translation process can disrupt the flow of conversations and may lead to miscommunication, particularly in time-sensitive discussions. 

Another inherent challenge for AI in conference interpreting is sentence segmentation. As human speech doesn't conform to the punctuation rules of written language, AI systems often struggle to recognize where a sentence begins or ends, affecting the fluency and naturalness of the translation output.
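A toy illustration of why this is hard: a segmenter that keys on punctuation works on written text but returns one undifferentiated chunk for a punctuation-free speech transcript. The function and example sentences below are invented for the sketch:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive segmenter: split where sentence-final punctuation precedes whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

written = "Thank you, chair. I have two points. First, the budget."
spoken = "thank you chair I have two points first the budget"

print(split_sentences(written))  # three sentences
print(split_sentences(spoken))   # one long chunk -- no boundaries to find
```

Real systems must instead infer boundaries from prosody, pauses, and language-model probabilities, and a boundary placed one word too early or too late changes what the translation stage receives.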

On a more promising note, AI technologies have made strides in adapting to spontaneous effects, such as different accents, dialects, and speech rates. Nevertheless, AI systems can still be easily overwhelmed by sudden shifts in regional slang or speech speed, underlining the need for continued refinement.

AI systems provide several significant advantages in conference interpreting. For instance, they can work continuously without requiring breaks, aren't susceptible to human fatigue, and can process large amounts of linguistic data. However, these systems are still "learning" languages and, as such, they cannot fully capture the depth of cultural and contextual nuances that human interpreters provide.

In conclusion, AI technologies are undeniably reshaping the field of conference interpreting. They offer new efficiencies and capabilities but also present ongoing challenges. As these technologies continue to evolve, their increasing sophistication and potential to assist human interpreters become more apparent. Yet, the nuances, spontaneity, and complexity inherent in human languages continue to demand the involvement of skilled human interpreters. This dual requirement indicates a future where AI technologies and human expertise will coexist and collaborate in the realm of conference interpreting.


And then, I admit out of pure morbid curiosity and with a pinch of malice, I copied and pasted a 320-word excerpt of the first text into AI Text Classifier, to see whether this tool, which, like ChatGPT, belongs to OpenAI, would recognize it as generated by an artificial intelligence. The answer, surprisingly, was: "It is very unlikely that this text was generated by an artificial intelligence." Hmmm...

Before drawing any conclusions from all this, I want to make clear that I am no specialist in AI or in its possible uses in the interpreting sector. My contact with AI tools is cursory at best. In fact, it comes down to exactly this: today I used ChatGPT for the first time, with the result we have just seen, which in my opinion is not bad at all, though frankly I find it a bit bland. Also today I used the text classifier for the first time, which is clearly not reliable. And, on a different note, on March 20, at Lourdes de Rioja's request, I read a text aloud to test Cymo, a speech-to-text application. The result was very good, and you'll find it posted on Lourdes's Instagram. Beyond that, I have used DeepL on various occasions in a professional context, as I imagine we all have by now... And that's all, folks! Such, then, is my meager user profile.

From this minimal contact, and looking at where the profession is heading, I conclude that what suits us best is to harness this universe of new possibilities for our own benefit. If the text of a speech arrives and we know that tackling it directly will be sheer torture, but that running it through DeepL will give us a polished version, then run it through. If we have a consecutive to do and think that using Cymo alongside our notes could help, then go right ahead. After all, none of this is really new: our goal has always been to give the client the best possible service, and if these systems help us do that, they are welcome. I, for one, am certainly not too proud to use them.

That said, I also believe we will have to find the balance between use and abuse. Otherwise, one fine day we may land one of the tough speeches, with no text provided, and all that comes out of the booth is disjointed stammering because we've forgotten how it's done. Careful, now: we are interpreters, not newsreaders.