Google Dialogflow vs Nuance 10 – Which is best?

Let’s discuss the pros and cons of Google Dialogflow and Nuance (Nuance 10). Before we get to the meat of this, let’s first explain who they are.

Google Dialogflow

Google Dialogflow is a service for building conversational interfaces for voice and text. For the voice part, this means the product can understand the words (or utterances) spoken to it. However, the crown jewel of Google Dialogflow is that it uses Natural Language Processing (NLP). NLP is the holy grail when it comes to speech recognition as it enables a caller to speak naturally to the system. In the right hands the speech system will then respond in the same natural way.

We at ROC wanted to know if this new whizzy ASR with NLP performs better when up against the industry powerhouse Nuance and their Nuance 10 product. So, what’s Nuance 10?

 

Nuance 10

For those like me who have been using Nuance ASR since the year 2000, Nuance 10 is the successor to older versions of Nuance ASR like Nuance 8.5 and Nuance 9. The latest version uses Artificial Intelligence (AI), neural networks and deep learning to achieve its exceptional performance.

The senior members of Route One Connect have collectively been building IVR solutions for over 40 years (and that’s not counting those who are not on the senior leadership team). During this time, the IVR landscape was dominated by a few big brands like Avaya (or AT&T, Lucent), Cisco and Genesys. However, these guys just provided the ability to write IVR code. Some had the ability to recognise canned utterances like yes/no or number strings, but they were never really at the table when you needed the exceptional performance customers expected from their self-service systems. If you wanted speech recognition in your IVR then Nuance was the only player (until Amazon came along to disrupt).

The Criteria When Comparing Google Dialogflow With Nuance

In reviewing these two products we need to consider the wider picture to ensure a balanced view. The areas that matter are as follows:

  • Building a recognition state.
  • What does the architecture look like?
  • Licensing model
  • Deployment method
  • Tuning

Building A Recognition State

The mechanism you use to build a Nuance 10 grammar is different from how you build a Google Dialogflow input state.

With Nuance you need some programming experience and an understanding of the XML, VXML and GRXML formats. If you have this, then you can start to play around with building a grammar. Be warned: for a complex input state where you want to build a natural language interface, you will require a much deeper understanding of programming.

Here is an example of what a Nuance grammar may look like for a very simple Yes, No grammar.

A Nuance grammar containing a list of yes and no utterances
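
As the original image is not reproduced here, the following is a minimal sketch of how such a grammar could be generated. It uses the generic W3C SRGS XML format rather than anything Nuance-specific, and the function name, rule id, language code and phrase lists are all our own assumptions for illustration. A production grammar would typically also carry semantic tags that map each phrase to a yes or no result, which are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Speech Recognition Grammar Specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_yes_no_grammar(yes_phrases, no_phrases):
    """Return an SRGS XML string with one rule listing every phrase as an alternative."""
    ET.register_namespace("", SRGS_NS)  # serialise SRGS as the default namespace
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            f"{{{XML_NS}}}lang": "en-GB",  # assumed language
            "mode": "voice",
            "root": "yesno",  # hypothetical rule id
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": "yesno", "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in list(yes_phrases) + list(no_phrases):
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_yes_no_grammar(["yes", "yeah", "correct"], ["no", "nope", "incorrect"]))
```

Even for two answers, every accepted phrase has to be listed by hand in markup, which is why the skills mentioned above matter.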

Google Dialogflow is much simpler. You simply create a new bot, add intents (for example ‘Yes’ or ‘No’) and then manually add some sample utterances (phrases). It’s straightforward really.

ASR Software Build

An important thing to note when comparing Google Dialogflow vs Nuance is that Nuance 10 requires the customer to stand up infrastructure (physical, virtual or cloud). This infrastructure can be Windows or Linux based but does need to be sized accordingly. A single server (depending on its specification) will only be able to cope with a certain amount of traffic. You will need to factor this in if you are deploying a huge system, as you will need multiple servers (maybe as many as 15).

Also, bear in mind that the setup is fixed, so if the system gets busier over time (which is often the case) you may need to buy more infrastructure. Because you have to statically assign and configure the infrastructure up front, you also have to hope you don’t oversubscribe and buy more servers than are necessary. And do you size for peak load or average? This setup is quite complicated when you think about it, especially when you have to install a license manager, recogniser software, NSS and a Nuance Management Station.
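
To make the peak-versus-average question concrete, here is a rough sizing sketch. The call volumes, the per-server capacity and the 20% headroom are hypothetical numbers picked for illustration, not Nuance guidance, and the function name is our own.

```python
import math


def servers_needed(concurrent_calls, calls_per_server, headroom_percent=20):
    """Servers required to carry the given load plus a safety margin.

    Integer arithmetic is used for the margin so that rounding is predictable.
    """
    if calls_per_server <= 0:
        raise ValueError("calls_per_server must be positive")
    padded_load = concurrent_calls * (100 + headroom_percent)
    return math.ceil(padded_load / (100 * calls_per_server))


# Hypothetical figures: 300 concurrent calls on average, 900 at peak,
# and a server that copes with 75 concurrent recognitions.
AVERAGE_LOAD = 300
PEAK_LOAD = 900
CALLS_PER_SERVER = 75

if __name__ == "__main__":
    print("sized for average:", servers_needed(AVERAGE_LOAD, CALLS_PER_SERVER))
    print("sized for peak:", servers_needed(PEAK_LOAD, CALLS_PER_SERVER))
```

With these made-up numbers, sizing for peak needs three times the servers that sizing for average does, and most of that estate sits idle outside the busy periods.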

For Google Dialogflow, well, you don’t need to install anything. You don’t need to buy hardware as the good people at Google have already provided all the things you need to start. It scales automatically, so if volume increases you do not need to go out and buy additional capacity. Very simple!

License And Cost Model

Nuance 10 requires the customer to purchase Nuance ports. One port typically represents one call, and most customers will have one Nuance license for every IVR port. This means sizing the system is critical. Order too many and you don’t get your money back; order too few and you have to go back to the finance team and request more funds for infrastructure and licenses.

Google Dialogflow’s cost model is completely different. It is a pay-as-you-go model where you only ever pay for what you use. That’s it! If call volumes spike you do not need to worry, as Google has all the capacity you could ever need in the cloud.
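
The difference between the two models can be sketched with some simple arithmetic. Every price and volume below is hypothetical (expressed in pence to keep the sums in whole numbers) and is not taken from any Nuance or Google price list; the function names are our own.

```python
def fixed_port_cost(ports_purchased, cost_per_port):
    """Licence spend is set by the ports bought, however many calls arrive."""
    return ports_purchased * cost_per_port


def pay_as_you_go_cost(requests_handled, cost_per_request):
    """Usage spend follows the number of requests actually processed."""
    return requests_handled * cost_per_request


# Hypothetical figures, in pence, chosen only to show the shape of each model.
PORTS = 100
COST_PER_PORT = 5_000
COST_PER_REQUEST = 1

if __name__ == "__main__":
    for label, requests in [("quiet period", 200_000), ("busy period", 900_000)]:
        fixed = fixed_port_cost(PORTS, COST_PER_PORT)
        usage = pay_as_you_go_cost(requests, COST_PER_REQUEST)
        print(f"{label}: fixed ports = {fixed}, pay as you go = {usage}")
```

The fixed figure stays the same in both periods while the usage figure moves with traffic. Which model works out cheaper depends entirely on your real prices and volumes.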

Deployment Method

This is especially important too. As you could see from the section above called ‘Building a recognition state’, the Nuance grammar is a physical file. It’s possible to make the grammar dynamic by changing it from a file to something served up by a web server (something like a JSP). However, both the physical file and the JSP require you to deploy on the Nuance server. If the content of the grammar is huge then you need to go through a process of compiling the grammar. This process is service affecting, by the way. It means any changes to grammars should really be done out of hours (OOH), which has implications for rollback.

For Amazon Connect you simply click the ‘Build’ button followed by ‘Deploy’. There is no disruption to service when going through this process. 

Tuning The Input State

Again, Nuance and Amazon could not do this more differently!

With Nuance 10 you must turn utterance recording and logging on. You must also consider the load this will put on the server. The storage space required for the logs and utterance recordings can be extremely high. If you do this, you have to bear in mind the implications it might have for PCI compliance.

Once utterance recording and logging are enabled you then need to go through a tuning exercise. You are advised to use a competent data scientist to review all the data available and then make suggestions.

Once the suggestions are documented (normally in a tuning report) they can be handed to the developer. The developer will then code, test and build the new grammar. Unfortunately, you need to stop the Nuance services, copy the new grammars (and preload.xml if the grammar is large), then restart the Nuance services. Normally at this point you have your fingers crossed that the grammars work perfectly, otherwise a rollback is in order (and a load of grief from the customer). Historically this has been a costly exercise for those involved.

Amazon has simplified this process. Firstly, Amazon transcribes the utterances it did not understand on a nightly basis. Second, it has a very handy feature where you can simply add the missed utterances to an intent. Third, you simply click ‘Build’ and then ‘Deploy’. It’s so simple that it means you can go through quick, daily tuning exercises to bring the recognition performance up. This speeds up tuning and makes it very cost effective.
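
The daily tuning loop described above boils down to merging reviewed, missed utterances into the right intent. The sketch below models that step with plain Python data. It does not call any vendor API; the function name, the intent names and the sample utterances are hypothetical.

```python
def add_missed_utterances(intents, reviewed):
    """Return a copy of the intents with reviewed utterances added.

    intents:  dict mapping intent name -> list of lower-case sample phrases
    reviewed: list of (utterance, intent_name) pairs chosen by a reviewer
    """
    updated = {name: list(phrases) for name, phrases in intents.items()}
    for utterance, intent_name in reviewed:
        if intent_name not in updated:
            raise KeyError(f"unknown intent: {intent_name}")
        phrase = utterance.strip().lower()
        # Skip blanks and phrases the intent already knows about.
        if phrase and phrase not in updated[intent_name]:
            updated[intent_name].append(phrase)
    return updated


if __name__ == "__main__":
    current = {"Yes": ["yes", "yeah"], "No": ["no", "nope"]}
    missed = [("Yep ", "Yes"), ("not really", "No"), ("yeah", "Yes")]
    print(add_missed_utterances(current, missed))
```

The original intent map is left untouched, so a bad batch of additions can be discarded before anything is built or deployed.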

Conclusion – The Good, The Bad And The Ugly?

Nuance 10

Good

  • Nuance 10 is a product that has grown and been tweaked over 20 years
  • It is recognised as having great performance

Bad

  • Fixed licensing, not elastic – if you buy 100 ports and don’t use them all then it’s a waste; if you haven’t bought enough you have to buy more. Contact centre call volumes fluctuate so this is challenging to get right.
  • Nuance 10 requires infrastructure to be hosted and maintained. When you need to increase capacity you also have to increase your hardware footprint. This is not ideal for a growing business.
  • If you require disaster recovery you will need to buy twice as many licenses
  • Scalability – the bigger the system is, the more hardware you need.

Google Dialogflow

Good

  • Serverless technology so no hardware to manage
  • Pay as you use. Not tied to licencing.
  • Scalable – As demand changes you do not need to worry about adding more hardware.
  • DR is taken care of at no extra cost.
  • Speech science work is made simpler by a user-friendly GUI.

Bad

  • New ways of managing speech science and speech data require more modern approaches. People with these skills are harder to find.

 

We have another blog about Amazon Lex and how it can be used for Conversational AI, click here for more information.