I have recently tried to review the Chinese -> English system. According to https://blogs.msdn.microsoft.com/translation/2017/11/15/microsoft-translator-accelerates-use-of-neural-networks-across-its-offerings/ , those systems were already switched to NMT models. There is also statement, that user can still use the statistical system when setting category to “SMT”.
However the https://blogs.msdn.microsoft.com/translation/2016/01/27/new-microsoft-translator-customization-features-help-unleash-the-power-of-artificial-intelligence-for-everyone/ mentions there were actually three standard categories available for SMT engines: General(default), TECH, SPEECH.
Could you please explain which domain is offered by the SMT category now? And for how long it will be supported on your side?
Thanks
3
Answers
We are working on customizaton using a neural network decoder. Currently, the Microsoft Translator Hub has 3 Category IDs for SMT and they are general, tech and speech.
With content that is not narrowly confined to your domain, you may find it to be better using category=generalnn than your current customization.
Chinese is using the NMT system so using Category=generalnn would result in the same translation when calling the service using the Microsoft Translator Text API.
The second article is addressing Customization where you can create your own custom translation system or dictionary tuned to your domain, style and terminology. If you’re interested in customization (SMT at this time), there are categories associated with using the Translator Text API and the Microsoft Translator Hub. The category identifies the domain for the project you create using the Hub. Two of the categories are Tech and Speech.
See the Microsoft Translator Hub User Guide to learn more about the Hub.
The tech category will produce different results only when translating FROM English to other languages. In the case of English>Chinese, with my sample sentence “My computer doesn’t boot up.”, it does. For Chinese>English, specifying “tech” will fall back to the default, which is neural in the case of Chinese<>English. “speech” generates the same results as “generalnn” in all cases.
It is generally true, including for Hub categories, that a category that is valid in one language pair is valid in all language pairs. The API will fail with an “invalid category” error only if that category doesn’t exist at all. The reason for this design is so that you can build your custom systems out language by language, over time, while still allowing the user to choose between all available languages, at the cost of, maybe, occasionally suboptimal domain vocabulary in an as of yet uncustomized language pair.
The API does not return to you whether a customized system was used or not. A trick to get that feature anyway is to watermark your custom system using a dictionary entry. Make a dictionary entry “_mywatermark” that translates to “CustomSystem180309_1700_en_ru” for instance, and then you can test anytime, in any application, whether you are getting your custom system or not.