SMT

The brain is no computer

The comparison between the computer and our brain began right along with the computer's development, and it goes on to this day. If I remember correctly, in earlier times the comparison was with complex mechanical systems such as steam engines, as you can still see in idioms like "letting off steam". But today's analogies go further: people not only compare the brain with a computer, they also think it actually works like one. In the Machine Translation discourse, one sometimes hears the argument that a human brain does not process language the way a statistics-based system does (and here comes Chomsky, who claims it works with a lexicon and a grammar, which is also wrong). The usual answer is the comparison with a plane, which does not fly the way a bird does - but it flies. The attempts to make planes fly like birds were not as successful as those that took the underlying rules (i.e. the laws of aerodynamics) and adapted them to large objects made of steel. So: it does not matter whether the brain works like a computer, it matters whether we do the right things with brains/computers to make them intelligent. OK, not quite the discussion I started with. Here is a very interesting article about the brain, how it works, and why it is not a computer at all:

Senses, reflexes and learning mechanisms – this is what we start with, and it is quite a lot, when you think about it. If we lacked any of these capabilities at birth, we would probably have trouble surviving.

But here is what we are not born with: information, data, rules, software, knowledge, lexicons, representations, algorithms, programs, models, memories, images, processors, subroutines, encoders, decoders, symbols, or buffers – design elements that allow digital computers to behave somewhat intelligently. Not only are we not born with such things, we also don’t develop them – ever.

aeon: The empty brain

Interlingua in Google Translate

Machine Translation is the master discipline of computational linguistics; it was one of the first major tasks defined for computers in the years after World War II. Warren Weaver, an American science administrator, stated in a famous 1949 memorandum called "Translation": "It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the 'Chinese code'. If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?"

After many ups and downs over the following decades, the first real breakthrough came with fast PCs, fast web connections and the possibility to compile and process immense language data sets. But instead of compiling grammar sets to define one language and then another and the relationships between them, statistical models became en vogue: instead of years of linguistic work, a few weeks of processing yielded similar results. While rule-based systems created nice-looking sentences with often stupid word choices, statistics-based systems created stupid-looking sentences with good phrase quality. One thing linguists as well as statisticians had always been dreaming of was the so-called Interlingua: a kind of neutral language in between, which would allow the pure meaning of a sentence to be translated into this Interlingua, and then a sentence bearing the same meaning to be constructed in the target language. There is a common three-step pyramid to describe the rising quality of machine translation (a toy sketch of the first and third levels follows the list):
First level: Direct translation from one language to another
Second level: Transfer, using one elaborate method or another, e.g. rules, statistics, etc.
Third level: Using an Interlingua.
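
To make the difference between the first and the third level concrete, here is a toy caricature in Python. Everything in it (the mini-dictionary, the "meaning" structure, the generation step) is made up for illustration; no real system is this simple.

```python
# Toy caricature of the lowest and highest pyramid levels (made-up dictionaries,
# not a real system).

# First level: direct translation = word-for-word dictionary substitution.
EN_DE = {"the": "der", "house": "Haus", "is": "ist", "old": "alt"}

def direct(sentence):
    return " ".join(EN_DE.get(word, word) for word in sentence.lower().split())

# Third level: via an interlingua = analyze into a language-neutral meaning,
# then generate the target sentence from that meaning alone.
def analyze_en(sentence):
    # Hypothetical analysis step; a real one would need parsing and semantics.
    return {"theme": "house", "predicate": "old", "definite": True}

def generate_de(meaning):
    # Hypothetical generation step: German grammar (article, gender) lives here.
    nouns = {"house": ("Haus", "das")}
    adjectives = {"old": "alt"}
    noun, article = nouns[meaning["theme"]]
    return f"{article.capitalize()} {noun} ist {adjectives[meaning['predicate']]}."

print(direct("The house is old"))                   # "der Haus ist alt" -- wrong article
print(generate_de(analyze_en("The house is old")))  # "Das Haus ist alt."
```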

There were many attempts, from planned languages such as Esperanto up to semantic primes and lexical functions, and the result was always the same: there is no Interlingua. "Meaning" is too complex a concept to model in a static way.

In 2006, Google released Google Translate, a nowadays very popular MT system that was originally statistics-based, created by the German computer scientist Franz Josef Och (now at Human Longevity). This event inspired me in a very personal way to focus my linguistics career on computational linguistics and to write my Magister thesis at the University of Kassel under the title "Linguistic Approaches to Improve Statistical Machine Translation" (Linguistische Ansätze zur Verbesserung von statistischer maschineller Übersetzung). That was 10 years ago. Recently, I talked to a friend about the success of Google's AI in beating the Go master Lee Sedol using a neural network. Would this be able to change Machine Translation as well?

In September, Google announced on their research blog that they were switching their translation system from the statistics-based approach to Google Neural Machine Translation (GNMT), "an end-to-end learning framework that learns from millions of examples, and provided significant improvements in translation quality". This system is able to do zero-shot translation, as they write in an article published three days ago, on November 22nd. A zero-shot translation is a translation between two languages for which the system has seen no translation examples: e.g. if it is trained on examples translating between English and Japanese and between English and Korean, a zero-shot translation would be one between Japanese and Korean, without any data for that pair. As Google state in their blog:

To the best of our knowledge, this is the first time this type of transfer learning has worked in Machine Translation. 
The success of the zero-shot translation raises another important question: Is the system learning a common representation in which sentences with the same meaning are represented in similar ways regardless of language — i.e. an “interlingua”?
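
As a minimal sketch of how such a setup can work: Google's paper describes a single model shared by all language pairs, where the desired target language is signalled by an artificial token prepended to the source sentence. The model below is a hypothetical stub, not Google's code; only that prepended-token convention is taken from their description.

```python
# A minimal sketch of multilingual/zero-shot NMT as Google describes it: one model
# shared by all language pairs, with the target language signalled by an artificial
# token prepended to the source sentence. `SharedNMTModel` is a hypothetical stub.

class SharedNMTModel:
    """Stand-in for a single encoder-decoder network shared by all language pairs."""

    def train(self, pairs):
        ...  # one set of parameters is trained on all pairs together

    def translate(self, source):
        ...  # the prepended token tells the decoder which language to produce

model = SharedNMTModel()
model.train([
    ("<2ja> The cat sleeps.", "猫は眠っている。"),       # English -> Japanese
    ("<2en> 猫は眠っている。", "The cat sleeps."),        # Japanese -> English
    ("<2ko> The cat sleeps.", "고양이가 자고 있다."),      # English -> Korean
    ("<2en> 고양이가 자고 있다.", "The cat sleeps."),      # Korean -> English
])

# Zero-shot: Japanese -> Korean never appears in the training pairs above, yet the
# same shared model can be asked for it directly:
model.translate("<2ko> 猫は眠っている。")
```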

Whether the network really learns such an interlingua is hard to tell: neural networks are closed systems. The computer learns something from a data set in an intelligent but incomprehensible and opaque way. But Google is able to visualize the data the network produces; you've got to take a look at the blog post to see this in detail, but in short:

Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network. 
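
For a rough idea of what such a check looks like in practice: take the network's internal sentence vectors, project them down to a few dimensions (the blog post shows a comparable, more elaborate visualization), and see whether sentences with the same meaning end up close together regardless of language. The vectors here are random stand-ins assumed to play the role of the real encoder states; nothing else is taken from Google's code.

```python
# Rough sketch of the visualization idea: reduce internal sentence representations to 2D
# with t-SNE and check whether same-meaning sentences cluster together across languages.
# The vectors below are random stand-ins; in the real analysis they would come from the
# trained network.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
meanings, languages, dim = 3, 3, 1024   # 3 sentence meanings, each in 3 languages
vectors = np.vstack([
    rng.normal(loc=m, scale=0.1, size=(languages, dim)) for m in range(meanings)
])

coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)

for m in range(meanings):
    group = coords[m * languages:(m + 1) * languages]
    plt.scatter(group[:, 0], group[:, 1], label=f"meaning {m}")
plt.legend()
plt.title("Do same-meaning sentences cluster regardless of language?")
plt.show()
```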

Google, this is awesome! Thank you so much for sharing!

Image: Mihkelkohava Üleslaadija