Interlingua in Google Translate

Machine translation is the master discipline of computational linguistics; it was one of the first major tasks defined for computers in the years after World War II. Warren Weaver, an American science administrator, stated in his famous 1949 memorandum "Translation": "It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the 'Chinese code'. If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?"

After many ups and downs in the following decades, the first real breakthrough came with fast PCs, fast web connections and the possibility to compile and process immense language data sets. But instead of compiling sets of grammar rules to define one language, then another, and then their relationships, the use of statistical models came into vogue: instead of years of linguistic work, a few weeks of processing produced similar results. While rule-based systems created nice-looking sentences with often stupid word choices, statistics-based systems created stupid-looking sentences with good phrase quality. One thing linguists as well as statisticians were always dreaming about was the so-called Interlingua: a kind of neutral language in between, which would allow us to translate the pure meaning of a sentence into this Interlingua and afterwards construct a sentence in the target language that bears the same meaning. There is a common three-step pyramid that describes the rising quality of machine translation:
First level: Direct translation from one language to another
Second level: Transfer using one elaborate method or another, e.g. rules or statistics
Third level: Using an Interlingua.

There were many attempts, from planned languages such as Esperanto to semantic primes and lexical functions, and the result was always the same: there is no Interlingua. "Meaning" is too complex a concept to model in a static way.

In 2006, Google released Google Translate, a now very popular MT system that was originally statistics-based and was created by the German computer scientist Franz Josef Och (now at Human Longevity). This event inspired me in a very personal way to focus my linguistics career on computational linguistics and to write my Magister thesis, titled "Linguistic Approaches to Improve Statistical Machine Translation" (Linguistische Ansätze zur Verbesserung von statistischer maschineller Übersetzung), at the University of Kassel. That was 10 years ago. Recently, I talked to a friend about the success of Google's AI in beating the top Go master Lee Sedol using a neural network. Would this be able to change machine translation as well?

In September, Google announced on their research blog that they are switching their translation system from a statistics-based one to Google Neural Machine Translation (GNMT), "an end-to-end learning framework that learns from millions of examples, and provided significant improvements in translation quality". This system is able to perform zero-shot translation, as they write in an article published three days ago, on November 22nd. A zero-shot translation is a translation between two languages for which the system has no example translations. For example, if the system is trained on examples of English-Japanese and English-Korean translation, a zero-shot translation would be one between Japanese and Korean, a pair for which it has seen no data. As Google state in their blog:

To the best of our knowledge, this is the first time this type of transfer learning has worked in Machine Translation. 
The success of the zero-shot translation raises another important question: Is the system learning a common representation in which sentences with the same meaning are represented in similar ways regardless of language — i.e. an “interlingua”?

This is indeed hard to tell: neural networks are closed systems. The computer learns something from a data set in an intelligent but opaque way. Google is able to visualize the data the network produces, though. You have to take a look at the blog post to understand this in detail, but:

Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network. 
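To make the zero-shot idea a bit more concrete, here is a minimal sketch of my own, not Google's code. It assumes a hypothetical training corpus that only covers English-Japanese and English-Korean, and lists the directions the system has never seen as a pair. The helper `tag_source` and the `<2ko>` token format are illustrative assumptions based on my reading of Google's description, in which a single shared model is told the desired target language by a token added to the input.

```python
from itertools import permutations

# Hypothetical training corpus: only English<->Japanese and English<->Korean.
TRAINING_PAIRS = [
    ("en", "ja"), ("ja", "en"),
    ("en", "ko"), ("ko", "en"),
]


def zero_shot_directions(trained):
    """Return all language directions that never occur in the training pairs."""
    languages = sorted({lang for pair in trained for lang in pair})
    seen = set(trained)
    return [d for d in permutations(languages, 2) if d not in seen]


def tag_source(sentence, target_lang):
    """Prepend an (assumed) target-language token to the source sentence.

    One shared model sees every language pair; the token is the only hint
    about which language it should produce.
    """
    return f"<2{target_lang}> {sentence}"


print(zero_shot_directions(TRAINING_PAIRS))  # [('ja', 'ko'), ('ko', 'ja')]
print(tag_source("こんにちは", "ko"))          # <2ko> こんにちは
```

Japanese to Korean and Korean to Japanese are exactly the directions that would have to work without any direct examples.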

Google, this is awesome! Thank you so much for sharing!

Image: Mihkelkohava Üleslaadija 

On the tracks of Lovecraft

Two months ago I traveled the East Coast and urged my family to visit Providence, Rhode Island, as well. We only made a short stop there, but I tried to see as many sites as possible that are related to the famous horror, fantasy and science fiction author Howard Phillips Lovecraft. Because we were there for only a night and half a day, we just saw Lovecraft Square, one of his former houses, some streets near Brown University and, of course, his grave at Swan Point Cemetery. Unfortunately, I was not able to visit the Brown University Library to study the Lovecraft Collection. I hope that I will have a chance to get there later.
In the Lovecraft Arts and Sciences shop in the center of Providence I bought "The Annotated Lovecraft", a monumental tome with the most relevant stories by Lovecraft, complemented by hundreds of interesting annotations explaining his inspirations, hidden connections and scientific or occult backgrounds. My salutations to the nice shopkeeper: we talked for a moment about living in Berlin and making music, and it was so cool. Thanks!
(Note my Cthulhu-themed t-shirt by one of my favorite bands, the German independent gothic classic horror psycho chamber story-telling industrial rockers Janus)

Game of Thrones: Hodor Translation

One of my favorite characters in Game of Thrones is Hodor, the so-called "gentle giant", although there are more noteworthy giants in GOT as well. Hodor is only able to say his name, but as actor Kristian Nairn stated, he has found 70 different ways to do so. To be honest, though, his leaked script does not emphasize one way of pronouncing it over another.

In the course of the show it was revealed that Hodor was not always simple-minded, but that some incident in his past changed the stable boy into the Buddha-like Hodor we know. What happened was revealed in the Season 6 episode "The Door". If you don't know it yet and don't want to be spoiled, you should avoid reading about GOT on the web. Otherwise, you can see here a nice example of a classic translation problem: a word (the name "Hodor") reveals a hidden meaning long after it was first used, so the translator had no chance to adapt the word in a way that would let the hidden meaning come through adequately in the target language as well, and now has to deal with it. How could anyone have known that "Hodor" is a kind of abbreviation of "Hold the door"? For languages similar to English, such as German, it was a comparatively easy task: "hold" is "halt" in German, and "door" may be translated with the related word "Tor" (which actually means "gate", but is tolerable for doors as well), so "Hold the door = Hodor" becomes "Halt das Tor = Hodor" without problems. Other languages, such as Russian, had bigger issues with this, as you may see in this interesting overview or even hear in this "language test".


Image: Kristian Nairn speaking at the 2016 San Diego Comic-Con International in San Diego, California by Gage Skidmore

We all live in a Virtual Reality

Human civilization has always been a virtual reality.  At the onset of culture, which was propagated through the proto-media of cave painting, the talking drum, music, fetish art making, oral tradition and the like, Homo sapiens began a march into cultural virtual realities, a march that would span the entirety of the human enterprise.  We don’t often think of cultures as virtual realities, but there is no more apt descriptor for our widely diverse sociological organizations and interpretations than the metaphor of the “virtual reality.”  Indeed, the virtual reality metaphor encompasses the complete human project.

How VR Gaming will Wake Us Up to our Fake Worlds by Eliott Edge



Image: Virtual Reality Demonstrations; The 2015 ISOJ on the University of Texas-Austin campus, Apr. 18, 2015. Gabriel Cristóver Pérez/Knight Center

Article: Rules for Survival in Autocracies

I have lived in autocracies most of my life, and have spent much of my career writing about Vladimir Putin’s Russia. I have learned a few rules for surviving in an autocracy and salvaging your sanity and self-respect. It might be worth considering them now:

Rule #1: Believe the autocrat. He means what he says. Whenever you find yourself thinking, or hear others claiming, that he is exaggerating, that is our innate tendency to reach for a rationalization.

NYR Daily - Masha Gessen: Autocracy: Rules for Survival

Wikipedia gives us this definition of autocracy:
An autocracy is a system of government in which supreme power is concentrated in the hands of one person, whose decisions are subject to neither external legal restraints nor regularized mechanisms of popular control (except perhaps for the implicit threat of a coup d'état or mass insurrection). Absolute monarchy and dictatorship are the main historical forms of autocracy. In very early times, the term "autocrat" was written in coins as a favorable feature of the ruler, having some connection to the concept of "lack of conflicts of interests".

Incredible WaveNet Speech Synthesis

Yaaaay, there is certainly some magic in deep neural networks: after mastering Go and making huge progress in the field of spoken language recognition, Google now presents WaveNet, a deep neural network-based approach to speech synthesis. It sounds astoundingly real and can even compose music or fictional language-like sounds. Amazing. And spooky.

WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.



Metric Time

As I have major difficulties using the 12-based time system (and the 24-based one as well), I would recommend changing to metric time...

Metric Time (MT) is an attempt to create a decimalized time system for our modern base-10 using world. This is a neglected part of the Metric System (or SI) which has created a whole measuring system based on 10 for mass, distance, volume, etc., but no official decimalized time units for normal day-to-day use. Since any system for measuring time is arbitrary, we should be using one that is most practical for us. I think that system is Metric Time.

Guide to Metric Time
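What a decimal clock would actually show is simple arithmetic. The sketch below is my own illustration and assumes a split of the day into 10 decimal hours of 100 decimal minutes of 100 decimal seconds each (the scheme of French Revolutionary decimal time); the units proposed in the Guide to Metric Time may well differ. The function name `to_decimal_time` is made up for this example.

```python
STANDARD_SECONDS_PER_DAY = 24 * 60 * 60    # 86,400
DECIMAL_SECONDS_PER_DAY = 10 * 100 * 100   # 100,000 (assumed 10/100/100 split)


def to_decimal_time(hours, minutes, seconds=0):
    """Convert a 24-hour clock time to (decimal hours, minutes, seconds)."""
    standard_seconds = hours * 3600 + minutes * 60 + seconds
    # Scale to decimal seconds with integer arithmetic (rounds down).
    decimal_seconds = (
        standard_seconds * DECIMAL_SECONDS_PER_DAY // STANDARD_SECONDS_PER_DAY
    )
    d_hours, rest = divmod(decimal_seconds, 10_000)
    d_minutes, d_seconds = divmod(rest, 100)
    return d_hours, d_minutes, d_seconds


print(to_decimal_time(12, 0))  # (5, 0, 0): noon is exactly half the day
print(to_decimal_time(18, 0))  # (7, 50, 0): three quarters of the day
```

Under this assumption one decimal second lasts 0.864 standard seconds, so a decimal clock would tick slightly faster than the one we are used to.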


Lone Wolf - The Boardgame and the Kickstarter

Some time ago, I stumbled upon the Lone Wolf Boardgame Kickstarter project which was recently funded (they are always already funded or at least ended when I find them). Luckily, I could preorder it which seemed to be a useful way to
a) channel my current euphoria into spending money on the board game
b) give my support to the project team and especially the inventor and main artist of the game, the original Lone Wolf illustrator Gary Chalk
c) get the game after release without having to remember it a year later

Unfortunately, this was also the first Kickstarter project I experienced first-hand (I did not actually back it, but it felt as if I had). The main problems seemed to lie in the communication between the project partners, Graywood Publishing and Megara Entertainment, as well as in some production errors. By accident, the game board was produced in too high a quality, which raised the shipping costs because the game's weight increased significantly. I could live with all this and I did not mind waiting longer for the game, especially given this quality and the beautiful illustrations by Gary. But the thing that was really sad to see was the conflict emerging between Graywood and Megara. It was neither necessary nor helpful, and it was painful to watch helplessly. Some of the project posts consisted of nothing but washing dirty linen in public. Guys, I really hope you will come together one day and talk this out. You made a wonderful game and built a monument to a beloved artist. Being in contact with you was always a pleasure. I finally got my game, and the quality is the best I have seen in a board game so far. Be proud of it!

Lone Wolf

Do you know Lone Wolf? It is one of the most famous adventure book series ever and was pretty popular among fantasy-loving teens in the late eighties and early nineties. I loved the books. I played them hundreds of times. They were the best. I also tried some other books, including those by the founders of the genre, but I didn't find them as intriguing and fascinating as the books about Lone Wolf. Long story short: Lone Wolf is the last survivor of the order of the Kai Lords, who may be described as medieval Jedi-esque warrior monks with superpowers. Lone Wolf has to complete his training by himself, revive the order of the Kai Lords and find Dark Lord Gnaag to end his tyranny and take revenge.

OK, written down like this, it sounds pretty standard. There are several reasons why I still consider it great.

1. There was no Harry Potter, no Star Wars prequel (or sequel) and no Lord of the Rings (or Hobbit) blockbuster yet, so taking revenge by killing someone called Dark Lord was not as omnipresent as it may seem today.

2. I was around 12 years old, so this was pretty much the beginning of my journey through fantasy literature and roleplaying games.

3. The quality of those 12 books (1), written by Joe Dever and mostly illustrated by Gary Chalk, was higher than my two-line summary could ever convey. Telling you there is a Dark Lord to be killed is a lot less interesting and intriguing than surviving 12 interactive and brilliantly written books full of deadly traps and mysterious encounters in order to finally kill the Dark Lord yourself.

By the way, there is a fairly new release of the books, if you are interested. I can totally recommend them. OR you could play the interactive eBook, which seems to be half book and half RPG. It looks good as well.
AND I have seen that there are new books (in the form of conventional fiction), co-written by the grandmaster Joe Dever himself. I should risk a look myself, I guess. The same goes for the games that have never been translated into German.

Oh, yeah, AND you can read my post on the Lone Wolf Boardgame. Soon.

(1) As I notice right now, there are 28. In German, there are 12. Wow.