On rhyming algorithms

Here are some really interesting articles on computer-generated poetry. Some of the examples are genuinely touching or inspiring, such as these lines created by a tool written by Jack Hopkins:

The frozen waters that are dead are now

black as the rain to freeze a boundless sky,

and frozen ode of our terrors with

the grisly lady shall be free to cry

Deutschlandfunk: Lyrik zwischen Null und Eins. Wer reitet so spät durch Bit und Byte?

NewScientist: Neural network poetry is so bad we think it's written by humans


Image: "Magnetic Fridge Poetry" by Steve Johnson

Grimms Märchen auf Nordhessisch


A short note on my own behalf: I am co-editor of the book "Grimms Märchen auf Nordhessisch" ("Fairy Tales of the Brothers Grimm in the Dialect of Northern Hessia"). For the first time, the anthology presents a selection of Grimms' fairy tales rendered by various authors from the region into the dialects of Kassel and northern Hesse. Although the Brothers Grimm lived most of their lives in Kassel and collected the fairy tales while living here, there has never been a collection of the tales in the regional dialect before. Among them are "The Bremen Town Musicians", translated by the well-known and widely esteemed dialect rock singer Dark Vatter, and "The Star Talers", which I translated myself.

We will present the handsome hardcover book on Wednesday, 9 August 2017, at 7 pm at Brüder Grimm Platz 4 in Kassel, Germany, on the premises of the Association of the Brothers Grimm (Brüder Grimm-Gesellschaft). You are warmly invited to join us; if you happen to be in town, documenta 14 is still going on as well. If you cannot make it but are interested, you can already pre-order your copy in the shop of the Association of the Brothers Grimm.

Pronouncing English

Not only for foreign speakers: it is impossible to guess the correct pronunciation (i.e. a pronunciation that a group of native speakers considers correct) of a given English word from its spelling alone. This is a fact, and if you don't believe me, read the following poem by Gerard Nolst Trenité out loud:

The Chaos

Dearest creature in creation,
Study English pronunciation.
I will teach you in my verse
Sounds like corpse, corps, horse, and worse.
I will keep you, Suzy, busy,
Make your head with heat grow dizzy.
Tear in eye, your dress will tear.
So shall I! Oh hear my prayer.
Just compare heart, beard, and heard,
Dies and diet, lord and word,
Sword and sward, retain and Britain.
(Mind the latter, how it’s written.)
Now I surely will not plague you
With such words as plaque and ague.
But be careful how you speak:
Say break and steak, but bleak and streak;
Cloven, oven, how and low,
Script, receipt, show, poem, and toe.
Hear me say, devoid of trickery,
Daughter, laughter, and Terpsichore,
Typhoid, measles, topsails, aisles,
Exiles, similes, and reviles;
Scholar, vicar, and cigar,
Solar, mica, war and far;
One, anemone, Balmoral,
Kitchen, lichen, laundry, laurel;
Gertrude, German, wind and mind,
Scene, Melpomene, mankind.
Billet does not rhyme with ballet,
Bouquet, wallet, mallet, chalet.
Blood and flood are not like food,
Nor is mould like should and would.
Viscous, viscount, load and broad,
Toward, to forward, to reward.
And your pronunciation’s OK
When you correctly say croquet,
Rounded, wounded, grieve and sieve,
Friend and fiend, alive and live.
Ivy, privy, famous; clamour
And enamour rhyme with hammer.
River, rival, tomb, bomb, comb,
Doll and roll and some and home.
Stranger does not rhyme with anger,
Neither does devour with clangour.
Souls but foul, haunt but aunt,
Font, front, wont, want, grand, and grant,
Shoes, goes, does. Now first say finger,
And then singer, ginger, linger,
Real, zeal, mauve, gauze, gouge and gauge,
Marriage, foliage, mirage, and age.
Query does not rhyme with very,
Nor does fury sound like bury.
Dost, lost, post and doth, cloth, loth.
Job, nob, bosom, transom, oath.
Though the differences seem little,
We say actual but victual.
Refer does not rhyme with deafer.
Feoffer does, and zephyr, heifer.
Mint, pint, senate and sedate;
Dull, bull, and George ate late.
Scenic, Arabic, Pacific,
Science, conscience, scientific.
Liberty, library, heave and heaven,
Rachel, ache, moustache, eleven.
We say hallowed, but allowed,
People, leopard, towed, but vowed.
Mark the differences, moreover,
Between mover, cover, clover;
Leeches, breeches, wise, precise,
Chalice, but police and lice;
Camel, constable, unstable,
Principle, disciple, label.
Petal, panel, and canal,
Wait, surprise, plait, promise, pal.
Worm and storm, chaise, chaos, chair,
Senator, spectator, mayor.
Tour, but our and succour, four.
Gas, alas, and Arkansas.
Sea, idea, Korea, area,
Psalm, Maria, but malaria.
Youth, south, southern, cleanse and clean.
Doctrine, turpentine, marine.
Compare alien with Italian,
Dandelion and battalion.
Sally with ally, yea, ye,
Eye, I, ay, aye, whey, and key.
Say aver, but ever, fever,
Neither, leisure, skein, deceiver.
Heron, granary, canary.
Crevice and device and aerie.
Face, but preface, not efface.
Phlegm, phlegmatic, ass, glass, bass.
Large, but target, gin, give, verging,
Ought, out, joust and scour, scourging.
Ear, but earn and wear and tear
Do not rhyme with here but ere.
Seven is right, but so is even,
Hyphen, roughen, nephew Stephen,
Monkey, donkey, Turk and jerk,
Ask, grasp, wasp, and cork and work.
Pronunciation (think of Psyche!)
Is a paling stout and spikey?
Won’t it make you lose your wits,
Writing groats and saying grits?
It’s a dark abyss or tunnel:
Strewn with stones, stowed, solace, gunwale,
Islington and Isle of Wight,
Housewife, verdict and indict.
Finally, which rhymes with enough,
Though, through, plough, or dough, or cough?
Hiccough has the sound of cup.
My advice is to give up!!!
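The poem's point can also be checked mechanically. The following is a minimal sketch, not a real rhyming algorithm: it compares a naive spelling-based rhyme test with a sound-based one. The tiny pronunciation table is hand-written in a simplified ARPAbet-style notation purely for illustration, and the function names are made up for this example; a real system would load a full pronouncing dictionary instead.

```python
# Hand-written, simplified ARPAbet-style transcriptions. This table is an
# assumption for illustration only; it is not taken from any library.
PRONUNCIATIONS = {
    "blood": ["B", "L", "AH", "D"],
    "flood": ["F", "L", "AH", "D"],
    "food": ["F", "UW", "D"],
    "break": ["B", "R", "EY", "K"],
    "steak": ["S", "T", "EY", "K"],
    "make": ["M", "EY", "K"],
    "bleak": ["B", "L", "IY", "K"],
    "streak": ["S", "T", "R", "IY", "K"],
}

# Vowel symbols that occur in the table above.
VOWELS = {"AH", "UW", "EY", "IY"}


def spelling_rhyme(a, b, n=3):
    """Naive test: two words 'rhyme' if their last n letters are identical."""
    return a[-n:] == b[-n:]


def rhyme_part(word):
    """Return the phonemes from the last vowel to the end of the word.

    This is a simplification that is good enough for one-syllable words.
    """
    phones = PRONUNCIATIONS[word]
    last_vowel = max(i for i, p in enumerate(phones) if p in VOWELS)
    return phones[last_vowel:]


def sound_rhyme(a, b):
    """Two words rhyme if they sound the same from the last vowel onwards."""
    return rhyme_part(a) == rhyme_part(b)


if __name__ == "__main__":
    pairs = [("blood", "flood"), ("blood", "food"),
             ("break", "steak"), ("break", "bleak"), ("break", "make")]
    for a, b in pairs:
        print(a, b, "spelling:", spelling_rhyme(a, b), "sound:", sound_rhyme(a, b))
```

With this table, the spelling test wrongly claims that "blood" rhymes with "food" and "break" with "bleak", and it misses the rhyme between "break" and "make". The sound-based test gets all three right, but only because somebody wrote down the pronunciation first.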

And if you have no clue how to pronounce some of the words, take advice from this video:

How many Jedi is or are the last Jedi

On behalf of the German language, I dare to say to my favourite all-things-sci-fi blog io9: you are welcome.

[...] another Star Wars: Episode VIII mystery has been answered—namely, whether The Last Jedi refers to a single Jedi or a group of Jedi, since both the singular and the plural form are “Jedi.” The solution comes to us via the release of several foreign-language titles for the film.


Star Wars: Die letzten Jedi


Thank you, other languages who modify their adjectives to distinguish the quantity (and gender) of the nouns they’re modifying!

Indeed: otherwise the title would be "Star Wars: Der letzte Jedi" or, for a female Jedi, "Die letzte Jedi". Unfortunately, German does not mark gender in the plural, so "die letzten Jedi" may be a group of only men, only women, or a mixed one. Spanish is more helpful here: the title would be "Las últimas Jedi" if only female Jedi were left. As in other European languages, the masculine form also serves as the generic form, so the Spanish title may refer to a group of men only or to a mixed group. But this, I think, we already knew after seeing Rey and Luke in Episode VII.

Teamplayer, Ego Shooter, First Person Shooter

In German, the term "Ego Shooter" is often used for "first person shooter". This may be because "Erste Person" (first person) is not a common way to describe perspective. Instead, we say "Ego Perspektive" for "first person", and from that the term "Ego Shooter" developed. Unfortunately, the word ego may also refer to egoism, and it was just a matter of time until "Ego Shooter" was used as a dysphemism, for example here as the opposite of a team player:

„Wer wird zum Teamplayer, wer zum Ego-Shooter, wer überwindet seine Ängste, wer wird der neue Dauerpatient von Dr. Bob und wer wird 2017 König oder Königin des Dschungels?“ (TV channel RTL in an announcement)

"Who will become a team player [and] who will become an Ego Shooter [...]"

Verband für Deutschlands Video- und Computerspieler: Als Dysphemismus verselbständigt


The brain is no computer

The comparison between the computer and our brain started right with the computer's development, and it goes on to this day. If I remember correctly, in earlier times the brain was compared with complex mechanical systems such as steam engines, as you can still see in idioms like "letting off steam". But today's allegories go further: people not only compare the brain with a computer, they also think it actually works like one. In the machine translation discourse you sometimes hear the argument that a human brain does not process language like, say, a statistics-based system (and here comes Chomsky, who claims it works with a lexicon and a grammar, which is also wrong). The answer is often a comparison with a plane, which does not fly as a bird does, but it flies. The attempts to make planes fly like birds were not as successful as those that used the underlying rules (i.e. the laws of aerodynamics) but adapted them to large objects made of steel. So: it does not matter whether the brain works like a computer; it matters whether we do the right things with brains/computers to make them intelligent. OK, not quite the discussion I started with. Here is a very interesting article about the brain, how it works, and why it is not a computer at all:

Senses, reflexes and learning mechanisms – this is what we start with, and it is quite a lot, when you think about it. If we lacked any of these capabilities at birth, we would probably have trouble surviving.

But here is what we are not born with: information, data, rules, software, knowledge, lexicons, representations, algorithms, programs, models, memories, images, processors, subroutines, encoders, decoders, symbols, or buffers – design elements that allow digital computers to behave somewhat intelligently. Not only are we not born with such things, we also don’t develop them – ever.

aeon: The empty brain

Der In Der In Der In Der In

Recently I stumbled upon my own blog article on linguistic repetition plays. As I write most of my blog posts mainly to remind myself of things, it was quite an interesting read ;-) In the meantime I have found another German repetition play I really like. It is presented in the form of a riddle: "Bilden Sie mal einen Satz mit viermal 'der in'" ("Form a sentence that contains 'der in' four times"). The solution is as easy as it is surprising:

Der Inder in der Inderin.

Translation: The male Indian inside of the female Indian. 

Those "Can you say a sentence" jokes were pretty popular when my parents were younger, so there are quite a number of them:

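Sag mal einen Satz mit... ("Say a sentence with...")

  • Dresden -> Steckst nen Finger in die Nase und dresden (i.e. "drehst ihn": stick a finger in your nose and twist it)
  • Weihnachtsfest -> Der Hirsch hält sein Geweih nachts fest (the stag holds on tight to its antlers at night)
  • Weihnachtsstern -> Mich würd so ein Geweih nachts stern (i.e. "störn": antlers like that would bother me at night)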

There are a lot of other examples on this page by Michael Schreiner.

Interlingua in Google Translate

Machine translation is the master discipline of computational linguistics; it was one of the first major tasks defined for computers, back in the years after World War II. Warren Weaver, an American science administrator, stated in a famous 1949 memorandum called "Translation": "It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the 'Chinese code'. If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?"

After many ups and downs over the following decades, the first real breakthrough came with fast PCs, fast web connections and the ability to compile and process immense language data sets. But instead of compiling grammars to define one language, then another, and then the relationships between them, statistical models came into vogue: instead of years of linguistic work, they needed a few weeks of processing to reach similar results. While rule-based systems produced nice-looking sentences with often stupid word choices, statistics-based systems produced stupid-looking sentences with good phrase quality. One thing linguists and statisticians alike always dreamed of was the so-called Interlingua: a kind of neutral language in between, which would allow the pure meaning of a sentence to be translated into this Interlingua and a sentence bearing the same meaning to be constructed from it in the target language. A common three-step pyramid describes the rising quality of machine translation:
First level: Direct translation from one language to another
Second level: Transfer using one elaborated way or another, e.g. rules, statistics, etc.
Third level: Using an Interlingua.
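One commonly cited reason why the Interlingua is so attractive is a simple counting argument, sketched below. The function names are made up for this illustration, and the sketch assumes that every language pair needs its own system in each direction, while an Interlingua needs one analysis step and one generation step per language.

```python
def direct_systems(n_languages):
    """Systems needed if every ordered language pair gets its own translator."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    """Components needed with an Interlingua: one analyser (language ->
    Interlingua) and one generator (Interlingua -> language) per language."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, "languages:", direct_systems(n), "direct systems vs.",
              interlingua_components(n), "Interlingua components")
```

For three languages both approaches need six parts, but for ten languages it is already 90 pairwise systems against 20 Interlingua components.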

There were many attempts, from planned languages such as Esperanto to semantic primes and lexical functions, and the result was always the same: there is no Interlingua. "Meaning" is too complex a concept to be modelled in a static way.

In 2006, Google released Google Translate, nowadays a very popular MT system, which was originally statistics-based and was created by the German computer scientist Franz Josef Och (now at Human Longevity). This event inspired me in a very personal way to focus my linguistics career on computational linguistics and to write my Magister thesis with the title "Linguistic Approaches to Improve Statistical Machine Translation" (Linguistische Ansätze zur Verbesserung von statistischer maschineller Übersetzung) at the University of Kassel. That was 10 years ago. Recently, I talked to a friend about the success of Google's AI in beating the Go master Lee Sedol using a neural network. Would this be able to change machine translation as well?

In September, Google announced on their research blog that they are switching their translation system from a statistics-based one to Google Neural Machine Translation (GNMT), "an end-to-end learning framework that learns from millions of examples, and provided significant improvements in translation quality". This system is capable of zero-shot translation, as they write in an article published three days ago, on November 22nd. A zero-shot translation is a translation between two languages for which the system has seen no translation examples: if it is trained on examples to translate between English and Japanese and between English and Korean, a zero-shot translation would be one between Japanese and Korean, a pair without any training data. As Google state in their blog:
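To make the definition concrete, here is a small sketch that lists the zero-shot directions for a given set of training directions. It only does bookkeeping on language pairs and says nothing about how GNMT itself works; the function name and the language codes are chosen just for this example.

```python
from itertools import permutations


def zero_shot_pairs(trained_pairs):
    """Return all ordered language pairs that have no training examples.

    trained_pairs: iterable of (source, target) tuples the system was trained on.
    """
    trained = set(trained_pairs)
    languages = {lang for pair in trained for lang in pair}
    # Every ordered pair of distinct languages, minus the trained ones.
    return sorted(set(permutations(languages, 2)) - trained)


if __name__ == "__main__":
    # The example from the text: English<->Japanese and English<->Korean.
    trained = [("en", "ja"), ("ja", "en"), ("en", "ko"), ("ko", "en")]
    print(zero_shot_pairs(trained))
```

For the training directions above, the remaining directions are Japanese to Korean and Korean to Japanese, exactly the pair described in the text.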

To the best of our knowledge, this is the first time this type of transfer learning has worked in Machine Translation. 
The success of the zero-shot translation raises another important question: Is the system learning a common representation in which sentences with the same meaning are represented in similar ways regardless of language — i.e. an “interlingua”?

This is indeed hard to tell: neural networks are closed systems. The computer learns something from a data set in an intelligent but incomprehensible and obscure way. But Google is able to visualize the data the network produces. You have to take a look at the blog post to understand this in detail, but:

Within a single group, we see a sentence with the same meaning but from three different languages. This means the network must be encoding something about the semantics of the sentence rather than simply memorizing phrase-to-phrase translations. We interpret this as a sign of existence of an interlingua in the network. 

Google, this is awesome! Thank you so much for sharing!

Image: Mihkelkohava Üleslaadija 

Incredible WaveNet Speech Synthesis

Yaaaay, there is certainly some magic in deep neural networks: after mastering Go and making huge progress in the field of spoken language recognition, Google now presents WaveNet, a deep-neural-network-based approach to speech synthesis. It sounds astoundingly real and can even compose music or fictional language-like sounds. Amazing. And spooky.

WaveNet changes this paradigm by directly modelling the raw waveform of the audio signal, one sample at a time. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music.
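What "one sample at a time" means can be shown with a toy example. The sketch below is not WaveNet: the deep network is replaced by a hand-made predictor that continues a pure sine tone, and all names and parameter values are assumptions for illustration. Only the generation loop has the same shape, in that each new sample is computed from the samples generated so far and then appended to the history.

```python
import math


def generate(predict_next, seed, n_samples):
    """Autoregressive generation: extend the waveform one sample at a time,
    feeding everything generated so far back into the predictor."""
    samples = list(seed)
    while len(samples) < n_samples:
        samples.append(predict_next(samples))
    return samples


def make_sine_predictor(freq_hz, sample_rate):
    """Toy stand-in for a learned model: predicts the next sample of a sine
    tone from the two previous samples (a second-order linear recurrence)."""
    omega = 2 * math.pi * freq_hz / sample_rate

    def predict_next(history):
        return 2 * math.cos(omega) * history[-1] - history[-2]

    return predict_next


if __name__ == "__main__":
    freq_hz, sample_rate = 440.0, 16000  # illustrative values
    omega = 2 * math.pi * freq_hz / sample_rate
    seed = [0.0, math.sin(omega)]  # the first two samples of the tone
    wave = generate(make_sine_predictor(freq_hz, sample_rate), seed, 100)
    print(len(wave), "samples generated")
```

Replacing the toy predictor with a trained model that looks at a long window of past samples gives the general idea. The price of this approach is that generation is inherently sequential, because each sample has to wait for the one before it.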