Google is constantly extending its software for automatic machine translation. A set of languages that was recently added includes Afrikaans and Swahili. These are the first two languages from Africa to be added to the list. This is interesting for several reasons, and I'm wondering how it will change the landscape for these two languages.

As an Afrikaans speaker, my first reason for finding this interesting is to see how well it fares and which mistakes it makes (totally expected, of course). We all know that Google uses statistical machine translation. In theory this means that it should keep improving as they get more data to work with.
Interesting mistakes that I noticed in translation from English to Afrikaans (please excuse possible mistakes in grammatical terms; the Afrikaans ones are added for reference):
- Morphology (woordbou). Compounds are mostly not handled correctly. It knows about something like "Wêreldbeker" (World cup), but probably just because it encountered it before. How well does statistical machine translation handle target languages like German and Dutch? Will more data make the problem go away?
- Words with apparent Dutch or German inspiration, such as "epigrammatisch", "gewijd", "gefascineerd" that I can't imagine coming out of any Afrikaans source.
- The article 'n is frequently wrong. It very often occurs as vir' n been (with the apostrophe stuck onto the previous word). It seems that the apostrophe is treated as a quotation mark, and that the system then closes the "quotation" from time to time with the apostrophe.
- Where sentences start with the article (lidwoord) 'n, the use of capitalisation is wrong.
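The two problems with 'n described above could in principle be patched with a small post-processing pass. The Python sketch below is only an illustration: the function name `fix_article` and both regular expressions are assumptions made for this example, not anything Google is known to use. It also assumes the stray apostrophe always shows up in the form seen in vir' n been.

```python
import re

# Assumption: the misplaced apostrophe always looks like "<word>' n <word>",
# as in the observed output "vir' n been".
_STRAY_APOSTROPHE = re.compile(r"(\w)' n\b")

# The article at the start of the text or right after sentence-ending punctuation.
_SENTENCE_INITIAL = re.compile(r"(^|[.!?]\s+)'[nN]\s+(\w)")


def fix_article(text: str) -> str:
    """Reattach the apostrophe to 'n and repair sentence-initial capitalisation."""
    # Move the apostrophe from the previous word back onto the article.
    text = _STRAY_APOSTROPHE.sub(r"\1 'n", text)
    # In Afrikaans the article itself stays lowercase at the start of a
    # sentence; the word that follows it takes the capital letter.
    return _SENTENCE_INITIAL.sub(
        lambda m: f"{m.group(1)}'n {m.group(2).upper()}", text
    )
```

A rule like this only hides the symptom. If the training data itself contains the broken apostrophe, the translation model will keep producing it.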
Dwayne mentioned that some of the mistakes could be due to training from texts that were collected through optical character recognition. This would explain the problem with the apostrophe, for example. Although statistical machine translation might be language agnostic, the same is definitely not true for optical character recognition.
An interesting one to see was the translation for "long and short-term relationships" — not a bad attempt. The mistake with the incorrect "distant compounding" (afstandsamestelling) can easily be due to optical character recognition.
A few more comments about this:
- African languages are important enough for Google to put this effort in. Although it is later than we would have liked, at least this is something. I hope more companies will take note and follow the example. It is interesting to see that Afrikaans is supported before some big languages of India. (By big I mean a language like Bengali, with more than 200 million speakers.)
- Google is not first. The Apertium project has had a translator for some time already, which works on totally different principles (it is rule based). I would recommend that anyone interested collaborate with the Apertium project to improve their software, especially for languages with fewer resources. They are good at helping people who want to contribute. You don't need to have programming knowledge.
- It works, sort of. Somebody who doesn't understand Afrikaans should be able to get an idea of what is written in an Afrikaans text. I did not, however, get the impression that you would get a good idea with the current quality of translation. Try it out and comment below. Humorous examples are especially welcome.
- Due to the previous point, I believe Afrikaans people can now start to write more in Afrikaans, especially if the intended audience is partially Afrikaans. The argument for making a weblog more accessible to a theoretical international audience doesn't carry as much weight as before. I translated this blog post myself. Can I use automatic translation from now on for the English version of my weblog? How accessible will it be?
- I have no idea how well it translates into languages other than English. I also haven't yet tested it with source languages other than English. I'm keen to hear if somebody can evaluate this.
- Now more than ever we will need to inform people about the limitations of machine translation. It will be a huge insult if people start to use this without realising that it can in no way and under no circumstances serve as a substitute for a professional translator. There will definitely be people who will want to use this incorrectly, even if they mean well by doing so. If it can't manage the indefinite article 'n, why should we trust it with our marketing material?
- It doesn't matter all that much how good or bad this is. Because it carries the Google name, and will be integrated with other Google services, it will probably become the machine translation system that people will use.
Comments
machine translation
I have a fascination for languages. And I don't mean anything bad for Afrikaans; it's interesting as a language, it's Germanic and all (so I could probably learn to understand it).
But who uses machine translation? I don't understand that. Your hypothesis is fine, but I never read blog posts machine translated. Unless it's critical information I'm looking for, like the Persian movement this year. Machine translation is far from being usable.
But at the same time: Fine. For you. You now have your Afrikaans-English translator and YOU can tell how well YOUR language is translated. Maybe it's going to work.
MT is used
I've been working on MT for the last year and I can tell you that MT systems are used for more than just translating blog posts.
Imagine a region where there are various official languages and you have an online newspaper. Obviously, the more people you can reach with your (local) information, the better for you. You probably can't have all your news translated manually on a daily basis, so MT is a good approach for assimilation.
In governments where several languages are involved, MT can be very helpful too. Think about the European Parliament and the number of official languages present there. If you are a translator and you're about to translate such large amounts of text (often related to specific topics), properly built MT systems can give you a good draft to start your work from.
Google adding more and more languages and integrating MT with its other services can open up these systems more to the general public. The bad thing about this is that you're tied to Google, since you probably don't have the large amounts of translated texts and the resources required to process the data for building a competitive SMT system.
Afrikaans
Wow, Afrikaans is almost exactly like a written down Flemish dialect. It's so cute and beautiful it almost makes me cry:
Een kast => 'N kas
Een toetsenbord => 'N sleutelbord
Grappig => snaaks (haha!)
But it's quite a good translator. For example:
"Zo blauw als de wolken, zo rood als de zon" translates to "So blou soos die wolke, so rooi soos die son".
But "als" translates to "as". I guess "as" comes from the English "as good as" ("zo goed als", "so goed soos"). The second "als" in Dutch can also be "zoals" ("like", "such as"). I guess "soos" is like "zoals" in Dutch. Yet the sentence about the clouds in Dutch, translated to English, basically means: "As blue like the clouds are, as red as the sun is". You could also write "Blauw zoals de wolken, rood zoals de zon" (and it would mean the same in Dutch). It looks like the translator noticed this and uses "soos".
I'm not sure if that's right in Afrikaans, but if "zoals" translates to "soos", and "als" to "as", then translating "Zo blauw als de wolken" using "soos" would, from a Dutch/Flemish guy's point of view, also be correct (or even more correct, more descriptive).
The verb "zagen"
It doesn't get everything right, though. For example:
"Wij zagen de takken van de boom" => "Ons sien die takke van die boom"
"Wij zien naar de kinderen" => "Ons sien na die kinders"
Which makes me believe that "sien" means "to see". The first Dutch sentence of course means to saw the branches off a tree. Not to see the branches of a tree. But "zagen" is also the we-form for "to see" in Dutch:
"Wij zagen leuke dingen" => "We saw nice things", "Ons sien leuke dinge"
So it's confused by the verb "zien" in Dutch (Ik zag, wij zagen, etc).
Not in Google widget yet :/
I noticed it on the translate page this morning and played around with it! I also thought it's quite cool. It's not in the Google widget I use on my homepage yet though. I hope they implement it soon. BTW, "keyboard" in Afrikaans should be "toetsbord", not "sleutelbord". Some translator stuffed up :)
Having some fun with Google's machine translation
Google Translation is pretty limited when it comes to accurate translations.
Such a service is really only useful to get the gist of things but it won't do much more.
To really appreciate the epic failure that machine translation represents - and to have some fun in the process - head to http://translationparty.com.
On this webpage, you type in an English sentence and hit enter.
This will query Google Translate to get the Japanese translation of what you just typed. It will then translate the Japanese sentence back to English.
The script will go back and forth until it reaches the "equilibrium", i.e. getting the same English sentence after translating it to Japanese and back to English.
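The back-and-forth loop can be sketched in a few lines of Python. This is only an illustration under assumptions, not the site's actual script: `to_other` and `to_english` are hypothetical stand-ins for whatever translation service is used (no real Google API is called here), and `max_rounds` is an added safety limit, since an equilibrium is not guaranteed and the text could oscillate forever.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a sentence to its translation.
Translator = Callable[[str], str]


def round_trip_until_stable(
    sentence: str,
    to_other: Translator,
    to_english: Translator,
    max_rounds: int = 20,
) -> List[str]:
    """Translate back and forth until the English text stops changing.

    Returns every English version seen, starting with the original sentence.
    """
    history = [sentence]
    current = sentence
    for _ in range(max_rounds):
        # One round trip: English -> other language -> English.
        back = to_english(to_other(current))
        history.append(back)
        if back == current:
            # Equilibrium: another round trip would change nothing.
            break
        current = back
    return history
```

The last element of the returned list is the equilibrium sentence if one was reached within `max_rounds`, and the earlier elements show how the sentence drifted on the way there.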
The results can be quite funny:
http://translationparty.com/#3468143 - how do you say "hello" in Afrikaans?
http://translationparty.com/#1225886 - he runs
http://translationparty.com/#1356139 - you love me.
Have fun!
Amusing errors in Afrikaans translations
I'm translating some articles on fishes into Afrikaans, and ran one through Google. This paragraph made my day:
English: The closely related families Apogonidae (cardinalfishes) and Pempheridae (sweepers) were formerly included in the suborder Percoidei. Cardinalfishes are often colourful, and some are mouth breeders.
Afrikaans: Die nou verwante gesinne Apogonidae (cardinalfishes) en Pempheridae (veegmachines), is vroeër opgeneem in die suborde Percoidei. Cardinalfishes is dikwels kleurvolle, en sommige is die mond fokkers.
Afrikaans
Afrikaans is getting a lot of action these days... every now and then I stumble upon a site that picks up that I'm from South Africa and automatically changes the website to Afrikaans. I see a lot of international ads on Google and Facebook in Afrikaans too!! Very cool.
I'm a 22-year-old software developer and a proud supporter of Afrikaans software: my Windows Vista is in Afrikaans... Office... Firefox... everything I use every day!
Wie sê die Taal van Staal is dood?
*On a different note: I would like to get my Xbox interface in Afrikaans (not the games, just the GUI), but no one can help me with it!
I know of a lot of people that would use it! I even contacted Microsoft, but they seemed to be a little nowhere...