My brain hurts!
we really think what you know

31 August 2010

Using Google Translate from console

If you have ever used Google Translate and wished you could do the same from the console, here is a Python script that does just that.

The script translates words and entire sentences between any language pair known to Google Translate. It accepts text passed in as shell arguments as well as data from standard input.
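The argument-or-stdin rule can be sketched on its own. This is a minimal Python 3 illustration, not part of the original script, and the helper name `read_text` is hypothetical. It joins every argument after the two language codes and only falls back to reading the stream until end of file when no text was given on the command line.

```python
import io


def read_text(argv, stdin):
    """Return the text to translate (hypothetical helper).

    argv follows sys.argv: [script, from_lang, to_lang, word, word, ...].
    Extra arguments are joined with spaces; otherwise stdin is read to EOF.
    """
    if len(argv) > 3:
        return " ".join(argv[3:])
    return stdin.read()


if __name__ == "__main__":
    # Words on the command line take precedence over standard input
    print(read_text(["gtrans.py", "en", "es", "lovely", "spam"], io.StringIO("ignored")))
    # With only the language codes, the text comes from the stream
    print(read_text(["gtrans.py", "en", "es"], io.StringIO("Nobody expects\n")), end="")
```

In the real script you would pass `sys.argv` and `sys.stdin`; the `io.StringIO` objects only stand in for them here.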

NOTE: This script stopped working when version 1 of the translation API was discontinued in December 2011. See the updated script below for a working version.

NOTE: According to http://code.google.com/apis/language/translate/v1/reference.html
Important: Google Translate API v1 was officially deprecated on May 26, 2011; it was shut off completely on December 1, 2011. For text translations, you can use the Google Translate API v2, which is now available as a paid service. For website translations, we encourage you to use the Google Website Translator gadget.
#!/usr/bin/env python
# Python 2 script (print statement, urllib2)
import json
import os
import sys
from urllib import urlencode
from urllib2 import urlopen

# The Google Translate API can be found here:
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples

# Language codes are listed here:
# http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

if len(sys.argv) < 3:
    name = os.path.basename(sys.argv[0])
    print '''
Usage:
    %s en es lovely spam
    %s es en < file.txt

Available language codes are listed here:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
''' % (name, name)
    sys.exit(1)

## hack to be able to display UTF-8 in the Windows console
if sys.platform == "win32":
    ## switch the console to the UTF-8 code page
    if not sys.stdin.encoding == 'cp65001':
        os.system('chcp 65001 > nul')
    class UniStream(object):
        """Minimal stream that writes UTF-8 bytes straight to a file descriptor."""
        __slots__ = "fileno", "softspace",
        def __init__(self, fileobject):
            self.fileno = fileobject.fileno()
            self.softspace = False
        def write(self, text):
            if isinstance(text, unicode):
                os.write(self.fileno, text.encode("utf_8"))
            else:
                os.write(self.fileno, text)
    sys.stdout = UniStream(sys.stdout)
    sys.stderr = UniStream(sys.stderr)

lang1 = sys.argv[1]
lang2 = sys.argv[2]
langpair = '%s|%s' % (lang1, lang2)

# Text comes from the remaining arguments, or from standard input
if len(sys.argv) > 3:
    text = ' '.join(sys.argv[3:])
else:
    text = sys.stdin.read()

base_url = 'http://ajax.googleapis.com/ajax/services/language/translate?'
params = urlencode((('v', 1.0),
                    ('q', text),
                    ('langpair', langpair)))
url = base_url + params
content = urlopen(url).read()

# The API answered with JSON such as
# {"responseData": {"translatedText": "..."}, "responseDetails": null, "responseStatus": 200}
# Parsing it with json also decodes escape sequences such as \u0026.
reply = json.loads(content)
if reply.get('responseStatus') != 200:
    sys.stderr.write('Translation failed: %s\n' % reply.get('responseDetails'))
    sys.exit(1)
translation = reply['responseData']['translatedText']
sys.stdout.write(translation.encode('utf-8') + '\n')

Below is the updated script, which uses the Google Translate web API instead. It should work after December 2011.

#!/usr/bin/env python
import sys
import os
import urllib2
from urllib import urlencode
import cookielib
import re

# The google translate API can be found here (***NOT OPERATIONAL SINCE DECEMBER 2011***):
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples

# Language codes are listed here:
#http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

if len(sys.argv) < 3:
    name = os.path.basename(sys.argv[0])
    print '''
Usage:
    %s en es lovely spam
    %s es en < file.txt

Available language codes are listed here:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

''' % (name,name)
    sys.exit(-1)

## hack to be able to display UTF-8 in Windows console

def fix_win32_console():
    ## set utf8 console
    if not sys.stdin.encoding == 'cp65001':
        os.system('chcp 65001 > nul')
    class UniStream(object):
        __slots__= "fileno", "softspace",
        def __init__(self, fileobject):
            self.fileno= fileobject.fileno()
            self.softspace = False
        def write(self, text):
            if isinstance(text, unicode):
                os.write(self.fileno, text.encode("utf_8"))
            else:
                os.write(self.fileno, text)
    sys.stdout= UniStream(sys.stdout)
    sys.stderr= UniStream(sys.stderr)

if sys.platform == "win32":
    fix_win32_console()

lang1=sys.argv[1]
lang2=sys.argv[2]

if len(sys.argv) > 3:
    text=' '.join(sys.argv[3:])
else:
    text=sys.stdin.read()

base_url='http://translate.google.com.br/translate_a/t'
# sample browser request
#http://translate.google.com/translate_a/t?client=t&text=col&hl=en&sl=en&tl=es&multires=1&otf=2&ssel=4&tsel=0&sc=1
params=urlencode({'client':'t',
    'text':text,
    'hl':'en',
    'sl':lang1,
    'tl':lang2,
    'otf':2,
    'multires':1,
    'ssel':0,
    'tsel':0,
    'sc':1,
    })

url=base_url + '?' + params

cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1018.0 Safari/535.19'),
                    ('Referer', 'http://translate.google.com/')
]
response = opener.open(url)
translation=response.read()
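matcher = re.search(r'\[\[\["(?P<human_readable_chunk>[^")]*)', translation)
if matcher is None:
    sys.stderr.write('Could not find a translation in the response\n')
    sys.exit(1)
sys.stdout.write(matcher.group('human_readable_chunk') + '\n')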
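The request and the response handling of the updated script can be tried offline in Python 3. This sketch rebuilds the same query string with `urllib.parse.urlencode` (endpoint and parameter values copied from the script; Google never documented what they mean) and shows a slightly sturdier way to pull out the first translated chunk. It assumes, as the script's regular expression does, that the reply starts with `[[["translated text",...`; the sample reply is hand-written in that assumed shape, not captured from Google. Unlike `[^")]*`, the pattern below allows escaped characters inside the string and lets `json.loads` decode them; an escape that is not valid JSON (for example `\x26`) would raise `ValueError`. The helper names `build_web_url` and `first_chunk` are hypothetical.

```python
import json
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint and parameters copied from the Python 2 script
BASE_URL = "http://translate.google.com.br/translate_a/t"

# Captures the first string after [[[ and allows backslash escapes inside it
FIRST_CHUNK = re.compile(r'\[\[\["((?:[^"\\]|\\.)*)"')


def build_web_url(text, source, target):
    """Build the GET URL the updated script sends (hypothetical helper)."""
    params = urlencode({
        "client": "t", "text": text, "hl": "en", "sl": source, "tl": target,
        "otf": 2, "multires": 1, "ssel": 0, "tsel": 0, "sc": 1,
    })
    return BASE_URL + "?" + params


def first_chunk(reply):
    """Return the first translated chunk, or None if the reply has another shape."""
    match = FIRST_CHUNK.search(reply)
    if match is None:
        return None
    # The captured text is a string literal; json.loads decodes \" and \uXXXX
    return json.loads('"' + match.group(1) + '"')


if __name__ == "__main__":
    url = build_web_url("lovely spam", "en", "es")
    query = parse_qs(urlsplit(url).query)
    print(query["text"], query["sl"], query["tl"])  # ['lovely spam'] ['en'] ['es']

    # Hand-written sample in the assumed reply format, not real Google output
    sample = r'[[["Caf\u00e9 (con \"leche\")","Coffee (with \"milk\")","",""]]]'
    print(first_chunk(sample))  # Café (con "leche")
    print(first_chunk("unexpected reply"))  # None
```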



Save the script to a file such as gtrans.py and run it as follows (assuming Python 2 is in your PATH; both scripts use Python 2 syntax):

    python gtrans.py en es Nobody expects the Spanish Inquisition

The first two parameters are the source and target language codes. A list of codes known to Google Translate is available here: http://code.google.com/apis/ajaxlanguage/documentation/reference.html. For some reason, not all of the listed codes are actually accepted; bo (Tibetan), for example, is not.

To pipe a text file through the script:

    python gtrans.py en es < myfile.txt

It is also possible to enter multi-line text directly from the console. To do so, call the script with the language codes only, i.e.:

    python gtrans.py en es

Type your text, pressing Enter to start a new line. When you are done, press Ctrl+D (on Linux) or Ctrl+Z followed by Enter (on Windows).

Note: On Windows, input in languages other than English will not work, due to poor Unicode input support in cmd.exe. On Linux, international input works fine, provided that the console uses UTF-8.
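For piped input, Python 3 lets you sidestep the console encoding by decoding the raw bytes of standard input explicitly. This is a minimal sketch that assumes the input really is UTF-8; it does not fix interactive typing in cmd.exe, which is the limitation described above. The helper name `utf8_text` is hypothetical.

```python
import io


def utf8_text(byte_stream):
    """Read a binary stream to EOF and decode it as UTF-8 (hypothetical helper).

    TextIOWrapper's default newline handling also turns \r\n into \n.
    """
    wrapper = io.TextIOWrapper(byte_stream, encoding="utf-8")
    return wrapper.read()


if __name__ == "__main__":
    # With real piped input you would pass sys.stdin.buffer instead
    sample = io.BytesIO("Grüße aus Köln\n".encode("utf-8"))
    print(utf8_text(sample), end="")
```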

Keep in mind, though, that Google limits the size of the text that can be translated.
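The post does not say what the limit is, so one workaround is to split long input into pieces below a size you choose and translate them one at a time. This is a minimal sketch; `max_chars` is a hypothetical parameter you would have to tune yourself, and splitting on whitespace can cut a sentence in two, which may hurt translation quality.

```python
def split_text(text, max_chars):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    Line breaks and repeated spaces are collapsed to single spaces.
    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(split_text("Nobody expects the Spanish Inquisition", 16))
```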

Console Google Translate — curl-based version

As an alternative, here is a bash script that uses curl and sed. It has been updated to work with the Google Translate web API.

#! /bin/bash

USAGE="Usage: 
       $0 en es Lovely spam!
Some codes: en|fr|de|ru|nl|it|es|ja|la|pl|bo
All language codes:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray"

if [ "$#" -lt 3 ]; then
    echo "$USAGE"
    exit 1
fi

FROM_LNG=$1
TO_LNG=$2

shift 2
QUERY=$*

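UA="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803"
URL="http://translate.google.com.br/translate_a/t?client=t&hl=en&sl=$FROM_LNG&tl=$TO_LNG&otf=2&multires=1&ssel=0&tsel=0&sc=1"
# --data-urlencode sends the text as a POST body; "$UA" must be quoted because it contains spaces.
# sed: split fields on ",", drop brackets and quotes, then turn ",<digits>" into spaces.
curl --data-urlencode "text=$QUERY" -A "$UA" -s -g -4 "$URL" \
    | sed 's/","/\n/g' | sed 's/\]\|\[\|"//g' | sed 's/,[0-9]*/ /g'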
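To see what the sed pipeline does to the reply, here is a rough Python 3 equivalent of its substitutions, applied to a hand-written sample in the assumed reply format (not real Google output). A repeated `s/","/\n/g` step after the quotes are removed would be a no-op, so it is left out. The function name `flatten_reply` is hypothetical.

```python
import re


def flatten_reply(reply):
    """Python 3 equivalent of the sed pipeline in the bash script."""
    # sed 's/","/\n/g'      : every "," separator becomes a line break
    text = reply.replace('","', "\n")
    # sed 's/\]\|\[\|"//g'  : drop square brackets and double quotes
    text = re.sub(r'[\[\]"]', "", text)
    # sed 's/,[0-9]*/ /g'   : a comma plus any following digits becomes a space
    text = re.sub(r",[0-9]*", " ", text)
    return text


if __name__ == "__main__":
    # Hand-written sample in the assumed reply format, not real Google output
    sample = '[[["hola","hello","",""]],,"en"]'
    print(repr(flatten_reply(sample)))  # 'hola\nhello\n\n  en'
```

In this sample the translation ends up on the first line and the remaining lines carry the other fields of the reply.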

4 comments:

  1. Thanks for this. I've posted a version of your script that uses the Python json and optparse libraries here: http://gist.github.com/561630

    It should be reasonably compatible with Python 3, although I think optparse has been deprecated in favour of argparse.

  2. Nice post! I have written a bash version that works pretty much like yours :-)

    http://ur1.ca/1g5ak

    Regards!

  3. 2ksaver: Thanks for sharing the script! I went back to update the post with a curl-based solution and saw your version of the script. I experimented with curl's `--data-urlencode`, and it looks like Google is happy to get the params via POST instead of GET.

  4. Very Very (./translate it en utile) useful :)
