My brain hurts!
we really think what you know

31 August 2010

Using Google Translate from the console

If you have ever used Google Translate and wished you could do the same from the console, here is a Python script that does just that.

The script translates words and entire sentences between any language pair known to Google Translate. It accepts text passed as shell arguments as well as data from standard input.

NOTE: This script stopped working after version 1 of the translation API was discontinued in December 2011. See the updated script below for a working version.

NOTE: According to http://code.google.com/apis/language/translate/v1/reference.html
Important: Google Translate API v1 was officially deprecated on May 26, 2011; it was shut off completely on December 1, 2011. For text translations, you can use the Google Translate API v2, which is now available as a paid service. For website translations, we encourage you to use the Google Website Translator gadget.
#!/usr/bin/env python
from urllib2 import urlopen
from urllib import urlencode
import sys
import os

# The google translate API can be found here:
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples

# Language codes are listed here:
#http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

if len(sys.argv) < 3:
    name = os.path.basename(sys.argv[0])
    print '''
Usage:
    %s en es lovely spam
    %s es en < file.txt

Available language codes are listed here:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray
''' % (name,name)
    sys.exit(-1)

## hack to be able to display UTF-8 in Windows console
if sys.platform == "win32":
    ## set utf8 console
    if not sys.stdin.encoding == 'cp65001':
        os.system('chcp 65001 > nul')
    class UniStream(object):
        __slots__= "fileno", "softspace",
        def __init__(self, fileobject):
            self.fileno= fileobject.fileno()
            self.softspace = False
        def write(self, text):
            if isinstance(text, unicode):
                os.write(self.fileno, text.encode("utf_8"))
            else:
                os.write(self.fileno, text)
    sys.stdout= UniStream(sys.stdout)
    sys.stderr= UniStream(sys.stderr)

lang1=sys.argv[1]
lang2=sys.argv[2]
langpair='%s|%s'%(lang1,lang2)

if len(sys.argv) > 3:
    text=' '.join(sys.argv[3:])
else:
    text=sys.stdin.read()

base_url='http://ajax.googleapis.com/ajax/services/language/translate?'
params=urlencode( (('v',1.0),
('q',text),
('langpair',langpair),) )
url=base_url+params
content=urlopen(url).read()
start_idx=content.find('"translatedText":"')+18
translation=content[start_idx:]
end_idx=translation.find('"}, "')
translation=translation[:end_idx]
sys.stdout.write(translation + '\n')
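
The string slicing at the end of the script can break if the translation itself contains an escaped quote or if the field order in the reply changes. As a possible alternative, the reply can be decoded as JSON. The sketch below is written for Python 3 (the scripts in this post target Python 2) and runs against a hard-coded sample string, since the v1 service is no longer reachable. The shape of the sample and the helper name `extract_translation` are assumptions made for illustration.

```python
import json

# Assumed shape of a v1 reply; hard-coded because the service is shut down.
SAMPLE_RESPONSE = (
    '{"responseData": {"translatedText": "spam \\"encantador\\""}, '
    '"responseDetails": null, "responseStatus": 200}'
)

def extract_translation(content):
    """Return the translatedText field, or None if it is missing."""
    data = json.loads(content)
    # responseData may be null when the request failed.
    response_data = data.get("responseData") or {}
    return response_data.get("translatedText")

if __name__ == "__main__":
    print(extract_translation(SAMPLE_RESPONSE))
```

Decoding the reply also turns JSON escapes back into plain characters, which the slicing approach leaves untouched.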

This is the updated script, which uses the web API. It should work after December 2011.

#!/usr/bin/env python
import sys
import os
import urllib2
from urllib import urlencode
import cookielib
import re

# The google translate API can be found here (***NOT OPERATIONAL SINCE DECEMBER 2011***):
# http://code.google.com/apis/ajaxlanguage/documentation/#Examples

# Language codes are listed here:
#http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

if len(sys.argv) < 3:
    name = os.path.basename(sys.argv[0])
    print '''
Usage:
    %s en es lovely spam
    %s es en < file.txt

Available language codes are listed here:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray

''' % (name,name)
    sys.exit(-1)

## hack to be able to display UTF-8 in Windows console

def fix_win32_console():
    ## set utf8 console
    if not sys.stdin.encoding == 'cp65001':
        os.system('chcp 65001 > nul')
    class UniStream(object):
        __slots__= "fileno", "softspace",
        def __init__(self, fileobject):
            self.fileno= fileobject.fileno()
            self.softspace = False
        def write(self, text):
            if isinstance(text, unicode):
                os.write(self.fileno, text.encode("utf_8"))
            else:
                os.write(self.fileno, text)
    sys.stdout= UniStream(sys.stdout)
    sys.stderr= UniStream(sys.stderr)

if sys.platform == "win32":
    fix_win32_console()

lang1=sys.argv[1]
lang2=sys.argv[2]

if len(sys.argv) > 3:
    text=' '.join(sys.argv[3:])
else:
    text=sys.stdin.read()

base_url='http://translate.google.com.br/translate_a/t'
# sample browser request
#http://translate.google.com/translate_a/t?client=t&text=col&hl=en&sl=en&tl=es&multires=1&otf=2&ssel=4&tsel=0&sc=1
params=urlencode({'client':'t',
    'text':text,
    'hl':'en',
    'sl':lang1,
    'tl':lang2,
    'otf':2,
    'multires':1,
    'ssel':0,
    'tsel':0,
    'sc':1,
    })

url=base_url + '?' + params

cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1018.0 Safari/535.19'),
                    ('Referer', 'http://translate.google.com/')
]
response = opener.open(url)
translation=response.read()
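matcher = re.search(r'\[\[\["(?P<human_readable_chunk>[^")]*)', translation)
if matcher is None:
    # the response did not start with the expected [[[" prefix
    sys.exit('No translation found in the response')
sys.stdout.write(matcher.group('human_readable_chunk') + '\n')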
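
The regular expression at the end of the script captures only the first quoted string of the reply, so a text with several sentences may come back truncated. The sketch below (Python 3, unlike the Python 2 scripts in this post) collects the first string of every inner list instead. The reply format is undocumented, so `SAMPLE` is a hypothetical reply, and the helper names `extract_chunks` and `join_translation` are illustrative only. It also assumes that no translated string contains `]]`.

```python
import re

# Hypothetical sample of the array-style reply; the real format is undocumented.
SAMPLE = (
    '[[["Hola mundo. ","Hello world. ","",""],'
    '["Adios.","Goodbye.","",""]],,"en",,[["Hola"]]]'
)

# First quoted string of every inner [...] list, allowing backslash escapes.
CHUNK_RE = re.compile(r'\["((?:[^"\\]|\\.)*)"')

def extract_chunks(reply):
    """Return the translated chunk of each sentence in the first list."""
    end = reply.find("]]")  # assumed end of the list of sentence pairs
    first_list = reply if end == -1 else reply[:end]
    return CHUNK_RE.findall(first_list)

def join_translation(reply):
    """Concatenate the chunks into one translated string."""
    return "".join(extract_chunks(reply))

if __name__ == "__main__":
    print(join_translation(SAMPLE))
```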



Save the script to a file such as gtrans.py and run it as follows (assuming you have Python in your path):

    python gtrans.py en es Nobody expects the Spanish Inquisition

The first two parameters are the language codes. A list of codes known to Google Translate is available here: http://code.google.com/apis/ajaxlanguage/documentation/reference.html. For some reason, not all of the listed codes are actually accepted; bo for Tibetan, for example, is rejected.

To pipe a text file through the script:

    python gtrans.py en es < myfile.txt

It is also possible to enter multi-line text directly from the console. To do so, call the script with the language codes only, i.e.:

    python gtrans.py en es

Enter your text and use the Enter key to start a new line. When you are done, press Ctrl+D (on Linux) or Ctrl+Z followed by Enter (on Windows).
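
The choice between command-line words and standard input can be isolated in a small helper. This is a minimal sketch in Python 3 (the scripts in this post are Python 2). The function name `get_text` and its `stdin` parameter are hypothetical, added so that the behaviour can be tried without typing at a terminal.

```python
import io
import sys

def get_text(argv, stdin=None):
    """Return the text to translate.

    argv mimics sys.argv: script name, source code, target code, then words.
    """
    if len(argv) > 3:
        # Words given on the command line are joined with single spaces.
        return " ".join(argv[3:])
    stream = sys.stdin if stdin is None else stdin
    return stream.read()  # reads until end of input (Ctrl+D or Ctrl+Z)

if __name__ == "__main__":
    demo = io.StringIO("first line\nsecond line\n")
    print(get_text(["gtrans.py", "en", "es"], demo))
```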

Note: On Windows, input in languages other than English does not work because cmd.exe has poor support for Unicode input. On Linux, international input works fine, provided that the console uses UTF-8.

Keep in mind, though, that Google limits the size of the text that can be translated in one request.
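
One way to stay under such a limit is to split long input into smaller pieces and translate them one by one. The sketch below is Python 3 and makes no claim about the actual limit: `max_chars` is a hypothetical parameter you would have to pick yourself, and `split_text` is an illustrative name. Note that it collapses runs of whitespace, so line breaks in the input are not preserved.

```python
def split_text(text, max_chars):
    """Split text into pieces of at most max_chars characters.

    Pieces break at whitespace where possible; a single word longer
    than max_chars is cut hard.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    pieces = []
    current = ""
    for word in text.split():
        # Cut an overlong word into max_chars-sized slices.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces

if __name__ == "__main__":
    for piece in split_text("Nobody expects the Spanish Inquisition", 15):
        print(piece)
```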

Console Google Translate — curl-based version

As an alternative, here is a bash script which uses curl and sed. It has been updated to work via the Google Translate web API.

#! /bin/bash

USAGE="Usage: 
       $0 en es Lovely spam!
Some codes: en|fr|de|ru|nl|it|es|ja|la|pl|bo
All language codes:
http://code.google.com/apis/ajaxlanguage/documentation/reference.html#LangNameArray"

if [ "$#" -lt 3 ]; then
    echo "$USAGE"
    exit 1
fi

FROM_LNG=$1
TO_LNG=$2

shift 2
QUERY=$*

UA="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803"
URL="http://translate.google.com.br/translate_a/t?client=t&hl=en&sl=$FROM_LNG&tl=$TO_LNG&otf=2&multires=1&ssel=0&tsel=0&sc=1"
# quote $UA and $URL so the shell does not split or glob them
curl --data-urlencode "text=$QUERY" -A "$UA" -s -g -4 "$URL" | sed 's/","/\n/g' | sed 's/\]\|\[\|"//g' | sed 's/","/\n/g' | sed 's/,[0-9]*/ /g'

28 August 2010

Linux console online dictionary lookup

I often need to look up words in various online dictionaries, but in many cases I would prefer to get the results right in the console, instead of having to launch the browser, type in the URL and wait for the page to load.

For example, here is a bash one-liner that translates an English word to Spanish right in the console, followed by similar one-liners for other sites:

Wordreference English-Spanish: 
curl -j -s -A "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13337/458; U; en) Presto/2.2.0" "http://www.wordreference.com/es/translation.asp?tranword=`echo $* | sed 's/ /%20/g'`" | html2text -utf8 | less -R

Wordreference Spanish to English:
curl -j -s -A "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13337/458; U; en) Presto/2.2.0" "http://www.wordreference.com/es/en/translation.asp?spen=`echo $* | sed 's/ /%20/g'`" | html2text -utf8 | less -R

Merriam Webster Online English Dictionary: 
curl -j -s -A "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13337/458; U; en) Presto/2.2.0" "http://www.merriam-webster.com/dictionary/`echo $* | sed 's/ /%20/g'`" | html2text -utf8 | less -R

Wikipedia (EN):
curl -j -s -A "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13337/458; U; en) Presto/2.2.0" "http://en.wikipedia.org/wiki/`echo $* | sed 's/ /%20/g'`" | html2text -utf8 | less -R 


Usage: Save the required script to a file, make it executable (chmod +x webster), and then type the script name from the console with the word(s) you need translated as arguments.
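
The one-liners above only replace spaces with %20, so words containing accented letters, ampersands or slashes may produce a broken URL. As a hedged illustration, Python 3's standard library can percent-encode the whole phrase. The helper name `build_lookup_url` is hypothetical, and the sketch only builds the URL without fetching it.

```python
from urllib.parse import quote

def build_lookup_url(base_url, words):
    """Join the words and percent-encode them for use in a URL path."""
    phrase = " ".join(words)
    # safe="" also encodes "/" so a slash in a word cannot change the path.
    return base_url + quote(phrase, safe="")

if __name__ == "__main__":
    print(build_lookup_url("http://www.merriam-webster.com/dictionary/",
                           ["word", "of", "mouth"]))
```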

Instead of creating a separate script for each dictionary, it might be more practical to add the scriptlets to your .bashrc file. A bash alias cannot reference its arguments through $*, so define each one as a shell function, e.g.:

webster() { curl -j -s -A "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.13337/458; U; en) Presto/2.2.0" "http://www.merriam-webster.com/dictionary/`echo $* | sed 's/ /%20/g'`" | html2text -utf8 | less -R; }

Make sure you have curl, html2text, sed, and less installed.

A similar result can be achieved with a console browser such as lynx or w3m. Install w3m:

sudo apt-get install w3m # for Ubuntu
sudo pacman -S w3m       # Arch Linux

Create a wrapper script which includes the URL:

echo 'w3m "http://www.wordreference.com/es/translation.asp?tranword=$*"' > ~/bin/en2es
chmod +x ~/bin/en2es

Now the script can be run like this:

en2es word of mouth

The advantage in this case is that you can use all the standard browser features, like clicking links, etc.
