Random consultant: Building an Amazon Echo Like Device with a Raspberry Pi and Google Cloud Speech API

Wednesday 11 January 2017

In my previous post I showed how I wrote a Python script to read out the latest news headlines using Google's text-to-speech API. As I commented in that post, voice recognition and talking devices seem to be the in thing with the release of the Amazon Echo and Google Home.

In this post I show how I created a Python script that records sound on your Raspberry Pi, invokes the Google Cloud Speech API to interpret what was said, and then performs a command on your Raspberry Pi, so a bit like a basic Amazon Echo.

Setting up your mic

Before I get into the Python code, you need a mic set up. As the Raspberry Pi does not have a built-in audio input, you will need a USB mic or a webcam with an inbuilt mic. I went for the latter and used a basic webcam from Logitech.

Once you have your mic plugged in, follow the instructions under "Step 1: Checking Your Microphone" at: https://diyhacking.com/best-voice-recognition-software-for-raspberry-pi/

Install prerequisites

The one Python library you need is PycURL, which is used to send data to the Google Cloud Speech API. Follow the instructions here: http://pycurl.io/docs/latest/install.html

You will also need to install SoX, an open-source tool for processing and analysing sound files. The script uses it to detect whether there is any sound on the recorded audio before trying to send it to the Google API.

You can install this by running:

 sudo apt-get install sox
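
As a rough illustration of that silence check, the sketch below parses the "Maximum amplitude" line that `sox <file> -n stat` prints (SoX writes these statistics to stderr). The function names and the 0.1 threshold are placeholders I've picked for this example rather than values from speechAnalyser.py, and the code is written for Python 3.

```python
import re
import subprocess


def parse_max_amplitude(stat_output):
    """Return the 'Maximum amplitude' value from `sox ... stat` output, or None."""
    match = re.search(r"Maximum amplitude:\s*(-?\d+(?:\.\d+)?)", stat_output)
    return float(match.group(1)) if match else None


def max_amplitude(filename):
    """Run `sox <filename> -n stat`; return None if no amplitude is reported
    (for example when the file could not be opened)."""
    result = subprocess.run(
        ["sox", filename, "-n", "stat"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The stat effect reports on stderr, not stdout.
    return parse_max_amplitude(result.stderr)


def is_loud_enough(filename, threshold=0.1):
    """True when the recording peaks above the (assumed) threshold."""
    amplitude = max_amplitude(filename)
    return amplitude is not None and amplitude > threshold
```

Returning None instead of calling float() on whatever text came back means a missing recording is treated as silence rather than crashing the script.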

One more thing to install is flac. It is used to encode the recording in the lossless FLAC format, which is what the script sends to the Google API.

You can install this by running:

 sudo apt-get install flac
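
For reference, the recording step pipes arecord into flac. The sketch below rebuilds that shell command in Python 3. The plughw:1,0 device and the 16000 Hz rate are the ones I use on my Pi and your device may differ (`arecord -l` lists the capture devices); the function names are just for illustration.

```python
import shlex
import subprocess


def build_record_command(duration, filename, device="plughw:1,0", rate=16000):
    """Return a shell pipeline that records `duration` seconds of mono audio to FLAC."""
    return (
        "arecord -D {device} -f cd -c 1 -t wav -d {duration} -q -r {rate} "
        "| flac - -s -f --best --sample-rate {rate} -o {filename}"
    ).format(
        device=shlex.quote(device),
        duration=int(duration),
        rate=int(rate),
        filename=shlex.quote(filename),
    )


def record(duration, filename):
    """Record from the microphone; returns the exit status of the pipeline."""
    return subprocess.call(build_record_command(duration, filename), shell=True)
```

Printing the result of build_record_command(2, "test.flac") and running it by hand is a quick way to check that both arecord and flac are installed.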

Setup Google Cloud Speech Api

To do the voice-to-text processing I am using the Speech API, which is part of Google Cloud. It is in beta at the moment and offers a free trial.

Follow the instructions on their site to get your API key, which will be needed in the script:

https://cloud.google.com/speech/

The current downside I've found with this API is the latency. It's currently taking 5-6 seconds to get a response for a 2-second audio file. The Google help files say the response time should be similar to the length of the audio being processed.

Python Script

Now to the actual python code.

All the files required can be downloaded from here:

https://www.dropbox.com/sh/r0e787digng1rgv/AADMGKH6dBTVpPi2yMLtIcYxa/Rpi%20Speech?dl=0

The main file to look at is speechAnalyser.py.

This script does the following:

1. If no audio is playing (you don't want to record while something is playing on your speakers), records sound from your microphone for 2 seconds.

2. Uses SoX to check whether there is any sound on the file above a certain amplitude. This avoids processing silence or just background noise.

3. If there is sound at a sufficient amplitude, sends the audio to the Google API in a JSON message. As said earlier, the Google API takes 5-6 seconds and returns a JSON message with the words detected.

4. If the trigger word, in this case "Jarvis", was said during those two seconds, plays a beep sound.

5. Records another 3 seconds to listen for the user speaking a command, and sends it to the Google API as in step 3.

6. Checks whether a keyword is found in the returned text and executes the appropriate command. For example, if "news" is mentioned it invokes the GetNews script which I described in my previous post.

7. Loops back to step 1.
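
The steps above boil down to a small loop. Here is a minimal Python 3 sketch of that control flow, not the actual code from speechAnalyser.py: the recording, beep and command handling are passed in as functions (hypothetical parameters for this example), and the "is audio already playing" check from step 1 is left out.

```python
def run_once(transcribe, listen_for_command, beep, trigger="jarvis"):
    """One pass of the loop: listen for the trigger word, then for a command.

    Returns True if the trigger word was heard and a command was attempted.
    """
    heard = transcribe(2)              # steps 1-3: record 2 s and get the text
    if trigger not in heard.lower():
        return False
    beep()                             # step 4: acknowledge the trigger word
    listen_for_command()               # steps 5-6: record 3 s and act on it
    return True


def run_forever(transcribe, listen_for_command, beep):
    """Step 7: keep looping until the script is interrupted."""
    while True:
        run_once(transcribe, listen_for_command, beep)
```

Passing the functions in makes it easy to try the loop with fake inputs before plugging in the microphone and the Google API.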

Remember to set key in the lines below to the API key you were given when you set up the Google Cloud Speech API:

key = ''
stt_url = 'https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=' + key
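
If you would rather not keep the key in the script itself, one option is to read it from an environment variable. This is only a sketch in Python 3: the variable name GOOGLE_SPEECH_API_KEY and the helper build_stt_url are made up for this example.

```python
import os

STT_ENDPOINT = "https://speech.googleapis.com/v1beta1/speech:syncrecognize"


def build_stt_url(key):
    """Append the API key to the endpoint, refusing an empty key."""
    if not key:
        raise ValueError("No API key set")
    return STT_ENDPOINT + "?key=" + key


# GOOGLE_SPEECH_API_KEY is a hypothetical variable name chosen for this example:
# stt_url = build_stt_url(os.environ.get("GOOGLE_SPEECH_API_KEY", ""))
```

Failing early on an empty key gives a clearer message than waiting for Google to reject the request.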

Also you should customise your commands in the following section of code:

# Needs "import os, subprocess, time" at the top of the script;
# transcribe() is defined elsewhere in speechAnalyser.py.
def listenForCommand():
    # Record 3 seconds and get the recognised text back from Google
    command = transcribe(3)

    print(time.strftime("%Y-%m-%d %H:%M:%S ") + "Command: " + command)

    text = command.lower()
    success = True

    # The first matching keyword wins, so the order of the checks matters
    if "light" in text and "on" in text:
        subprocess.call(["/usr/local/bin/tdtool", "-n", "1"])
    elif "light" in text and "off" in text:
        subprocess.call(["/usr/local/bin/tdtool", "-f", "1"])
    elif "news" in text:
        os.system('python getNews.py')
    elif "weather" in text:
        os.system('python getWeather.py')
    elif "pray" in text:
        os.system('python sayPrayerTimers.py')
    elif "time" in text:
        subprocess.call(["/home/pi/Documents/speech.sh", time.strftime("%H:%M")])
    elif "tube" in text:
        os.system('python getTubeStatus.py')
    else:
        # No keyword recognised
        subprocess.call(["aplay", "i-dont-understand.wav"])
        success = False

    return success
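
If the list of commands grows, a table of keywords and actions can be easier to maintain than a long if/elif chain. The following is a minimal Python 3 sketch of that idea and not what speechAnalyser.py does; match_command and RULES are names invented for this example, and only a few of the commands are shown.

```python
def match_command(command, rules):
    """Return the action of the first rule whose keywords all appear in the command."""
    text = command.lower()
    for keywords, action in rules:
        if all(word in text for word in keywords):
            return action
    return None


# Order matters: earlier rules win, just as in an if/elif chain.
RULES = [
    (("light", "on"), ["/usr/local/bin/tdtool", "-n", "1"]),
    (("light", "off"), ["/usr/local/bin/tdtool", "-f", "1"]),
    (("news",), ["python", "getNews.py"]),
    (("weather",), ["python", "getWeather.py"]),
]
```

The returned list can be handed straight to subprocess.call, with None meaning nothing matched and the "I don't understand" sound should be played.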

The other interesting part of the script is where it sends the data over to the Google Cloud Speech API.

It creates a JSON message, and then encodes the audio in base64.

Within the outgoing JSON message, there is a phrases section, where I've included my trigger word "Jarvis", which makes it more likely that the speech engine recognises it.

The final bit then gets the text from the response.

    # Send sound to Google Cloud Speech API to interpret
    # (Python 2; needs base64, pycurl, StringIO and time imported)
    #----------------------------------------------------

    print(time.strftime("%Y-%m-%d %H:%M:%S ") + "Sending to google api")

    # Set up the POST request; the response body is collected in fout
    c = pycurl.Curl()
    c.setopt(pycurl.VERBOSE, 0)
    c.setopt(pycurl.URL, stt_url)
    fout = StringIO.StringIO()
    c.setopt(pycurl.WRITEFUNCTION, fout.write)

    c.setopt(pycurl.POST, 1)
    c.setopt(pycurl.HTTPHEADER, ['Content-Type: application/json'])

    with open(filename, 'rb') as speech:
        # Base64 encode the binary audio file for inclusion in the JSON request.
        speech_content = base64.b64encode(speech.read())

    jsonContentTemplate = """{
        "config": {
            "encoding": "FLAC",
            "sampleRate": 16000,
            "languageCode": "en-GB",
            "speechContext": {
                "phrases": [
                    "jarvis"
                ]
            }
        },
        "audio": {
            "content": "XXX"
        }
    }"""

    jsonContent = jsonContentTemplate.replace("XXX", speech_content)

    #print jsonContent

    start = time.time()

    c.setopt(pycurl.POSTFIELDS, jsonContent)
    c.perform()

    # Extract text from returned message from Google
    #----------------------------------------------
    response_data = fout.getvalue()

    end = time.time()
    #print "Time to run:"
    #print(end - start)

    #print response_data

    c.close()

    # Take everything between 'transcript": "' and the next '",'
    start_loc = response_data.find("transcript")
    if start_loc == -1:
        # Nothing was recognised
        return ""
    temp_str = response_data[start_loc + 14:]
    #print "temp_str: " + temp_str
    end_loc = temp_str.find('",')
    final_result = temp_str[:end_loc]
    #print "final_result: " + final_result
    return final_result
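
The string slicing works but is a bit fragile, so here is an alternative sketch that uses the json module for both the request and the response. It is written for Python 3 and is not the code in speechAnalyser.py. The function names are made up, and the response layout (results, then alternatives, then transcript) is my assumption about what the API returns.

```python
import base64
import json


def build_request_body(audio_bytes, phrases=("jarvis",), language="en-GB", rate=16000):
    """Return the JSON request body for a FLAC recording as a string."""
    return json.dumps({
        "config": {
            "encoding": "FLAC",
            "sampleRate": rate,
            "languageCode": language,
            "speechContext": {"phrases": list(phrases)},
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    })


def extract_transcript(response_text):
    """Return the first transcript in the response, or '' if there is none."""
    try:
        data = json.loads(response_text)
    except ValueError:
        return ""
    if not isinstance(data, dict):
        return ""
    # Assumed shape: {"results": [{"alternatives": [{"transcript": "..."}]}]}
    for result in data.get("results", []):
        for alternative in result.get("alternatives", []):
            return alternative.get("transcript", "")
    return ""
```

An empty string for "nothing recognised" keeps the rest of the script simple, since a keyword search on it just finds nothing.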

I have to give a big shout-out to the following sites, which gave me ideas on how to write this script:

https://diyhacking.com/best-voice-recognition-software-for-raspberry-pi/ - This contains the instructions on how to set up a microphone on the Raspberry Pi

https://github.com/StevenHickson/PiAUISuite - Full application which does what the above script does but is configurable. I'm not sure if it still works with the new Google Speech API, though

24 comments:

Anonymous, 4 February 2017 at 16:23
Hey, this is really cool. I installed everything but when I try to run it I get these errors output:

sh: 1: flac: not found
arecord: begin_wave:2516: write error
Traceback (most recent call last):
  File "speechAnalyser.py", line 199, in <module>
    spokenText = transcribe(2) ;
  File "speechAnalyser.py", line 57, in transcribe
    maxAmpValue = float(maxAmpValueText)
ValueError: could not convert string to float: open i

Have you experienced this error? It seems like something really simple to fix but I'm pretty new with this.

SM, 5 February 2017 at 13:09
Uncomment some of the print statements to help debug (remove the '#' symbols).

Also, did SoX install properly? Check by running sox on the command line. SoX is used to check whether the file is silent by looking at the maximum amplitude.

Also is the test.flac file saved down?

Anonymous, 5 February 2017 at 19:22
Ok, so I tested Sox by converting a .wav file to a .au file, so it seems it's been installed properly. I uncommented the print statements but I'm not sure what you mean by the test.flac being "saved down". I now get this error:

listening ..
sh: 1: flac: not found
arecord: begin_wave:2516: write error
Popen outputsox FAIL formats: can't open input file `test.flac': No such file or directory

Max Amp Start: 23
Max Amop Endp: 30
Max Amp: open i
Traceback (most recent call last):
  File "speechAnalyser.py", line 199, in <module>
    spokenText = transcribe(2) ;
  File "speechAnalyser.py", line 57, in transcribe
    maxAmpValue = float(maxAmpValueText)
ValueError: could not convert string to float: open i

Anonymous, 6 February 2017 at 15:51
I tested arecord on its own and it definitely works. I can't see "test.flac" stored in the folder so that line definitely isn't running properly. Are there meant to be apostrophes in line 34?

SM, 6 February 2017 at 16:07
Add a line before line 33 with this:
print('arecord -D plughw:1,0 -f cd -c 1 -t wav -d ' + str(duration) + ' -q -r 16000 | flac - -s -f --best --sample-rate 16000 -o ' + filename)

This will print out the actual command being sent to arecord. Then you can try running that on the command line separately.

My suspicion is that flac isn't installed (which might be missing from my instructions).

You will need to run:
sudo apt-get install flac

Anonymous, 6 February 2017 at 17:08
You were right, I just needed to install flac. The only problem now is that it won't accept my API key for google cloud speech. Every time it tries to send the file to google it says "API key not valid. Please pass a valid API key."
Any idea what's going wrong there?

SM, 6 February 2017 at 17:17
In the Google Cloud Platform console, go into the API Manager and select "Credentials". Click on "Create credentials" and select "API Key".

Did you sign up fully to the Google Cloud Platform? You need to provide payment details even though it's a limited-time free trial of a beta.

Anonymous, 6 February 2017 at 17:49
Ye I'm sure I'm fully setup and everything. I'll regenerate the API key and have another go later.

Anonymous, 6 February 2017 at 20:26
Is it definitely an API key you're using, not a service account key?

Zan, 7 February 2017 at 19:28
Hi! Great tutorial. I've set up everything without any problems and also changed google cloud speech to wit.ai - working like a charm :)

Anonymous, 8 February 2017 at 21:41
Turned out I hadn't actually enabled the api key. It works perfectly. Thanks for all your help!

Anonymous, 13 February 2017 at 23:43
Is there anything other than the Google Cloud Platform Speech API I can use? I can't create an account.

rafi, 19 April 2017 at 10:21
Hey nice post dude, It would be awesome if you create a module for new google cloud speech API in jasper project. It would help a lot of us.
https://github.com/jasperproject/jasper-client

disheet, 2 November 2017 at 06:30
I am getting this error while installing pycurl.

Using curl-config (libcurl 7.38.0)
running install
running build
running build_py
running build_ext
building 'pycurl' extension
arm-linux-gnueabihf-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c src/pycurl.c -o build/temp.linux-armv7l-2.7/src/pycurl.o
src/pycurl.c: In function ‘PYCURL_OPT’:
src/pycurl.c:62:20: warning: typedef ‘compile_time_assert_fail__’ locally defined but not used [-Wunused-local-typedefs]
{ typedef int compile_time_assert_fail__[1 - 2 * !(expr)]; }
^
src/pycurl.c:69:5: note: in expansion of macro ‘COMPILE_TIME_ASSERT’
COMPILE_TIME_ASSERT(OPTIONS_SIZE == CURLOPT_HTTP200ALIASES - CURLOPTTYPE_OBJECTPOINT + 1);
^
src/pycurl.c: In function ‘do_curl_setopt’:
src/pycurl.c:1076:25: error: ‘CURLOPT_PASSWDDATA’ undeclared (first use in this function)
option == CURLOPT_PASSWDDATA))
^
src/pycurl.c:1076:25: note: each undeclared identifier is reported only once for each function it appears in
src/pycurl.c:1172:17: warning: implicit declaration of function ‘curl_formparse’ [-Wimplicit-function-declaration]
res = curl_formparse(str, &self->httppost, &last);
^
src/pycurl.c:1239:15: error: unknown type name ‘curl_passwd_callback’
const curl_passwd_callback pwd_cb = password_callback;
^
src/pycurl.c:1239:45: warning: initialization makes integer from pointer without a cast
const curl_passwd_callback pwd_cb = password_callback;
^
src/pycurl.c:1277:14: error: ‘CURLOPT_PASSWDFUNCTION’ undeclared (first use in this function)
case CURLOPT_PASSWDFUNCTION:
^
src/pycurl.c: In function ‘initpycurl’:
src/pycurl.c:2404:35: error: ‘CURLOPT_PASSWDFUNCTION’ undeclared (first use in this function)
insint_c(d, "PASSWDFUNCTION", CURLOPT_PASSWDFUNCTION);
^
src/pycurl.c:2405:31: error: ‘CURLOPT_PASSWDDATA’ undeclared (first use in this function)
insint_c(d, "PASSWDDATA", CURLOPT_PASSWDDATA);
^
error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1

disheet, 13 November 2017 at 05:34
by default this is a female voice so is there any chance to convert into male voice?