Team Status Report for May 8

This week we finished the final project presentation. We have also been working on the final video, final report, and poster.

We are finished building the project and are now working only on the end-of-semester documentation, so we have no significant risks, design changes, or changes to the Gantt chart.

We are using iMovie to edit our video, shown here.

 

Cambrea’s Status Report for May 1

This week I finished all tests for the audio networking and produced more formal output files with the results. Our latency is ~30 ms and our dropped packet rate is ~0.3%; both meet our requirements.

latencyTestMic1

latencyTestMic2

droppedPacketTestMic1

droppedPacketTestMic2

We also finished the slides for our final presentation.

Today we are working on our final video outline and we are filming some clips of our meeting setup.

My progress is on schedule. Next week, after our presentation, we will work more on the video and the final report.

 

Cambrea’s Status Report for April 24

Last weekend we finished the migration to AWS, so we are now running our server code on the EC2 instance instead of on a local computer.

Since we are now running our system on the server, we started on our final standard tests. This week I completed the two tests for the audio-device-to-AWS-server networking.

I first wrote the latency test. This test sends packets to the server and should receive each packet back. I capture the timestamp when a packet is sent and compare it to the timestamp when the packet arrives back at the Raspberry Pi, so that the same clock is used to calculate the latency. After sending and receiving 500 packets, we calculate the average round-trip latency of the system. We are getting an average latency of 27-80 ms, which is below our requirement of 150 ms.

The second test is the dropped packet test. It sends packets to the server for a fixed amount of time and counts the number of packets received back from the server; we calculate the dropped packet rate as (number of packets sent - number of packets received) / number of packets sent. We ran this test for 2 minutes and for 10 minutes and found dropped packet rates of 0.2% and 0.1%, well under our requirement of <5% dropped packets in our system.

These two tests are in ClientTest.py, which accepts the following arguments:

-b  run a basic connection test to the server: send and receive one packet to make sure the connection exists

-l  run the full latency test

-d run the full dropped packet test

The tests are implemented in the corresponding functions testLatency and testDroppedPacket; a rough sketch of the harness follows the link below.

ClientTest
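As an illustration of the harness's shape (not the exact contents of the linked file; the server address, packet payloads, and timeout handling here are assumptions):

import argparse
import socket
import time

SERVER = ("example-ec2-host", 9000)  # placeholder server address
NUM_PACKETS = 500
TIMEOUT_S = 1.0

def makeSocket():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(TIMEOUT_S)
    return sock

def testBasic():
    # -b: send and receive one packet to make sure the connection exists.
    sock = makeSocket()
    sock.sendto(b"ping", SERVER)
    data, _ = sock.recvfrom(4096)
    print("basic connection ok:", data)

def testLatency():
    # -l: average round-trip time over 500 echoed packets, timed with the
    # Pi's clock on both send and receive.
    sock = makeSocket()
    total = 0.0
    for _ in range(NUM_PACKETS):
        start = time.time()
        sock.sendto(b"latency", SERVER)
        sock.recvfrom(4096)  # server echoes the packet back
        total += time.time() - start
    print("average latency: %.1f ms" % (total / NUM_PACKETS * 1000))

def testDroppedPacket(duration_s=120):
    # -d: dropped packet rate = (sent - received) / sent over a fixed time.
    sock = makeSocket()
    sent = received = 0
    end = time.time() + duration_s
    while time.time() < end:
        sock.sendto(b"drop", SERVER)
        sent += 1
        try:
            sock.recvfrom(4096)
            received += 1
        except socket.timeout:
            pass  # no echo within the timeout: count as dropped
    print("dropped packet rate: %.2f%%" % ((sent - received) / sent * 100))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-b", action="store_true", help="basic connection test")
    parser.add_argument("-l", action="store_true", help="full latency test")
    parser.add_argument("-d", action="store_true", help="full dropped packet test")
    args = parser.parse_args()
    if args.b:
        testBasic()
    if args.l:
        testLatency()
    if args.d:
        testDroppedPacket()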

This progress is on schedule; we will continue testing into next week as well.

Next week I will be working on the final presentation slides and the final report.

 


Cambrea’s Status Report for April 10

Last week I completed the streaming code and AWS server code responsible for sending and receiving audio over the network. The ReSpeaker can detect whether a user is speaking via its is_voice() parameter. I tested this capability over the weekend and found that output audio filtered using this information is too choppy to be intelligible to the user. We are currently testing whether, after we tag packets as voice and feed those to the transcript, the packets contain enough data to generate the transcript.
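For reference, the voice flag can be read from the device's USB tuning interface, roughly as below. This assumes Seeed's usb_4_mic_array tuning module (tuning.py); the Tuning class and USB IDs come from that project, and tag_chunk is a hypothetical helper:

import usb.core
from tuning import Tuning

dev = usb.core.find(idVendor=0x2886, idProduct=0x0018)  # ReSpeaker USB IDs
mic = Tuning(dev)

def tag_chunk(chunk):
    # Attach the voice flag so the server can decide whether this
    # chunk carries enough speech to feed the transcript generator.
    return {"audio": chunk, "is_voice": bool(mic.is_voice())}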

This week we started integration of each of the systems, so we were working on campus on HH D level. On Monday and Tuesday I added the Raspberry Pis to the cmu-device wifi. We were having issues connecting the devices to the wifi, so we reflashed the OS onto the Raspberry Pis' SD cards and reconfigured the wifi; the devices now work on the cmu-device wifi.

On Wednesday we finished the integration between the audio streaming on the audio devices and the transcript generation on the server. For this integration we are currently using Ellen's computer to act as the server, so that we can complete the integration for the demo before migrating to the AWS server. We are continuing to develop the speaker identification to make sure that it recognizes different speakers.

This week we will start the tests for transcript accuracy, prepare for the demo, and also start the migration to AWS.

Team Status Report for April 10

Last week our team finished our individual components: Cambrea finished the audio streaming between the Raspberry Pi and the server, Ellen finished the speech-to-text and speaker identification ML, and Mitchell finished the database and website setup and the transcript streaming. This week our team has focused heavily on complete system integration and testing. We have completed the connections between all components and have real transcripts of users' input audio streaming to the website. We are currently working on improving our speaker identification during the setup phase of the meeting.

Our main risk right now is that the direction-of-arrival (DOA) data from the ReSpeaker can fluctuate slightly when the speaker changes or even while one speaker is talking. This fluctuation mainly affects our setup phase, when we register speakers to be identified by the speaker identification ML, because during setup we rely on the DOA to determine whether a new speaker should be registered with the system. We are fixing and testing this by registering a new speaker only when a new DOA is detected and a significant amount of audio is coming from that direction, which ignores small fluctuations in the DOA. A sketch of this logic is below.
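As an illustration, the registration check might look something like the following; the angle threshold, chunk count, and function names are placeholders, not our actual code:

# Hypothetical sketch of the DOA-based speaker registration check.
ANGLE_THRESHOLD_DEG = 20   # how far a DOA must be from known speakers
MIN_VOICE_CHUNKS = 25      # "significant amount of audio" before registering

registered_doas = []       # DOAs of speakers registered so far
pending_counts = {}        # candidate DOA bucket -> voiced chunks seen

def is_new_direction(doa):
    return all(abs(doa - known) > ANGLE_THRESHOLD_DEG for known in registered_doas)

def observe_chunk(doa, is_voice):
    # Called per audio chunk during setup; returns a DOA to register, or None.
    if not is_voice or not is_new_direction(doa):
        return None
    key = round(doa / ANGLE_THRESHOLD_DEG)       # bucket nearby angles together
    pending_counts[key] = pending_counts.get(key, 0) + 1
    if pending_counts[key] >= MIN_VOICE_CHUNKS:  # enough audio from this direction
        registered_doas.append(doa)
        return doa                               # register this speaker
    return None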

 

Updated Gantt Chart Link

https://docs.google.com/spreadsheets/d/1eeHeut41JF_Ju9Ys14n_sOLiZGYdpl4HtEale2ySasY/edit?usp=sharing

For our updated Gantt chart, we completed the task “Integration of RPi + Transcript + ML” half a week early, so we are currently working on the task “Complete System Debugging / Testing”. Our current “complete system” for the demo uses a computer as the intermediary server that handles the speaker ML and transcript streaming. Ellen is working on some ML improvements for integration and testing at this time. After the demo, Cambrea and Mitchell will start the migration to the AWS server instead of our local computer for the task “AWS Migration”. We will complete our latency testing and standard testing during weeks 12 and 13, after we have migrated to AWS. We will start revising our final report during week 11 and will work on our final presentation and video during weeks 12-14.


Cambrea’s Status Report for March 27

For this week, I worked on fixing some audio issues with the audio device code. The issue was that when I went to play the output audio on the speaker (this should be the audio coming from the remote users), it was not the correct audio but just computer noise. I was trying to play raw audio using PyAudio like this:

import pyaudio

p = pyaudio.PyAudio()

# Open an output stream matching the ReSpeaker's sample rate,
# sample width, and channel count.
stream = p.open(
    rate=RESPEAKER_RATE,
    format=p.get_format_from_width(RESPEAKER_WIDTH),
    channels=RESPEAKER_CHANNELS,
    output=True,
    frames_per_buffer=CHUNK,
    input_device_index=RESPEAKER_INDEX,
)

# Play the raw bytes received over the network.
stream.write(data)
Calling stream.write(data) on the raw audio was not working, but it did work when I instead wrote the audio to a WAV file and then called stream.write() with the data read back from that file.
I suspect the raw audio data was not in the correct format or type to be played through the stream, but when I print the type of the raw data it is “bytes”, which is ambiguous; the stream documentation says it is meant to play bytes-like objects.
Going forward, I chose to write the raw data to a WAV file and then play the WAV file through the stream, since that gives clear audio. The workaround is sketched below.
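The workaround looks roughly like this (a sketch using Python's standard wave module, reusing the stream and ReSpeaker constants from above; the file name is a placeholder):

import wave

# Write the raw network bytes to a WAV file with the ReSpeaker's format...
wf = wave.open("output.wav", "wb")
wf.setnchannels(RESPEAKER_CHANNELS)
wf.setsampwidth(RESPEAKER_WIDTH)
wf.setframerate(RESPEAKER_RATE)
wf.writeframes(data)
wf.close()

# ...then read it back and play it through the PyAudio stream.
wf = wave.open("output.wav", "rb")
frames = wf.readframes(wf.getnframes())
stream.write(frames)
wf.close()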
This week I also received the equipment for the second audio device and assembled it. I have downloaded the necessary code onto the device and am currently testing with my computer as the server (by this I mean I am running basic server code on my computer with both devices connected to it). I am testing that the devices send and receive the correct data to each other.
I am currently on track with my progress. Next week I will work on using the AWS server for the audio devices and on integrating this code with Ellen's speech-to-text code.

Cambrea’s Status Report for March 20

This week I wrote the code that handles all of the components on the Raspberry Pi.

These components are the microphone IO, audio processing, and the client code. The RaspberrySystem file acts as the “top” module on the Raspberry Pi and is used to run the entire system. In this file I start the threads for the mic IO, audio processing, and client components, roughly as sketched after the link below.

RaspberrySystem
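A rough sketch of that top-level structure (the module names and the run functions, other than startIO, are placeholders, not our exact code):

# Hypothetical sketch of the RaspberrySystem "top" module.
import threading
import queue

import microphone_io, audio_processing, client  # placeholder module names

audio_in_q = queue.Queue()  # mic IO -> audio processing
network_q = queue.Queue()   # audio processing -> client

def main():
    threads = [
        # startIO is the real entry point in MicrophoneIO; the other two
        # targets are placeholder names for the processing and client loops.
        threading.Thread(target=microphone_io.startIO, args=(audio_in_q,)),
        threading.Thread(target=audio_processing.run, args=(audio_in_q, network_q)),
        threading.Thread(target=client.run, args=(network_q,)),
    ]
    for t in threads:
        t.daemon = True
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    main()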

I also added the MicrophoneIO file. In this file, the startIO function is called by the RaspberrySystem file. This function starts the audio stream that listens on the microphone. When audio is detected on the microphone, it is put into the queue for the audio processing component, along with the direction of arrival of that audio. This happens in the callback function.

When audio is ready to be played on the speaker (this is the output audio coming from the second set of users in a different room), the listening stream is stopped while the output audio plays, to prevent a feedback loop. Both pieces are sketched after the link below.

MicrophoneIO
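Continuing the earlier snippets (p, stream, audio_in_q, and the mic Tuning object), the callback pattern looks roughly like this; it is a sketch, not the contents of the linked file:

# Hypothetical sketch of the MicrophoneIO callback and playback handling.
def callback(in_data, frame_count, time_info, status):
    # Queue the captured audio together with its direction of arrival
    # for the audio processing thread.
    audio_in_q.put((in_data, mic.direction))
    return (None, pyaudio.paContinue)

listen_stream = p.open(
    rate=RESPEAKER_RATE,
    format=p.get_format_from_width(RESPEAKER_WIDTH),
    channels=RESPEAKER_CHANNELS,
    input=True,
    frames_per_buffer=CHUNK,
    input_device_index=RESPEAKER_INDEX,
    stream_callback=callback,
)

def play_output(audio_bytes):
    # Stop listening while the remote room's audio plays so the mic
    # doesn't re-capture it (feedback loop), then resume.
    listen_stream.stop_stream()
    stream.write(audio_bytes)  # output stream from the earlier snippet
    listen_stream.start_stream()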

Lastly, I added to the client code. I made a text file that holds the audio device's ID number; this number is sent to the server so that the server can differentiate the audio devices.
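For example (a sketch; the file name and packet framing are placeholders):

# Hypothetical sketch: read the device ID once and prepend it to each packet.
with open("device_id.txt") as f:  # placeholder file name
    DEVICE_ID = f.read().strip()

def frame_packet(audio_bytes):
    # The server splits on the first ':' to tell the audio devices apart.
    return DEVICE_ID.encode() + b":" + audio_bytes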

I also ordered a ReSpeaker, a micro SD card, a Raspberry Pi case, and a Raspberry Pi power supply cable to build the second audio device.

I am currently on track with my progress.

Next week I will configure the new audio device when the parts arrive. Ellen and I will also start integrating the audio streaming work that I have done with the transcript generator that she has completed.


Team Status Report for March 20

This week the team finished the design report. Ellen and Mitchell worked on setting up the website using the Django framework and on connecting to the AWS server to access the database. Cambrea worked on the Raspberry Pi system, which sets up the threads for audio IO, audio processing, and audio streaming to the server on the Raspberry Pi.

Our biggest risk now is that our denoising will harm the audio stream, reducing the clarity of the audio and the quality of the transcript generation. We plan to use the background noise in the room (when there are no voices) to create a noise file that we will use to denoise the signal. Our current approach is to create this file during the meeting from the extraneous noise at the beginning of the meeting. This could be a problem, since the noise at the beginning won't necessarily represent the noise throughout the meeting, but we are starting with this approach and will test the result. Alternatively, we will focus more on filtering the audio to remove noise. The idea is sketched below.
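For illustration, a minimal spectral-subtraction sketch of this idea; the frame size and subtraction factor are placeholders, not tuned values, and this is not our final implementation:

import numpy as np

FRAME = 1024  # samples per analysis frame (placeholder)

def noise_profile(noise_samples):
    # Average magnitude spectrum of the "no voices" audio captured
    # at the beginning of the meeting.
    usable = len(noise_samples) // FRAME * FRAME
    frames = noise_samples[:usable].reshape(-1, FRAME)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def denoise_frame(frame, profile, factor=1.0):
    # Subtract the noise magnitude from each frame, keeping the phase.
    spec = np.fft.rfft(frame)
    clean_mag = np.maximum(np.abs(spec) - factor * profile, 0.0)
    return np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), n=FRAME)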

We haven’t changed the design of our project.

One change in the schedule is that the mic initialization backend task was reassigned to Ellen.

Gantt Chart – combined (1)