Echo Removal
Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.
Consider a person standing near a microphone, with a wall off in the distance, as shown below.
In such a situation, audio recorded from the microphone will be corrupted by an echo that results because the microphone picks up not only sound wave from the nearby source but also the reflection of the sound wave off the wall.
Listen to the audio file phrase_echo.wav
in this week's code distribution,
which mimics this kind of situation.
We can model this process as a convolution y[n] = (x * h_1)[n], where the signal x is the speaker's original utterance, y is the sound picked up by the microphone, and h_1 is a filter of the following form:
Part a. Make a plot of h_1[n] versus n for some value of A and n_0. Determine an expression for H_1(\Omega), which represents the DTFT of h_1[n]. Be prepared to discuss these results during your check-in.
Part b. Having just heard about convolution, Ben Bitdiddle believes that he can remove the echo by convolving the recorded signal y[n] with the following signal:
It turns out that Ben is incorrect. What is the flaw in his reasoning? Be prepared to discuss this during your check-in.
Part c. Design an improved filter h_3 such that, if y[n] = (x * h_1)[n], then (y * h_3)[n] = x[n]. Plot h_3[n] versus n for some value of A and n_0. How many non-zero terms will h_3 have? Determine the first four values of n for which h_3[n]\ne0 and label their values on your plot. Be prepared to discuss these results at your check-in.
Part d.
The lab handout includes a sound file phrase_echo.wav
that was created by convolving an input speech signal with
h_1[n] with A{=}0.9 and n_0{=}1323.
We would like to remove the echo from this signal (y[n]) by convolving it with h_3[n].
One difficulty with this approach is that h_3[n] has infinite length.
However, if we can approximate h_3[n] using its first N samples,
then we can implement the convolution with the provided function called
causal_convolution
which convolves two sequences of finite length, both of
which are presumed to be zero for negative time indices.
Look at the code for causal_convolution
.
How does this function implement convolution?
Create a python list h_3
that contains the first 5 non-zero terms in h_3[n].
Use the causal_convolution
function to convolve h_3
with
phrase_echo.wav
.
Listen to your output.
You may find the result surprising!
Can you explain what is happening, in terms of the kernel we used above?
How would you expect the output to change if we added more terms? Try it out!
Part e. Check In.
- Show a staff member your plots of h_1[\cdot].
- Describe what was wrong with Ben Bitdiddle's approach to echo removal.
- Show your sketch of h_3[\cdot], and explain your process for determining h_3[\cdot].
- Play the result of convolving the echo-corrupted signal with your approximation of h_3 containing only 5 non-zero values. Do the same for 9 nonzero samples. Explain the results you hear.
Please describe your approach to each of the parts of this lab.
Part f.
Echo Removal: Frequency Domain Approach.
As we saw in lecture, we could also perform convolution by operating in the
frequency domain. Start by finding the DTFT of h_3.
What is the relation between H_3(\Omega) and H_1(\Omega)?
Devise a frequency-domain method for removing the echo in
Python, and use it to remove the echo from phrase_echo.wav
.
We have provided a pair of functions -- fft
(Fast Fourier Transform)
and ifft
(inverse FFT) -- to speed up the frequency-domain calculations.
The function fft
takes a single argument, which is a list of samples of the
time domain signal x[\cdot].
Then fft
returns a list of complex-valued samples of the DTFT of x[\cdot].
The output list always has the same length as the input list.
For example, if the input is a list of N samples of x[\cdot],
then the output list will contain N samples of the DTFT,
where the k^{\rm th} sample is given by
The function ifft
inverts the operation of fft
.
It takes a list of samples of a DTFT and returns samples of the corresponding
time-domain signal.
Please note that numerical errors can introduce small imaginary parts
to an output of ifft
that should theoretically be purely real.
You may need to remove such imaginary parts before writing your
output signal to a .wav file.
- Upload your Python code for removing the echo.
- Upload your .wav file for the signal with the echo removed.
- Describe your frequency-domain method. How did you make use of the DTFT of h_3?
- Compare results for the time and frequency approaches. What are the important differences?