Posted on 2022-12-22, 14:21. Authored by Rafael Dantas.
Increased urbanisation drastically reduces the area a mobile Internet Service Provider
(ISP) may be required to cover to provide a reasonable service. While ISP networks
are constantly being upgraded, market pressures may influence the order in which
these upgrades will be performed, prioritising denser population centres while
leaving the more sparsely populated regions of a country relegated to older and slower
infrastructure. In such places, there is still value in reducing the amount of data
required by Voice over IP (VoIP) applications to execute calls through the Internet.
This thesis proposes combining speech-to-text, to encode the message being
transmitted, with a text-to-speech synthesiser, to decode the message back into
audible speech, resulting in substantial bandwidth savings. A black-box experiment was
conducted to analyse the performance of 10 popular Mobile VoIP applications on
the Android platform with respect to both network usage and perceived quality of
the call. There was a clear correlation between the quality reported by the users and
the amount of data used by the application to represent the conversation. Finally,
another experiment was executed to test the viability of using a speech-to-speech
pipeline as a coding method by measuring the average Word Error Rate (WER) of
users transcribing Semantically Unpredictable Sentences (SUS) presented
in either a prerecorded human voice or a synthesised one. The experiment
demonstrated a very small WER difference between the prerecorded human speech
and the synthesised speech.
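The WER measurement mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words); the function names, the lowercasing step, and the sample sentences are hypothetical and are not taken from the thesis.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def average_wer(references: List[str], hypotheses: List[str]) -> float:
    """Mean WER over paired reference/transcription sentences."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("need equally sized, non-empty lists")
    scores = [word_error_rate(r, h) for r, h in zip(references, hypotheses)]
    return sum(scores) / len(scores)
```

Under these assumptions, a listener who writes "the cable walked through blue truth" for the reference "the table walked through the blue truth" makes one substitution and one deletion over seven reference words. Comparing `average_wer` for the human-voice condition against the synthesised-voice condition gives the kind of difference the experiment reports.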
In conclusion, the results of our experiments imply that a speech-to-speech pipeline
can replace conventional speech coding, yielding massive data savings at the
cost of the extra-textual information. Additionally, this speech-to-speech pipeline
can be made entirely independent of traditional Cloud-based solutions.
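
To give a rough sense of why transmitting text instead of coded audio saves data, the following back-of-the-envelope sketch compares payload bit rates. Every number in it is an illustrative assumption, not a figure from the thesis: a speaking rate of 150 words per minute, about 6 bytes per word of plain text (including the separating space), and a hypothetical speech codec running at 8000 bit/s. Packet headers and any text compression are ignored.

```python
def text_bitrate_bps(words_per_minute: float, bytes_per_word: float) -> float:
    """Payload bit rate needed to send a transcript as plain text."""
    return words_per_minute * bytes_per_word * 8 / 60


def savings_ratio(codec_bps: float, text_bps: float) -> float:
    """How many times smaller the text payload is than the codec payload."""
    return codec_bps / text_bps


# Illustrative assumptions only (not measurements from the thesis).
ASSUMED_WORDS_PER_MINUTE = 150
ASSUMED_BYTES_PER_WORD = 6
ASSUMED_CODEC_BPS = 8000

text_bps = text_bitrate_bps(ASSUMED_WORDS_PER_MINUTE, ASSUMED_BYTES_PER_WORD)
ratio = savings_ratio(ASSUMED_CODEC_BPS, text_bps)
print(f"text payload: {text_bps:.0f} bit/s, about {ratio:.0f}x smaller")
```

The actual savings depend on the codec, the language, and protocol overhead, so the ratio should be read only as an order-of-magnitude indication.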