Background/Problem Statement
The human voice has been shaped by selection to communicate biologically relevant traits of the vocalizer, so voice pitch plays an important role in human social interactions. However, the role of fundamental frequency in human verbal vocalizations remains unclear.
Challenge
This lack of clarity is difficult to resolve because human voice pitch varies widely, across vocalizations ranging from laughter to shouting. An audio analysis of these pitch variations is therefore required.
Goal
This project aims to understand voice pitch changes in order to address these challenges. Specifically, it aims to determine how confident a speaker is, what should be changed in the delivery of a sentence, and how the speaker's emotion changes over the course of a speech.
Proposed Solution
Because suitable data was not readily available, data had to be collected for audio analysis (classification) to meet the goals above. This project uses the annotated audio data collected while developing the VerbalVictory Annotation tool. Details of the annotation tool are given in the earlier work, Time and Word-Based Audio Annotation Tool.
Following the data collection phase, the audio classification step begins. This project employs DeepSpeech, an open-source speech-to-text engine built on a machine-learning model, together with the VGGish classification model, which is trained on an annotated dataset of people of various ages.
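As a rough illustration of how a speech-to-text stage and an audio classifier could be connected, the sketch below assumes that the transcription step yields word-level timestamps and cuts the waveform into per-word segments that a classifier could then score. The `WordTiming` type, the `slice_by_words` helper, and the sample values are hypothetical; the project description does not specify the actual hand-off between DeepSpeech and VGGish.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class WordTiming:
    """Hypothetical word-level timestamp, in seconds from the start of the clip."""
    word: str
    start: float
    end: float


def slice_by_words(
    samples: Sequence[int],
    sample_rate: int,
    timings: Sequence[WordTiming],
) -> List[Tuple[str, Sequence[int]]]:
    """Cut a mono waveform into one segment per annotated word.

    Timestamps are clamped to the clip boundaries, and empty segments
    (end not after start) are skipped.
    """
    segments = []
    for t in timings:
        lo = max(0, int(round(t.start * sample_rate)))
        hi = min(len(samples), int(round(t.end * sample_rate)))
        if hi > lo:
            segments.append((t.word, samples[lo:hi]))
    return segments


# Illustrative input only: one second of placeholder samples at 16 kHz.
demo_samples = list(range(16000))
demo_timings = [WordTiming("hello", 0.0, 0.5), WordTiming("world", 0.5, 1.0)]
demo_segments = slice_by_words(demo_samples, 16000, demo_timings)
```

Each returned segment could then be passed to whatever feature extractor or classifier is in use, which keeps the word-level annotation aligned with the audio it describes.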
Results
Different numbers of subjects were used for training and testing in order to evaluate the audio classification. The model reaches 70% accuracy with 20 subjects and 77% accuracy when the number of subjects is increased to 50. Furthermore, accuracy exceeds 80% when subjects are grouped by age for training and testing. These findings suggest that grouping subjects by age improves classification accuracy.
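For readers who want to reproduce this kind of comparison, the following minimal sketch shows one way to compute overall accuracy alongside accuracy per age group. The `accuracy_by_group` function, the group names, and the labels are assumptions made for illustration; the records are invented and are not the project's data or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(
    records: Iterable[Tuple[str, str, str]],
) -> Tuple[float, Dict[str, float]]:
    """Return (overall accuracy, accuracy per group).

    Each record is (group, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    per_group = {g: correct[g] / total[g] for g in total}
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_group


# Made-up records, for illustration only.
demo_records = [
    ("18-30", "confident", "confident"),
    ("18-30", "hesitant", "hesitant"),
    ("31-50", "confident", "hesitant"),
    ("31-50", "hesitant", "hesitant"),
]
demo_overall, demo_per_group = accuracy_by_group(demo_records)
```

Reporting the per-group figures next to the overall figure makes it easier to see whether a gain comes from every age group or only from some of them.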