Using High-Speed Videoendoscopy to Detect Vocal Cord Anomalies

NIH-Funded Research Project Fosters a Better Understanding of Normal Voice Function and Voice Disorder, with the Goal of Improving Treatment for Dysphonia

High-speed videoendoscopy (HSV) could change the way laryngologists and clinicians diagnose and treat patients with voice disorders or dysphonia—allowing for a greater understanding of what is normal in the function of the vocal folds and better detection of anomalies in voice disorders.

At Michigan State University, Assistant Professor Dr. Maryam Naghibolhosseini is using HSV, big data analysis, and her expertise in voice production to compile a new catalogue of information about vocal cords and the vibratory characteristics that affect the speech sounds they produce. The catalogue could serve as a resource for clinicians and researchers, providing evidence of what vocal function looks like in both normal and disordered conditions.

The National Institute on Deafness and Other Communication Disorders (NIDCD), part of the National Institutes of Health (NIH), awarded Naghibolhosseini a $700,000 Career Development Grant to further her research into the application of advanced image processing, combined with data mining and statistical analysis, to study voice production in neurogenic voice disorders.

“Research has to be directed toward the needs of the clinicians, in order to really make an improvement in the health of people or answer to the needs of people with voice disorders,” said Naghibolhosseini, who is collaborating with Dr. Dimitar Deliyski, Chair of MSU’s Department of Communicative Sciences and Disorders, and others on the project.

Using High-Speed Videoendoscopy to Monitor Vocal Cords

For decades, clinicians have been using videostroboscopy to search for anomalies in vocal fold function. During an assessment, an endoscope is fed into the larynx through a person’s mouth, preventing them from speaking naturally, beyond a few vowel sounds.

Videostroboscopy can be combined with flexible nasolaryngoscopes to observe the vocal fold vibrations during connected speech. This method lets the clinician observe laryngeal movement, but its low frame rate provides only a limited view of the vibrations occurring within a single second. An HSV system captures thousands of frames per second, giving a much finer view of the vibrations over time.

“The recent advancement of coupling the HSV systems with fiberoptic nasolaryngoscopes has enabled us to obtain HSV data during production of running speech for the first time,” said Naghibolhosseini.

In this system, a fiberoptic nasolaryngoscope is passed through the nose into the throat to record video of the vocal fold vibrations. Coupling high-speed video with the endoscope is a recent development, and researchers at MSU are tapping into the technology to study the structure and function of the vocal folds.

The project focuses on the vibration itself: the rapid, rhythmic opening and closing of the vocal folds that creates sound. Through a partnership with the Mayo Clinic in Arizona, clinicians are working with patients to film their vocal folds for upwards of one minute. Close-up video footage displays the vocal folds’ vibration for different words and sentences, revealing the biomechanics as the vocal folds move.

“Videostroboscopy allows us to do only 30 frames per second. In comparison, with the HSV system, which gives us thousands of frames, that is really low in number. The low frame rate would limit us in terms of seeing how the vocal fold functions,” said Naghibolhosseini. “Now, I can look at the detailed vibrations of the vocal fold in every cycle.”

Researchers are examining HSV recordings from people with normal voice function and from people with spasmodic dysphonia or unilateral vocal fold paralysis.

“It’s beneficial to use HSV to study the vibrations of the vocal folds because they happen really fast. If I can’t get enough images from those vibrations, that means I can’t have access to a lot of information,” said Naghibolhosseini. “All these muscles, they need to contract, or they need to stretch in order to, for example, shorten or stretch the vocal folds, changing the thickness of the vocal folds or the tension of the vocal folds. Those all affect the sound that is generated.”

“These mechanisms will be different when someone has a voice disorder,” said Naghibolhosseini. “For example, they might struggle more when they want to put the vocal folds together and start the vibration.”

Using Computational Mathematics in the Analysis of Voice and Hearing Lab

The high-speed video camera in this study captures 4,000 frames per second, providing thousands of images to support researchers’ understanding of the vocal fold biomechanics. It also allows researchers to discover hidden physics from the big HSV datasets with the help of data mining.
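To make the frame-rate gap concrete, the short Python sketch below works out how many images each system captures per vocal fold vibration cycle, and how many frames a one-minute recording contains. The 30 and 4,000 frames-per-second figures and the one-minute duration come from this article. The fundamental frequencies of 100 Hz and 200 Hz are illustrative assumptions, roughly typical of adult speaking voices, and are not measurements from the study. The function names are hypothetical.

```python
# Frame rates quoted in the article.
STROBOSCOPY_FPS = 30
HSV_FPS = 4000


def frames_per_cycle(fps: float, f0_hz: float) -> float:
    """Return how many video frames fall within one vibration cycle."""
    if fps <= 0 or f0_hz <= 0:
        raise ValueError("fps and f0_hz must be positive")
    return fps / f0_hz


def frames_in_recording(fps: float, seconds: float) -> int:
    """Return the total number of frames in a recording of the given length."""
    if fps <= 0 or seconds < 0:
        raise ValueError("fps must be positive and seconds non-negative")
    return round(fps * seconds)


if __name__ == "__main__":
    # Assumed fundamental frequencies (Hz); not taken from the study.
    for f0 in (100.0, 200.0):
        print(f"f0 = {f0:.0f} Hz:",
              f"{frames_per_cycle(STROBOSCOPY_FPS, f0):.2f} frames/cycle at 30 fps,",
              f"{frames_per_cycle(HSV_FPS, f0):.0f} frames/cycle at 4,000 fps")
    # One minute of recording at the HSV frame rate.
    print("Frames in 60 s of HSV:", frames_in_recording(HSV_FPS, 60))
```

Under the assumed 100 Hz voice, a 30 frames-per-second system captures less than one frame per vibration cycle, while a 4,000 frames-per-second camera captures 40. A single one-minute HSV recording at that rate holds 240,000 frames, which gives a sense of why the analysis is treated as a big data problem.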

“A high-speed video camera can give you thousands of pictures from a really short event in time,” said Naghibolhosseini. “The goal is to basically collect this data from people who are vocally normal: no voice disorders or history of voice disorders, and then compare that data with people who have neurogenic voice disorders. We want to see what is going on at the vocal fold stage, because that is where the sound is really generated, the larynx.”

Within the research project, HSV data are stored on computers and hard drives and sent to the Analysis of Voice and Hearing Lab (AVAH Lab) at MSU. There, researchers with quantitative and clinical backgrounds work on the data analysis. The preliminary findings from this study will be presented at the 2020 Annual Symposium: Care of the Professional Voice in Philadelphia.

“It’s really challenging to analyze the data,” said Naghibolhosseini, who draws upon her background in electrical engineering to conduct the research project, which requires advanced methods. “The way that the data is collected is through fiberoptics, which lead to noisier video data.”

In this project, the HSV data are obtained simultaneously with acoustic recordings from the patient, allowing researchers to compare what they see happening on the level of the vocal folds with what they hear in the patient’s voice. While HSV systems have been used for capturing the vocal fold vibrations during sustained vowel sounds, this is the first time HSV data sets are being collected during running speech.

“The way to analyze the data would be to use data mining, or big data analysis,” said Naghibolhosseini. “So, we have to introduce some measures that would be clinically useful and try to extract that information from the HSV data. Of course, we have to do some image processing and also machine learning in order to extract this information.”
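As a rough illustration of what extracting a measure from HSV frames can look like, the Python sketch below counts the dark pixels in each frame to produce a simple area-over-time signal, often called a glottal area waveform in the voice literature. This is not the AVAH Lab's method. The frames are synthetic, the brightness threshold of 50 is an arbitrary assumption, and all function names are hypothetical. Real fiberoptic recordings are noisy and would need denoising and proper segmentation before a measure like this could be trusted.

```python
from typing import List

# A grayscale image as rows of pixel values, 0 (black) to 255 (white).
Frame = List[List[int]]


def glottal_area(frame: Frame, threshold: int = 50) -> int:
    """Count pixels darker than the threshold, a crude proxy for the opening."""
    return sum(1 for row in frame for pixel in row if pixel < threshold)


def area_waveform(frames: List[Frame], threshold: int = 50) -> List[int]:
    """Return the dark-pixel count for each frame, in recording order."""
    return [glottal_area(frame, threshold) for frame in frames]


def synthetic_frame(open_width: int, size: int = 8) -> Frame:
    """Build a bright square frame with a dark vertical slit of the given width."""
    start = (size - open_width) // 2
    return [
        [20 if start <= x < start + open_width else 200 for x in range(size)]
        for _ in range(size)
    ]


if __name__ == "__main__":
    # Slit widths that open and close, imitating a few vibration cycles.
    widths = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    frames = [synthetic_frame(w) for w in widths]
    print(area_waveform(frames))
```

On the synthetic frames, the printed values rise and fall with the slit width, which is the kind of cycle-by-cycle signal that further statistical analysis or machine learning could then work on.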

Launched to support the ongoing research, the AVAH Lab is focused on studying the underlying mechanisms of voice production using HSV, a laryngeal imaging technique. The lab gives researchers a place to study the video and analyze the large datasets using statistical techniques, image processing, and signal processing.

Researchers hope to connect what is happening in the video with what a clinician is hearing during a voice assessment, in order to recommend specific treatments. Naghibolhosseini said the project may also allow researchers to identify which vocal issues are caused by particular anomalies in the structure or function of the vocal folds.

“Hopefully, the results of our work would help the clinicians or the surgeons find better therapeutic techniques and to better help people with dysphonia,” said Naghibolhosseini.

Collaborating to Better Understand Voice Biomechanics

In addition to the Mayo Clinic, Naghibolhosseini is collaborating with clinicians, colleagues and student researchers. She hired Ahmed Yousef, an international student from Egypt who is pursuing his Ph.D. in Communicative Sciences and Disorders at MSU, to work as a research assistant.

“I am very interested in Dr. Maryam Naghibolhosseini’s research and contribution in voice science and how she uses computational mathematics in performing biomechanical simulation for the biological systems,” said Yousef. “Also, she employs image processing, machine learning, and data mining to study the vocal function in norm and disorder conditions – aiming to present new methods for evaluating voice based on laryngeal imaging.”

Yousef, who studied mechanical engineering, will conduct experimental, theoretical and modeling research on voice production. He will also be involved in data analysis, the largest component of the project.

“The focus of my research in Communicative Sciences and Disorders will be directed to investigate voice production during running speech, using high-speed videoendoscopy, as well as the biomechanics of the laryngeal system,” said Yousef.

When Yousef reached out to Naghibolhosseini about her research, she encouraged him to apply to the Ph.D. program at the College of Communication Arts and Sciences.

“The department provides the substantial factors I need to deepen my expertise in my areas of interest, with a top-tier research environment, labs, and facilities,” said Yousef.

By Melissa Priebe

Communicative Sciences and Disorders