Publications

2024

Towards Improving Real-Time Head-Worn Display Caption Mediated Conversations with Speaker Feedback for Hearing Conversation Partners

Jenna Kang, Emily Layton, David Martin, Thad Starner

Published in CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024

Abstract

Many products attempt to provide captioning for Deaf and Hard-of-Hearing individuals through smart glasses using automatic speech recognition. Yet challenges remain due to system delays and dropouts, heavy accents, and general mistranscriptions. Because of these imperfections in automatic speech recognition, conversational difficulties persist for Deaf and Hard-of-Hearing individuals when conversing with hearing individuals. For instance, hearing conversation partners may often not realize that their Deaf or Hard-of-Hearing conversation partner is missing parts of the conversation. This study examines whether providing visual feedback of the captioned conversation to hearing conversation partners can enhance conversational accuracy and dynamics. Through a task-based experiment involving 20 hearing participants, we measure the impact of visual feedback of captioning on error rates, self-corrections, and subjective workloads. Our findings indicate that when given visual feedback, the average number of errors made by participants was 1.15 lower (p = 0.00258), indicating a notable reduction in errors. When visual feedback is provided, the average number of self-corrections increased by 3.15 (p < 0.001), suggesting a smoother and more streamlined conversation. These results show that the inclusion of visual feedback in conversation with a Deaf or Hard-of-Hearing individual can lead to improved conversational efficiency.
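The abstract reports within-subjects differences with p-values but does not spell out the test used. As a purely illustrative sketch with synthetic numbers, assuming a paired t-test on per-participant error counts with and without visual feedback, such a comparison could be computed like this:

```python
# Hypothetical sketch of a within-subjects comparison like the one reported above;
# the data are synthetic and the choice of a paired t-test is an assumption,
# not necessarily the test used in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
errors_without_feedback = rng.poisson(4.0, size=20)  # synthetic counts, 20 participants
errors_with_feedback = rng.poisson(2.9, size=20)     # synthetic counts with feedback

t_stat, p_value = stats.ttest_rel(errors_without_feedback, errors_with_feedback)
mean_reduction = errors_without_feedback.mean() - errors_with_feedback.mean()
print(f"mean reduction in errors: {mean_reduction:.2f}, p = {p_value:.4f}")
```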

2023

PopSign ASL v1.0: An Isolated American Sign Language Dataset Collected via Smartphones

Thad Starner, Sean Forbes, Matthew So, David Martin, Rohit Sridhar, Gururaj Deshpande, Sam Sepah, Sahir Shahryar, Khushi Bhardwaj, Tyler Kwok, Daksh Sehgal, Saad Hassan, Bill Neubauer, Sofia Vempala, Alec Tan, Jocelyn Heath, Unnathi Kumar, Priyanka Mosur, Tavenner Hall, Rajandeep Singh, Christopher Cui, Glenn Cameron, Sohier Dane, Garrett Tanzer

Published in Thirty-seventh Conference on Neural Information Processing Systems, 2023

Abstract

PopSign is a smartphone-based bubble-shooter game that helps hearing parents of deaf infants learn sign language. To help parents practice their ability to sign, PopSign is integrating sign language recognition as part of its gameplay. For training the recognizer, we introduce the PopSign ASL v1.0 dataset that collects examples of 250 isolated American Sign Language (ASL) signs using Pixel 4A smartphone selfie cameras in a variety of environments. It is the largest publicly available, isolated sign dataset by number of examples and is the first dataset to focus on one-handed, smartphone signs. We collected over 210,000 examples at 1944x2592 resolution made by 47 consenting Deaf adult signers for whom American Sign Language is their primary language. We manually reviewed 217,866 of these examples, of which 175,023 (approximately 700 per sign) were the sign intended for the educational game. 39,304 examples were recognizable as a sign but were not the desired variant or were a different sign. We provide a training set of 31 signers, a validation set of eight signers, and a test set of eight signers. A baseline LSTM model for the 250-sign vocabulary achieves 82.1% accuracy (81.9% class-weighted F1 score) on the validation set and 84.2% (83.9% class-weighted F1 score) on the test set. Gameplay suggests that accuracy will be sufficient for creating educational games involving sign language recognition.
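The baseline described above is an LSTM classifier over the 250-sign vocabulary. A minimal sketch of such a model follows, assuming each video has already been converted into a sequence of per-frame feature vectors; the feature dimension, hidden size, and use of PyTorch are assumptions for illustration, not details from the paper:

```python
# Minimal sketch of a baseline LSTM classifier for isolated sign recognition.
# FEATURE_DIM and HIDDEN_DIM are hypothetical; inputs are assumed to be
# pre-extracted per-frame feature vectors, not raw video.
import torch
import torch.nn as nn

NUM_SIGNS = 250    # vocabulary size of PopSign ASL v1.0
FEATURE_DIM = 128  # assumed per-frame feature dimension
HIDDEN_DIM = 256   # assumed LSTM hidden size

class SignLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEATURE_DIM, HIDDEN_DIM, batch_first=True)
        self.classifier = nn.Linear(HIDDEN_DIM, NUM_SIGNS)

    def forward(self, x):                # x: (batch, frames, FEATURE_DIM)
        _, (h_n, _) = self.lstm(x)       # final hidden state summarizes the clip
        return self.classifier(h_n[-1])  # logits over the 250 signs

model = SignLSTM()
clips = torch.randn(4, 60, FEATURE_DIM)  # e.g. 4 clips of 60 frames each
print(model(clips).shape)                # torch.Size([4, 250])
```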

FingerSpeller: Camera-Free Text Entry Using Smart Rings for American Sign Language Fingerspelling Recognition

David Martin, Zikang Leng, Tan Gemicioglu, Jon Womack, Jocelyn Heath, Bill Neubauer, Hyeokhyen Kwon, Thomas Plötz, Thad Starner

Published in The 25th International ACM SIGACCESS Conference on Computers and Accessibility, 2023

Abstract

Camera-based text entry using American Sign Language (ASL) fingerspelling has become more feasible due to recent advancements in recognition technology. However, there are numerous situations where camera-based text entry may not be ideal or acceptable. To address this, we present FingerSpeller, a solution that enables camera-free text entry using smart rings. FingerSpeller utilizes accelerometers embedded in five smart rings from TapStrap, a commercially available wearable keyboard, to track finger motion and recognize fingerspelling. A Hidden Markov Model (HMM) based backend with continuous Gaussian modeling facilitates accurate recognition as evaluated in a real-world deployment. In offline isolated word recognition experiments conducted on a 1,164-word dictionary, FingerSpeller achieves an average character accuracy of 91% and word accuracy of 87% across three participants. Furthermore, we demonstrate that the system can be downsized to only two rings while maintaining an accuracy level of approximately 90% compared to the original configuration. This reduction in form factor enhances user comfort and significantly improves the overall usability of the system.
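The recognition backend described above pairs HMMs with continuous Gaussian emission models. The sketch below illustrates that general approach with the hmmlearn library; it is an illustration of the technique, not the authors' implementation, and the feature shapes and state count are assumptions:

```python
# Illustrative per-word HMM recognizer with continuous Gaussian emissions (hmmlearn).
# Each example is assumed to be a (frames x channels) array of accelerometer
# features from the rings; the toy data below are synthetic.
import numpy as np
from hmmlearn import hmm

def train_word_models(examples_by_word, n_states=5):
    """Fit one GaussianHMM per dictionary word from its example sequences."""
    models = {}
    for word, sequences in examples_by_word.items():
        X = np.vstack(sequences)                   # all frames stacked
        lengths = [len(seq) for seq in sequences]  # frames per sequence
        model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        model.fit(X, lengths)
        models[word] = model
    return models

def recognize(models, sequence):
    """Return the word whose HMM gives the sequence the highest log-likelihood."""
    return max(models, key=lambda word: models[word].score(sequence))

# Toy usage with synthetic data standing in for ring accelerometer features.
rng = np.random.default_rng(0)
toy = {w: [rng.normal(size=(40, 15)) for _ in range(5)] for w in ["cat", "dog"]}
models = train_word_models(toy)
print(recognize(models, toy["cat"][0]))
```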

ToozKit: System for Experimenting with Captions on a Head-worn Display

Peter Feng, David Martin, Thad Starner

Published in UbiComp/ISWC '23 Adjunct: Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing, 2023

Abstract

The advent of Automatic Speech Recognition (ASR) has made real-time captioning for the Deaf and Hard-of-Hearing (DHH) community possible, and integration of ASR into Head-worn Displays (HWDs) is gaining momentum. We propose a demonstration of an open-source, Android-based captioning toolkit intended to help researchers and early adopters more easily develop interfaces and test usability. Attendees will briefly learn about the technical architecture, use cases, and features of the toolkit, as well as have the opportunity to experience using the captioning glasses on the tooz HWD while engaging in conversation with the demonstrators.
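As a generic illustration of the kind of real-time ASR captioning loop such a toolkit builds on (this is not ToozKit code; the Vosk recognizer, model path, and audio parameters are stand-ins chosen for the sketch):

```python
# Generic real-time captioning loop: stream microphone audio into an offline
# ASR engine (Vosk here, as a stand-in) and print partial/final transcripts,
# the way a caption feed for a head-worn display might be driven.
import json
import pyaudio
from vosk import Model, KaldiRecognizer

MODEL_PATH = "model"   # placeholder path to a downloaded Vosk model
SAMPLE_RATE = 16000

recognizer = KaldiRecognizer(Model(MODEL_PATH), SAMPLE_RATE)
mic = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=1,
                             rate=SAMPLE_RATE, input=True,
                             frames_per_buffer=4000)

while True:
    data = mic.read(4000, exception_on_overflow=False)
    if recognizer.AcceptWaveform(data):   # a final segment is ready
        print("caption:", json.loads(recognizer.Result())["text"])
    else:                                 # show the in-progress hypothesis
        print("partial:", json.loads(recognizer.PartialResult())["partial"])
```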

2022

Preferences for Captioning on Emulated Head Worn Displays While in Group Conversation

Gabriel Britain, David Martin, Tyler Kwok, Adam Sumilong, Thad Starner

Published in ISWC '22: Proceedings of the 2022 ACM International Symposium on Wearable Computers, 2022

Abstract

Head worn displays (HWDs) can provide a discreet method of captioning for people who are d/Deaf or hard of hearing (DHH); however, group conversations remain a difficult scenario as the wearer has difficulty determining who is speaking and where to look. Using an HWD emulator during a group conversation, we compare eight DHH users' perceptions of four conditions: an 80 degree field-of-view (FOV) HWD that pins captioning text to each speaker (Registered), an HWD where the captioning remains in the same place in the user's visual field (Non-registered), Non-registered plus indicators as to which direction the current speaker is relative to the user's line of sight (Indicators), and a control of captions displayed on a Phone. Preference increased in order of Phone, Non-registered, Indicators, and Registered. While an 80 degree FOV HWD is not practical to create in a pair of normal-looking eyeglasses, pilot testing with 12 hearing participants suggests a FOV between 20 and 30 degrees might be sufficient.