REAL WORLD PROJECT

AI-TRANSCRIPTION RESEARCH

Team

What I did

Usability Testing

Accessibility Testing

Data Analysis and Visualization

What I used

Python (Google Colab)

Microsoft Excel

Windows Narrator

Usability Guides

Context

Type: Organization Project

Duration: 1 month

Organization: Fanshawe College

Time: Feb. 2024

TL;DR

Fanshawe College is looking to replace digital recorders and their subscription to Note Taking Express (NTE) with an all-in-one Artificial Intelligence (AI) application. We conducted controlled tests on seven transcription platforms and three types of recorders to determine the accuracy and quality of transcriptions. Transcription data from each controlled test was examined using Levenshtein distance and bilingual evaluation understudy (BLEU) algorithms. The results indicated that the three most consistent digital transcribers are Notability, Notta, and Rev, and the most accurate recording technology overall is the digital recorder.

Highlights

Introduction

Fanshawe College provides digital recorders for students who need to record lectures or conversations for transcription. This service is offered through a partnership between Counselling & Accessibility Services and the Inclusive Technology Centre. The College also offers a subscription to Note Taking Express (NTE) for transcription.

The Sony ICD-PX470 is the most commonly loaned digital recorder by the College. It costs CAD 99.99 and comes equipped with 4GB built-in memory, various audio filters to eliminate background noises, options for two file formats (lossy and lossless), an S-microphone to capture distant or quiet sounds, and an adjustable microphone range for clear voice recording.

NTE is a third-party service offering professionally typed summary notes, transcriptions, and meeting minutes. Students use the free NTE application to record and upload lectures or meetings, and can also include visual materials, such as photos or slideshows. A request is sent for transcription and students receive high-quality notes within 24 to 48 hours. NTE is inexpensive for the College and has a phone application and a web-based hub where students can upload their recordings and access transcriptions.

Who can access services

Students who are registered with Counselling and Accessibility and require note-taking accommodations are eligible for this service. Digital recorders are available to students either on loan or covered by an accessibility bursary. Apprenticeship students can expect a transcription turnaround time of 24 hours, while other post-secondary students can expect a turnaround time of 48 hours. Counselling and Accessibility has a designated NTE person who helps students create an account and assists with onboarding.

THE PROBLEM 

Students cannot take notes or interact with them while in a classroom setting. NTE either provides summary notes or transcription, but not both.

Many students end up using tools like Read&Write to generate summaries and have them read aloud in software, such as Google Docs. Students are looking for a product or service that can provide them with low-effort transcriptions and summary notes. How might we identify and recommend the most effective AI transcription tools to streamline real-time note-taking for students? 

TESTING
AI-Transcription Usability

GLEAN

Glean is a note-taking tool for transcribing audio, attaching labels and notes, and importing lecture slides. Users can record or upload audio, adjust playback speed, and enhance audio quality. Glean focuses on helping students take, refine, and review notes, aiming for full class engagement.

User-Interface & Experience
Glean's dashboard is minimal and user-friendly, featuring a navigation menu with Home, Collections, Tasks, and search. Users can record or upload audio for transcription, import slides, and add notes, images, tasks, or definitions. Lightning Mode allows quick note-taking with hotkeys. Transcriptions aren't editable but snippets can be posted to the Notes Feed.

Compatibility
Glean is compatible with Chromebooks, Mac, Windows, and Linux devices running Google Chrome or Microsoft Edge. It also works on iOS (12.0+) and Android (7.0+) mobile apps, and can be used offline. However, transcription requires an internet connection.

OTTER.AI

Otter.ai is an AI-powered tool for real-time transcription of meetings and lectures, integrating with Zoom, Teams, and Google Meet. It captures slides, provides summaries and outlines, and allows interactive editing, commenting, highlighting, and adding images to transcriptions.

User-Interface & Experience
Otter.ai offers a clean, collaborative interface. Users can search, sync calendars, record, and import files. Otter Chat provides platform assistance. Personal notes and transcriptions are in My Conversations. For collaboration, there are channels and direct messaging. The interface has Summary and Transcript tabs, with auto-generated keywords, action items, and content summaries. Users can chat, comment, and add notes alongside the transcription.

Compatibility
Otter.ai integrates with iOS (version 13.0 or later), Android (version 6.0 or later), Google Chrome, Slack, and Zoom.

MESSENGER PIGEON

Messenger Pigeon is a note-taking and AI transcription app. Users can record or upload audio for instant transcription, make notes, and use Grammarly for spelling and grammar. Professionally created summary notes are available for $10.80 per hour, delivered within 48 hours. The Pro account includes an AI Study Assistant with designed prompts.

User-Interface & Experience
Messenger Pigeon has a clean, dark, distraction-free interface but relies heavily on icons for navigation, which can be challenging. The dashboard shows all audio files and transcriptions, but accessing them via the kebab menu (⋮) is unintuitive. On the transcription page, users can customize the layout, adjust text size, use a rich-text editor for notes, and modify audio playback. However, recordings cannot be deleted, and the platform feels unfinished with some bugs.

Compatibility
Messenger Pigeon integrates with iOS (version 13.0 or later), Android (version 5.0 or later), Mac, Windows, Google Chrome, and Microsoft Edge.

NOTABILITY

Notability is a note-taking app for Apple devices, allowing handwritten notes, sketches, math equations, and use of templates. Users can annotate imported slides or textbook pages. Notability Plus offers advanced features like handwriting-to-text and math-to-LaTeX conversion. Users can record or import audio for transcription.

User-Interface & Experience
Notability is a clean, easy-to-navigate note-taking app for Mac, iPad, and iPhone. It features a hamburger menu with tabs for all notes, recent notes, favorites, and unfiled notes. The Gallery tab offers templates and user-created notes. Users can write text, attach images, gifs, stickers, and photos, and record or import audio for transcription. Audio features include playback speed adjustment, tuning, and voice amplification. Users can search and copy transcripts easily. Apple's accessibility features enhance user interaction.

Compatibility
Notability is compatible with iOS (version 15.0 or later) and macOS (version 12.0 or later).

ONENOTE

Microsoft OneNote is a versatile note-taking software that allows users to organize notes into notebooks, sections, pages, and sub-pages. It supports adding text, handwritten notes, drawings, and attachments anywhere on the page. OneNote includes a browser extension for clipping web information and supports audio and video recording with instant transcription.

Compatibility
Microsoft OneNote is available to download on Windows and Mac computers, on iOS (version 16.0 or later) and Android (version 9.0 or later) devices, and as an Office 365 web app (Safari, Google Chrome, Microsoft Edge, Mozilla Firefox).

REV

Rev offers AI and human transcription, subtitles, and captions for audio and video. This report focuses on the AI transcriptions, which process within five minutes and provide transcriptions, key insights, and summaries. Users can create new speakers, add notes, comment, and highlight sections of the transcription. Rev also integrates with Grammarly.

User-Interface & Experience
Rev offers a clean and user-friendly interface with a search function, a folder system for organization, and customizable AI transcription settings for language and transcription style. It provides a resource hub for assistance, and makes accessing, moving, downloading, and sharing transcriptions easy. The interface lets users assign speaker names, take notes, revert versions, adjust audio playback, and use a beta transcription summary feature. Users can edit transcriptions, make comments, highlight or strikethrough text, and use keyboard shortcuts for increased efficiency.

Compatibility
Rev’s free mobile app is compatible with iOS (version 13.0 or later), macOS (version 11.0 or later), and Android (version 8.0 or later). The web version of Rev is officially supported by Google Chrome and Mozilla Firefox. Safari is not recommended and Rev is not supported in Internet Explorer.

NOTTA

Notta is an AI-powered transcription service that lets users record or import audio and video from sources like Dropbox and Google Drive. It offers live meeting transcription and features manual editing, find and replace, and image insertion. Pro users can customize speaker names, adjust playback speeds, skip silence, and use AI Notes templates for summaries and action items. Transcriptions can be exported in .txt, .docx, .srt, and .pdf formats.

User-Interface & Experience
Notta has a user interface similar to other platforms and offers features like calendar syncing, audio recording, file importing, live meeting transcription, and video recording. It includes a 'Quick Find' search function for recordings and folders. Many features are paywalled, limiting testing of some capabilities. The transcription page is simple but allows editing, note-adding, and access to the beta AI Notes feature, which generates summaries and outlines. Users can use a command list for headings, text, lists, and to-do items, and can add notes with colored labels. Playback speed adjustment and merging of adjacent speaker blocks are also available.

Compatibility
Notta integrates with web browsers (including a Chrome extension), virtual meeting platforms (Zoom, Microsoft Teams, WebEx), calendar applications (Outlook, Google Calendars), and file-sharing apps (Google Drive, Dropbox). The free mobile app is compatible with iOS (version 11.0+) and Android (version 7.0+).

TIME TO INVESTIGATE
Platform Accessibility

DESIGNING
Testing Methodology

 Controlled Test 1 

The objective of this methodology is to evaluate seven transcription platforms and three recording devices to determine the most suitable fit for Fanshawe College students. A 117-word passage was recorded using a digital recorder (Sony ICD-PX470), a cellular device (OnePlus 9), and a laptop (Dell Latitude 5520), at the Library Learning Commons desk to simulate a noisy environment. These recordings were then transcribed using Glean, Otter.ai, Messenger Pigeon, Notability, Rev, Notta, and OneNote.

For analysis, Google Colab was used to write code in Python that preprocesses the transcriptions by converting them to lowercase and removing punctuation and date ordinals. Levenshtein Distance was used to calculate word error rate (WER) and word accuracy (WAcc), while the Natural Language Toolkit (NLTK) library generated BLEU-4 scores to assess transcription quality.
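The preprocessing and Levenshtein steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual Colab notebook: the function names, the ordinal-stripping regular expression, and the sample sentences are all assumptions made for the example. It uses a word-level edit distance, takes WER as the distance divided by the number of reference words, and takes WAcc as 1 - WER.

```python
import re
import string


def preprocess(text):
    """Lowercase, strip ordinal suffixes after digits ('2nd' -> '2'), drop punctuation, split into words."""
    text = text.lower()
    text = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", text)  # assumed handling of date ordinals
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two lists of words."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of an extra word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference_text, hypothesis_text):
    """WER = word-level edit distance / number of words in the reference."""
    reference = preprocess(reference_text)
    hypothesis = preprocess(hypothesis_text)
    if not reference:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / len(reference)


def word_accuracy(reference_text, hypothesis_text):
    """WAcc = 1 - WER."""
    return 1.0 - word_error_rate(reference_text, hypothesis_text)


# Hypothetical sentences, not the 117-word control passage used in the study.
reference = "The lecture on February 2nd covers recording, transcription, and review."
hypothesis = "the lecture on february 2 covers recording transcriptions and a review"

print(f"WER:  {word_error_rate(reference, hypothesis):.2%}")
print(f"WAcc: {word_accuracy(reference, hypothesis):.2%}")
```

Because insertions count as errors, WER can exceed 100% when a transcript adds many extra words, and WAcc then becomes negative.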

 Controlled Test 2

A second experiment tested the transcription efficacy of a digital recorder (Sony ICD-PX470) and a mobile recorder (OnePlus 9) in a classroom. The computer recorder was excluded due to its inconvenience for students, lack of access, and previous poor performance.

The speaker read the same control paragraph, with the recorders placed on a desk 8 feet away and within a 30-degree angle. Using the same NLTK and Python libraries, transcriptions were cleaned and analyzed for word error rate, word accuracy, and BLEU-4 scores.

The test evaluated transcription performance at a distance. The null hypothesis (H0) stated no change in word accuracy between recorders as distance increases. The alternative hypothesis (H1) predicted decreased word accuracy with distance, favoring the digital recorder for higher-quality transcriptions.

 Limitations 

The sample was small: one English-language voice recording captured on a digital recorder, a mobile recorder, and a computer recorder. Future tests should encompass various accents, dialects, and voices (male, female, gender-neutral), different speech patterns, distances between recorder and speaker, diverse speaker ages, and group discussion settings to evaluate transcribers in dynamic environments.

The recording was short at 117 words and 1:05 minutes, potentially leading to shorter, less detailed summaries. Future tests should use longer recordings in classroom settings to better assess transcription accuracy and quality. Additionally, different mobile devices should be tested as microphone quality can vary by model and age.  

Analysis

Transcription data from each controlled test was examined using three different metrics as follows:

 Levenshtein Distance 

Word error rate (WER) and word accuracy (WAcc) assess the precision of a machine transcription. Based on the results of the first controlled test (see Figures 1 and 2), the most accurate transcriptions (average WAcc across all recording formats) came from Notability (86.89%), Notta (86.61%), and Rev (80.34%).

To compare, the most accurate transcriptions (average WAcc between the digital recorder and mobile recorder) in the second controlled test (see Figures 1 and 3) came from Notability (74.36%), Rev (73.93%), and Otter.ai (66.97%). There was some overlap in the top performers; however, Otter.ai outperformed Notta in the classroom test.

Figure 1: Average Word Accuracy (%) for Tests 1 & 2

Figure 2: Word Accuracy (%) Comparing All Platforms (Test 1)

Figure 3: Word Accuracy (%) Comparing All Platforms (Test 2)

 BLEU-4 SCORES 

BLEU-4 scores assess the quality of transcription. Analysis of the overall BLEU-4 scores for the first test (see Figure 4), interpreted as described in Appendix C, suggests that all platforms produced high-quality transcriptions that meet or surpass human transcription (≥ 60). The top platforms, ranked by the average of their overall BLEU-4 scores, were Notability/Notta (83%), Rev/Glean (77%), and OneNote (75%). By comparison, BLEU-4 data from the second controlled test (see Figure 5) indicates that Notability (80%), Rev (78.6%), and Otter.ai (74.3%) were the top platforms.

Figure 4: BLEU-4 Scores Comparing All Platforms (Test 1)

Figure 5: BLEU-4 Scores Comparing All Platforms (Test 2)

The overall BLEU-4 score summarizes a test by combining 1-gram through 4-gram precision. For a more rigorous test, we can focus on the 4-gram scores alone. These scores measure the order and accuracy of four consecutive words in a machine-generated sentence, capturing more detail about sentence structure, meaning, and context. Based on data from the first controlled test, the platforms that consistently scored ≥ 60 were Notability, Notta, Glean, Rev, and OneNote (see Figure 6).
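To make the difference between the overall (cumulative) score and the 4-gram score concrete, here is a minimal plain-Python sketch of the BLEU idea for a single reference and a single transcription. The study itself used NLTK; this is a simplified re-implementation for illustration only, with no smoothing, and the function names and sample sentences are assumptions. It returns values between 0 and 1, which correspond to the 0 to 100 scale used in this report once multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive words."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision: a hypothesis n-gram counts at most as often as it occurs in the reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def brevity_penalty(reference, hypothesis):
    """Penalize transcriptions that are shorter than the reference."""
    if len(hypothesis) == 0:
        return 0.0
    if len(hypothesis) > len(reference):
        return 1.0
    return math.exp(1 - len(reference) / len(hypothesis))


def cumulative_bleu4(reference, hypothesis):
    """Overall score: geometric mean of 1-gram to 4-gram precision, times the brevity penalty."""
    precisions = [modified_precision(reference, hypothesis, n) for n in range(1, 5)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / 4
    return brevity_penalty(reference, hypothesis) * math.exp(log_mean)


def individual_4gram(reference, hypothesis):
    """Stricter score: 4-gram precision only, times the brevity penalty."""
    return brevity_penalty(reference, hypothesis) * modified_precision(reference, hypothesis, 4)


# Hypothetical sentences, not data from the study.
reference = "the quick brown fox jumps over the lazy dog".split()
hypothesis = "the quick brown fox jumped over the lazy dog".split()

print(f"Overall BLEU-4: {cumulative_bleu4(reference, hypothesis):.3f}")
print(f"4-gram only:    {individual_4gram(reference, hypothesis):.3f}")
```

A single wrong word breaks every 4-word window that contains it, so the 4-gram score falls much faster than the overall score. That is why it works as the stricter test.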

Figure 6: BLEU-4, 4-Gram Scores Across All Platforms 

 SID Analysis 

SID analysis counts changes (substitutions, insertions, deletions) in transcriptions to gauge accuracy, aiming for minimal alterations. Perfect transcription is challenging due to factors like speech variations, background noise, recorder quality, and transcription technology. The authors analyzed these changes in two controlled tests. 
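One way to obtain these counts is to build the word-level edit-distance table and then walk back through it, labelling each step as a substitution, insertion, or deletion. The sketch below illustrates that approach and is not the code used in the study; the function name and example sentences are assumptions. When several alignments have the same minimum cost, the split between the three categories can differ, but their total always equals the edit distance.

```python
def sid_counts(reference, hypothesis):
    """Return (substitutions, insertions, deletions) along one minimum-cost word alignment."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every reference word
    for j in range(cols):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right corner, counting each kind of edit.
    subs = ins = dels = 0
    i, j = len(reference), len(hypothesis)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and reference[i - 1] == hypothesis[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1  # words match, no edit
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ins += 1
            j -= 1
        else:
            dels += 1
            i -= 1
    return subs, ins, dels


# Hypothetical sentences, not data from the study.
reference = "please submit the assignment by friday".split()
hypothesis = "please submit an assignment by this friday".split()

subs, ins, dels = sid_counts(reference, hypothesis)
print(f"Substitutions: {subs}, insertions: {ins}, deletions: {dels}")
```

Summing the three counts gives the overall SID score for a transcription, with lower totals indicating fewer alterations.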

In the first test (see Figure 7), Notability and Notta had the lowest overall SID scores, indicating that their performance was the most robust regardless of the type of recorder used. Messenger Pigeon, on the other hand, had the highest score and the highest number of substitutions and insertions across both tests. Based on SID analysis, Messenger Pigeon appears to be the least reliable transcription platform. Overall, the results of the first test indicate that Notta and Notability are the most accurate platforms. Notability, however, is available only for iOS and macOS, while Notta is platform-independent. The mobile recorder performed very well from a close distance.

Figure 7: Test 1 SID Analysis 

Figure 8: Test 2 SID Analysis 

In the second test (see Figure 8), Rev and Notability had the lowest overall SID scores, and Messenger Pigeon once again had the highest. Notability achieved a low and consistent score for both digital and mobile recordings, indicating that it handles a variety of recorders accurately, which makes it a good choice for those without access to a digital recorder. The second test supported the alternative hypothesis (H1): at a distance, the digital recorder outperformed the mobile recorder almost every time. Overall, the results of the second test indicate that Rev is the most accurate audio transcription platform and that the digital recorder performs best at a greater distance.

 Key Takeaways 

Based on SID analysis, the top transcription platforms are Notability, Rev, and Notta, and the most accurate recorder is the digital recorder, followed by the mobile recorder and the computer recorder.

The outcomes of the SID analysis support the Levenshtein Distance and BLEU-4 calculations, which measure word accuracy and transcription quality. Notability, Notta, and Rev are the top-performing platforms in all tests (refer to Figure 9), though Glean showed good results in the BLEU-4 tests.

Figure 9: Overall Transcription Platform Results 

REFLECTIONS
Future Scope & Conclusion

Seven digital recording and transcription platforms and three types of audio recorders were tested and analysed to determine the accuracy and quality of their transcriptions. The study also tracked the number of changes (substitutions, insertions, and deletions) in each machine transcription. The results of the analysis support Notability, Notta, and Rev as the most accurate and highest-quality digital transcription platforms. At shorter distances from the speaker, the mobile recorder performs as well as or better than the digital recorder, while at larger distances the digital recorder performs best.


Students who require note-taking accommodations and use NTE need a platform that provides a summary of transcribed notes. Of the top performers, Rev and Notta provide AI-generated summaries, but Notability does not. In the future, it would be beneficial to conduct testing that examines the accuracy and quality of summaries generated by AI from longer transcriptions. Although Notability has the best accessibility features and some of the best accuracy and quality scores, it is only available on Apple devices. Rev has better accessibility options and a simpler user interface than Notta, and it can be used on many different platforms.


Future testing may include a pilot of the selected platform(s) and usability testing, including journey mapping and/or think-aloud testing. These tests would assess the user journey through the platform and surface a range of experiences, including high points and pain points.

Get complete PDF report on AI-transcription research
