
Controlling your computer with voice: Windows, MacOS, and AI Solutions

Published on January 22, 2025

Imagine seamlessly controlling your computer with just your voice, like a scene from a sci-fi movie. You could launch applications, compose emails, and navigate the web—all without touching a keyboard or mouse. This is the transformative power of voice control, a technology revolutionizing how we engage with our devices. Beyond enhancing accessibility for individuals with disabilities, voice control offers a more efficient and intuitive way for everyone to interact with their computers.

Voice Control Features Built into Popular Operating Systems

Windows

Windows has a rich history of voice control capabilities, and the latest iteration, Windows 11, enhances this with Voice Access, allowing for hands-free PC operation. To enable it, navigate to Settings > Accessibility > Speech and toggle on Voice Access. Once activated, the Voice Access menu will appear at the top of your screen. You can execute commands like “Open Start,” “Open File Explorer,” or “Open Settings” to access apps and menus. To switch between open windows, simply say “Switch to,” followed by the app's name.

Voice Access also supports creating custom voice shortcuts. To set one up, say “Show commands” or “What can I say” to open the command list. From there, select Voice shortcuts in the sidebar and click Create new shortcut. Name your shortcut and define the desired action, such as launching an app, navigating to a folder, or executing a sequence of keystrokes.
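Conceptually, a voice shortcut is a named phrase bound to an ordered list of actions. The sketch below models that idea in Python. The `Action` and `ShortcutRegistry` names, the three action kinds, and the example folder and key sequence are all hypothetical. The code only describes what would run and does not call any Windows API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    """One step of a shortcut (hypothetical model; no OS calls are made)."""
    kind: str    # "launch_app", "open_folder", or "keys"
    target: str  # app name, folder path, or key sequence


@dataclass
class ShortcutRegistry:
    shortcuts: Dict[str, List[Action]] = field(default_factory=dict)

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Ignore case and extra spaces so small variations in speech still match.
        return " ".join(phrase.lower().split())

    def create(self, name: str, actions: List[Action]) -> None:
        key = self._normalize(name)
        if key in self.shortcuts:
            raise ValueError(f"shortcut {name!r} already exists")
        self.shortcuts[key] = list(actions)

    def run(self, utterance: str) -> List[str]:
        """Describe, in order, the actions the matching shortcut would perform."""
        actions = self.shortcuts.get(self._normalize(utterance), [])
        return [f"{a.kind}: {a.target}" for a in actions]


registry = ShortcutRegistry()
registry.create("Start my day", [
    Action("launch_app", "Outlook"),
    Action("open_folder", "Documents/Reports"),
    Action("keys", "ctrl+n"),
])
print(registry.run("start my DAY"))
# ['launch_app: Outlook', 'open_folder: Documents/Reports', 'keys: ctrl+n']
```

Normalizing the phrase before lookup is a design choice for the sketch: a recognizer may capitalize or space the words differently from how the shortcut was named.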

For text input, Windows 11 offers Voice Typing. Make sure you’re online and have a working microphone, place your cursor in a text box, and press Win + H to start Voice Typing. To stop, say “Stop listening” or click the microphone icon in the floating window. You can also edit as you go: saying “Delete that” or “Scratch that” erases your last input. A convenient feature is automatic punctuation, which adds commas and periods without your having to dictate them.
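One way to picture how “Scratch that” removes exactly what you last said is to store dictated text as a list of utterances rather than one flat string. The minimal sketch below assumes that model. `DictationBuffer` is a hypothetical name, and this is not how Windows Voice Typing is actually implemented.

```python
from typing import List


class DictationBuffer:
    """Keeps each recognized utterance as a separate chunk so an edit command
    can remove the most recent one as a unit (hypothetical model)."""

    UNDO_COMMANDS = {"delete that", "scratch that"}

    def __init__(self) -> None:
        self.chunks: List[str] = []

    def hear(self, utterance: str) -> None:
        phrase = utterance.strip()
        # Tolerate a trailing period in case the recognizer punctuated the command.
        if phrase.lower().rstrip(".") in self.UNDO_COMMANDS:
            if self.chunks:
                self.chunks.pop()  # erase the last dictated phrase
        elif phrase:
            self.chunks.append(phrase)

    @property
    def text(self) -> str:
        return " ".join(self.chunks)


buf = DictationBuffer()
for spoken in ["Meeting moved to Friday.", "Bring snacks.", "Scratch that", "Bring the slides."]:
    buf.hear(spoken)
print(buf.text)  # Meeting moved to Friday. Bring the slides.
```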

Windows Speech Recognition, the voice feature in earlier versions of Windows, also provides a set of spoken commands for controlling common functions hands-free:

Command                        Action
Start                          Open Start
Press Windows C                Open Cortana
Press Windows S                Open Search
Right-click                    Perform an action in an app
File                           Select an item by its name
Click Recycle Bin              Select an item or icon
Double-click Recycle Bin       Double-click an item
Switch to Paint                Switch to an open app
Scroll up                      Scroll in one direction
New paragraph                  Insert a new paragraph or new line in a document
Select word                    Select a word in a document
Correct word                   Select a word and start to correct it
Delete word                    Select and delete specific words
What can I say?                Show a list of applicable commands
Refresh speech commands        Update the list of speech commands that are currently available
Start listening                Turn on listening mode
Stop listening                 Turn off listening mode
Move speech recognition        Move the Speech Recognition microphone bar
Minimize speech recognition    Minimize the microphone bar
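Many of these commands are a fixed phrase plus a slot the user fills in, such as the app name in “Switch to Paint” or the direction in “Scroll up.” A small pattern-based parser illustrates the idea. The intent names and regular expressions are assumptions for illustration, not the grammar Windows actually uses.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical grammar for a few rows of the table: each pattern names an
# intent and captures the slot the user fills in.
COMMANDS = [
    (re.compile(r"switch to (?P<app>.+)", re.IGNORECASE), "switch_app"),
    (re.compile(r"double-click (?P<item>.+)", re.IGNORECASE), "double_click"),
    (re.compile(r"click (?P<item>.+)", re.IGNORECASE), "click"),
    (re.compile(r"scroll (?P<direction>up|down|left|right)", re.IGNORECASE), "scroll"),
    (re.compile(r"(?P<state>start|stop) listening", re.IGNORECASE), "listening"),
]


def parse(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for the first pattern that matches the whole phrase."""
    phrase = " ".join(utterance.split())  # collapse repeated whitespace
    for pattern, intent in COMMANDS:
        match = pattern.fullmatch(phrase)
        if match:
            return intent, match.groupdict()
    return None


print(parse("Switch to Paint"))           # ('switch_app', {'app': 'Paint'})
print(parse("Double-click Recycle Bin"))  # ('double_click', {'item': 'Recycle Bin'})
print(parse("Scroll sideways"))           # None
```

Using `fullmatch` means an utterance must match a command from start to end, so stray speech is rejected instead of triggering a partial match.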

In 2022, Microsoft launched Voice Clarity, a feature aimed at improving the quality of voice communication, especially in noisy settings.

Starting in September 2024, Microsoft began replacing Windows Speech Recognition, the voice feature from earlier versions of Windows, with Voice Access in Windows 11. Microsoft has not confirmed its plans for Speech Recognition in Windows 10; it may bring Voice Access to Windows 10 or keep Speech Recognition without further updates.

macOS

macOS includes a built-in voice control feature that allows you to operate your Mac using voice commands. To enable it, go to the Apple menu > System Settings > Accessibility > Voice Control and toggle it on. Once activated, you can use various commands to control your Mac.

To view a list of available commands, say “Show commands.” You can also say “Show numbers” to display numbers next to selectable items on the screen, then select an item by saying its number.
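The numbers overlay works because it gives each selectable item a short, unambiguous label that is easy to say and easy to recognize. A hypothetical sketch of the lookup follows. It accepts digits or a deliberately short list of number words. The function names are illustrative, not a macOS API.

```python
from typing import Dict, List, Optional

# Only a few number words are listed to keep the sketch short (an assumption).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def number_items(items: List[str]) -> Dict[int, str]:
    """Label on-screen items 1, 2, 3, ... the way a numbers overlay might."""
    return {n: item for n, item in enumerate(items, start=1)}


def select_by_number(overlay: Dict[int, str], spoken: str) -> Optional[str]:
    """Resolve a spoken number ("3" or "three") to its labeled item, if any."""
    token = spoken.strip().lower()
    number = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
    if number is None:
        return None
    return overlay.get(number)


overlay = number_items(["File", "Edit", "View", "Window"])
print(select_by_number(overlay, "3"))     # View
print(select_by_number(overlay, "two"))   # Edit
print(select_by_number(overlay, "five"))  # None (there is no fifth item)
```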

Starting with macOS Sonoma, an interactive guide is included to help you get started with Voice Control. You can access it by navigating to Apple menu > System Settings > Accessibility > Voice Control > Open Guide.

For users with visual impairments, Voice Control works seamlessly with VoiceOver, macOS’s built-in screen reader. This powerful combination allows full control of your Mac and navigation using voice commands. For example, you can say “VoiceOver rotor,” “VoiceOver read all,” or “VoiceOver select first item” to perform specific actions.

Linux

Linux provides a range of voice control options, including both built-in features and third-party tools. One such tool is Voice2Json, an open-source system that integrates voice commands into existing applications or Unix-style workflows.

Another option is Google2Ubuntu, which leverages the Google Speech Recognition API. To install it, simply open a terminal and run the following command:

sudo add-apt-repository ppa:benoitfra/google2ubuntu
sudo apt-get update
sudo apt-get install google2ubuntu

Depending on the tool, voice control on Linux is available in up to 67 languages. Some popular voice command applications for Linux include:

  • Julius: An open-source speech recognition engine for real-time voice command processing.

  • CMU Sphinx: A robust speech recognition system that supports various languages and can be customized for specific tasks [11].

A typical Linux voice assistant consists of these core components:

  • Speech-to-Text (STT): Converts spoken words into text using automatic speech recognition (ASR) technologies.

  • Natural Language Understanding (NLU): Interprets the meaning behind the transcribed text by identifying intent and extracting relevant information [12].

  • Dialogue Management: Determines the appropriate response or action based on user intent and context.

  • Text-to-Speech (TTS): Synthesizes natural-sounding speech to deliver responses back to the user.
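These four components can be chained into a single request/response turn. The sketch below wires them together with stand-in functions that operate on text instead of audio, so it runs without a microphone. In a real assistant, an engine such as Julius or CMU Sphinx would fill the STT slot and a speech synthesizer would fill the TTS slot. All function names and the tiny intent set are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Intent:
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


def stub_stt(audio: bytes) -> str:
    # Stand-in for an ASR engine: treat the "audio" as UTF-8 text.
    return audio.decode("utf-8")


def simple_nlu(text: str) -> Intent:
    # Keyword rules standing in for a trained intent classifier.
    words = text.lower().split()
    if len(words) >= 2 and words[0] == "open":
        return Intent("open_app", {"app": " ".join(words[1:])})
    if words == ["close", "it"]:
        return Intent("close_last")
    return Intent("unknown")


def dialogue_manager(intent: Intent, context: Dict[str, str]) -> str:
    # Uses context so a follow-up like "close it" refers to the last app opened.
    if intent.name == "open_app":
        context["last_app"] = intent.slots["app"]
        return f"Opening {intent.slots['app']}."
    if intent.name == "close_last":
        app = context.pop("last_app", None)
        return f"Closing {app}." if app else "There is nothing to close."
    return "Sorry, I didn't catch that."


def stub_tts(reply: str) -> bytes:
    # Stand-in for a speech synthesizer: return the reply as bytes.
    return reply.encode("utf-8")


def handle(audio: bytes, context: Dict[str, str],
           stt: Callable[[bytes], str] = stub_stt,
           nlu: Callable[[str], Intent] = simple_nlu,
           dm: Callable[[Intent, Dict[str, str]], str] = dialogue_manager,
           tts: Callable[[str], bytes] = stub_tts) -> bytes:
    """Run one turn: STT, then NLU, then dialogue management, then TTS."""
    return tts(dm(nlu(stt(audio)), context))


session: Dict[str, str] = {}
print(handle(b"Open Firefox", session))  # b'Opening firefox.'
print(handle(b"Close it", session))      # b'Closing firefox.'
print(handle(b"Close it", session))      # b'There is nothing to close.'
```

Passing each stage as a parameter keeps the stages independent, so one component can be replaced without touching the others.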

Third-Party Voice Control Software

Beyond built-in features, several third-party voice control and transcription tools are available:

Dragon by Nuance

  • Platforms: Windows, macOS, iOS, Android

  • Key features: Customizable dictation, high accuracy, support for multiple languages

  • Pros: Widely regarded as the most accurate and reliable voice control software; offers a wide range of features and customization options

  • Cons: Can be expensive; requires some training

Otter

  • Platforms: Windows, macOS, iOS, Android, Web

  • Key features: Collaboration-focused, real-time transcription, speaker identification

  • Pros: Offers a free plan with limited features; good for collaborative work and meetings

  • Cons: Accuracy may not be as high as Dragon's; limited features in the free plan

Notta

  • Platforms: Windows, macOS, iOS, Android, Web

  • Key features: Automated transcription, high accuracy, support for multiple languages

  • Pros: Claims 99.9% accuracy; offers a variety of features for transcription and note-taking

  • Cons: Relatively new software; pricing may be a concern for some users

Other notable options include:

  • Braina: Allows dictation into third-party software and websites, filling web forms, and executing vocal commands.

  • Tazti: Enables users to create speech command profiles to play PC games and control applications.

  • Voice Finger: Improves the Windows speech recognition system by adding several extensions.

Free alternatives are also available:

  • Microsoft Dictate: Integrated into Microsoft 365 apps like Word, OneNote, and Outlook.

  • Google Docs Voice Typing: Built into Google Docs in Chrome.

  • Windows Speech Recognition: Built into Windows.

  • Apple Dictation: Built into Mac and iOS.

  • Gboard keyboard: Offers voice typing on mobile devices.

Limitations of Voice Control for Computers

While voice control is promising, it has limitations:

  • Accuracy: Accuracy can be affected by noisy environments, accents, and the clarity of the speaker's voice.

  • Speed: Voice control can be slower than typing for complex tasks or when extensive editing is required.

  • Privacy: Voice control software may record and store voice data, raising privacy concerns. It's crucial to understand the privacy policies of the software you use.

  • Security: Voice control can be vulnerable to attacks such as replay attacks, in which a recording of an authorized user's voice is played back to gain unauthorized access.

It's also important to understand the distinction between voice recognition and speech recognition:

  • Voice recognition focuses on identifying the speaker based on their unique vocal characteristics.

  • Speech recognition focuses on transcribing spoken words into text and understanding the content of the speech.
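The difference shows up in what each system outputs. Speech recognition produces text, while voice recognition (speaker verification) produces a yes/no decision about identity. A common approach to verification compares a fixed-length voice embedding, or voiceprint, with one stored at enrollment using cosine similarity. In the sketch below, the three-dimensional vectors and the 0.8 threshold are made-up placeholders. Real embeddings come from a trained model and have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled: Sequence[float], sample: Sequence[float],
                    threshold: float = 0.8) -> bool:
    """Accept the sample if its voiceprint is close enough to the enrolled one."""
    return cosine_similarity(enrolled, sample) >= threshold


# Made-up 3-D "voiceprints"; real embeddings come from a trained model.
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.85, 0.15, 0.38]
other_speaker = [0.1, 0.9, 0.2]
print(is_same_speaker(enrolled, same_speaker))   # True
print(is_same_speaker(enrolled, other_speaker))  # False
```

A similarity check like this cannot tell a live speaker from a recording of the same person, so a replayed recording would pass. This is why replay attacks are a concern.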

Reviews and Comparisons of Different Voice Control Software

Dragon Professional Individual is widely regarded as the leading voice control software, praised for its exceptional accuracy, reliability, and user-friendly design. Microsoft Word's integrated dictation feature is also a competitive option, offering strong accuracy and smooth compatibility with Microsoft 365.

In one review, Dragon NaturallySpeaking outperformed IBM ViaVoice, excelling in accuracy, speed, and stability, while IBM ViaVoice was noted to be slower and more error-prone.

Another comparison, between Windows 10 Speech Recognition and Dragon Professional 16, highlighted Dragon's superior accuracy, intuitive interface, and advanced voice commands, which enable faster and more efficient dictation.
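Accuracy comparisons of speech recognizers are commonly based on word error rate (WER). WER is the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of reference words. The minimal Python implementation below uses word-level edit distance. The example sentences are invented.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


wer = word_error_rate("open the file explorer window", "open a file explorer")
print(wer)  # 0.4 (one substitution and one deletion over five reference words)
```

If “accuracy” is read as 1 minus WER, a claim such as 99.9% accuracy corresponds to roughly one error per thousand words. Vendors do not always state how their figures are measured.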

Tutorials on How to Set Up and Use Voice Control on a Computer

Windows

To set up Speech Recognition in Windows, press Windows logo key + Ctrl + S to launch the "Set up Speech Recognition" wizard. Follow the on-screen instructions to complete the setup. If microphone issues arise during the process, the wizard will display them and provide options to help resolve them.

To enable Voice Access in Windows 11, navigate to Settings > Accessibility > Speech and toggle on the Voice access switch.

You can improve your computer's ability to recognize your voice in Speech Recognition by retraining it. To do this, go to Control Panel > Ease of Access > Speech Recognition > Train your computer to better understand you and follow the prompts.

Speech Recognition in Windows offers two activation modes:

  • Manual activation mode: Speech Recognition turns off when you say "Stop Listening" and must be turned back on by clicking the microphone button or pressing Ctrl + Windows logo key [28].

  • Voice activation mode: Speech Recognition goes into sleep mode when not in use and can be activated by saying "Start Listening".
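The two modes differ mainly in what “Stop Listening” does, and a small state machine makes that difference explicit. This is a simplified, hypothetical model of the behavior described above. The `SpeechRecognizer` class and its `on_idle` hook are assumptions, and the code does not reproduce the Windows implementation.

```python
from enum import Enum


class State(Enum):
    OFF = "off"
    SLEEPING = "sleeping"
    LISTENING = "listening"


class SpeechRecognizer:
    """Simplified, hypothetical model of the two activation modes."""

    def __init__(self, voice_activation: bool) -> None:
        self.voice_activation = voice_activation
        self.state = State.LISTENING

    def hear(self, phrase: str) -> None:
        phrase = phrase.strip().lower()
        if self.state is State.LISTENING and phrase == "stop listening":
            # Manual mode turns off; voice activation mode only goes to sleep.
            self.state = State.SLEEPING if self.voice_activation else State.OFF
        elif self.state is State.SLEEPING and phrase == "start listening":
            self.state = State.LISTENING
        # When OFF, speech is ignored: only the button or hotkey turns it back on.

    def on_idle(self) -> None:
        # Per the description above, voice activation mode sleeps when not in use.
        if self.voice_activation and self.state is State.LISTENING:
            self.state = State.SLEEPING

    def press_hotkey(self) -> None:
        # Stands in for clicking the microphone button or pressing Ctrl + Windows logo key.
        self.state = State.LISTENING


manual = SpeechRecognizer(voice_activation=False)
manual.hear("Stop listening")
manual.hear("Start listening")  # ignored: manual mode is off, not asleep
print(manual.state)             # State.OFF
manual.press_hotkey()
print(manual.state)             # State.LISTENING

auto = SpeechRecognizer(voice_activation=True)
auto.on_idle()
auto.hear("Start listening")
print(auto.state)               # State.LISTENING
```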

macOS

To enable Voice Control on macOS, go to Apple menu > System Settings > Accessibility > Voice Control and turn on the toggle switch [5].

Accessibility Features Related to Voice Control

Voice control is an essential accessibility tool, enabling individuals with mobility impairments to operate their computers without the need for a keyboard or mouse. Voice-activated systems also enhance accessibility in physical spaces by allowing users to adjust lighting, temperature, and other environmental settings through voice commands.

On iOS devices, you can turn Voice Control on or off with Siri, for example by saying “Hey Siri, turn on Voice Control.”

For Android devices, the Voice Access app provides multiple ways to activate voice control. You can enable the Activation button in the Voice Access settings, allowing you to start and stop voice control by tapping the button. Alternatively, you can use an activation key by connecting a physical key or Bluetooth switch to your device and configuring it within the Voice Access settings.

The Future of Voice Control in Computing

The future of voice control is likely to be shaped by advances in AI systems such as Claude and EVI. Claude, Anthropic's AI model, can operate a computer autonomously through its Computer Use capability, which is currently in beta. Hume AI's Empathetic Voice Interface 2 (EVI 2) is a voice-to-voice system that transcribes what the user says and responds in speech. Used together, EVI 2 can capture a spoken command and Claude can carry it out as on-screen actions through Computer Use.

This synergy between Claude's reasoning power and EVI's voice interface has the potential to enable more intuitive and natural interactions with computers. Imagine communicating with your computer as though speaking to a human assistant—issuing complex instructions and receiving nuanced, context-aware responses. Such capabilities could revolutionize technology use, making it more accessible, efficient, and user-friendly for everyone.

Although Claude and EVI are still in development, their integration marks a significant step forward in the evolution of voice control. As these technologies advance, they could make voice control a more capable and seamless part of everyday computing.

Conclusion

Voice control is a transformative technology that has the potential to redefine how we interact with computers and other devices. While it still has limitations, it is advancing rapidly. Thanks to continuous improvements in AI and natural language processing (NLP), voice control is becoming increasingly accurate, reliable, and versatile. As this evolution continues, we can expect voice control to expand its presence in various areas of our lives, from managing computers and smart homes to engaging with businesses and accessing information.

The benefits of voice control are significant, particularly in terms of accessibility, productivity, and user experience. It enables individuals with disabilities to interact with technology in innovative ways while offering everyone a more intuitive and efficient means of controlling their devices. As the technology matures, it is likely to become an even more essential part of our digital ecosystem.

This is an exciting time for voice control innovation. We encourage you to explore the voice control features on your devices and discover the convenience and accessibility they bring to everyday tasks.
