IEEE Spectrum AI
May 21, 2026
Society & Culture

Māori Text-to-Speech Model Spurns Big Tech's Values

Overview

A new Māori text-to-speech model has been developed to preserve the indigenous language te reo Māori, which is often misrepresented by big tech companies. This initiative, led by Te Taka Keegan and his student Kingsley Eng, emphasizes the importance of community ownership and accurate representation of the language's unique features.

Key Takeaways

  • Te reo Māori is an indigenous language spoken fluently by only 4.3 percent of New Zealand's population, although about 30 percent can speak some words.
  • Big tech companies have used Māori language data without permission, leading to concerns about ownership and representation.
  • The new text-to-speech model focuses on the Waikato-Maniapoto dialect, ensuring local control and accurate pronunciation.
  • Challenges in developing AI voice models for te reo Māori include its low-resource status and unique linguistic features like vowel length.
  • The project aims to serve as a blueprint for other minority language communities seeking to develop their own digital tools.

Stats & Key Facts

  • 4.3 percent of New Zealand's population speaks te reo Māori fluently.
  • 30 percent of New Zealanders can speak some te reo Māori.
  • The final dataset for the text-to-speech model consisted of 7 hours of recorded audio.

The Importance of Te Reo Māori

Te reo Māori is not just a language; it is a vital part of New Zealand's cultural identity.

  • Te reo Māori is one of New Zealand's three official languages.
  • The language serves as a primary conveyor of Māori knowledge and culture.

Despite its low number of fluent speakers, te reo Māori plays a crucial role in the identity of the Māori people. The language encapsulates cultural narratives and traditions that are essential for the community's heritage.

Challenges with Big Tech

The use of Māori language data by big tech companies raises significant ethical concerns.

  • AI models like ChatGPT and Claude use Māori language data without consent.
  • These companies do not provide ownership of the generated outputs to the Māori community.

The unconsented use of Māori language data by major tech firms highlights a broader issue of cultural appropriation and the need for indigenous voices in technology. This lack of representation can distort the language and its meanings.

Developing the Māori Text-to-Speech Model

The creation of a text-to-speech model for te reo Māori was a community-driven initiative.

  • Te Taka Keegan and Kingsley Eng led the development of the model.
  • The project focused on the Waikato-Maniapoto dialect to ensure cultural relevance.

The development process involved recording a local speaker of the Waikato-Maniapoto dialect and ensuring that the synthetic voice accurately represented the language's unique features. By prioritizing community input, the project aimed to create a tool that faithfully reflects te reo Māori as the community speaks it.

Linguistic Features of Te Reo Māori

Te reo Māori has distinct linguistic features that pose challenges for AI models.

  • Vowel length is critical in distinguishing meanings of words.
  • Digraphs in te reo Māori, such as 'ng' and 'wh', are pronounced differently than in English.

For instance, the words 'keke', 'kēkē', and 'kekē' differ only by vowel length, which can completely change their meanings. Such nuances highlight the complexity of accurately representing the language in AI systems.
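To make the vowel-length point concrete, here is a minimal Python sketch of text preprocessing that keeps macrons and digraphs intact. It is an illustration only, not the method used by Keegan and Eng; the function names (tokenize, vowel_pattern, strip_macrons) are hypothetical, and the sketch assumes standard Māori orthography, in which 'ng' and 'wh' are the only digraphs and long vowels are written with macrons.

```python
import unicodedata

# Assumption: standard Māori orthography, with two digraphs and macrons on long vowels.
DIGRAPHS = ("ng", "wh")
SHORT_VOWELS = set("aeiou")
LONG_VOWELS = set("āēīōū")


def tokenize(word: str) -> list[str]:
    """Split a word into orthographic units, keeping digraphs whole."""
    # NFC merges a vowel plus a combining macron into a single code point,
    # so "e" + U+0304 and "ē" are treated identically.
    text = unicodedata.normalize("NFC", word.lower())
    units = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units


def vowel_pattern(word: str) -> str:
    """Return one letter per vowel: 'L' for long, 'S' for short."""
    pattern = []
    for unit in tokenize(word):
        if unit in LONG_VOWELS:
            pattern.append("L")
        elif unit in SHORT_VOWELS:
            pattern.append("S")
    return "".join(pattern)


def strip_macrons(word: str) -> str:
    """Simulate a lossy pipeline that discards macrons."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    for word in ("keke", "kēkē", "kekē", "whānau"):
        print(word, tokenize(word), vowel_pattern(word), strip_macrons(word))
```

Under these assumptions, the three words from the example keep distinct vowel patterns as long as macrons are preserved, while a pipeline that strips diacritics collapses all three to the same string, which is one way a general-purpose system could lose the distinction.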

Future Implications for Minority Languages

The project has broader implications for minority language communities worldwide.

  • It serves as a model for other indigenous languages seeking digital representation.
  • The emphasis on local ownership can empower communities to control their linguistic resources.

By establishing a framework for developing AI tools that respect and preserve minority languages, this project could inspire similar initiatives globally. It underscores the importance of involving communities in technology that affects their cultural heritage.

Frequently Asked Questions

What is te reo Māori?

Te reo Māori is the indigenous language of the Māori people in New Zealand and one of the country's three official languages.

Why is the new text-to-speech model important?

The model is important because it ensures that the Māori community retains ownership and control over their language and its representation in technology.

What challenges do AI models face when working with te reo Māori?

AI models face challenges due to the low-resource status of te reo Māori and its unique linguistic features, such as vowel length and digraphs.

How was the dataset for the model created?

The dataset was created by recording a local speaker, Ngaringi Katipa, reading various texts, resulting in 7 hours of audio data.

Can this model be replicated for other languages?

Yes, the project aims to provide a blueprint for other minority language communities to develop their own digital tools.

This initiative represents a significant step towards linguistic sovereignty for the Māori people.

Originally published by IEEE Spectrum AI