This page is under active development. We have made it public to facilitate conversation and engagement on this topic.

Approaches discussed here continue to evolve based on feedback from industry and academic leaders.

Empowering Communities Through Localized Voice Technology

Introducing the Voice Initiative

The positive impact of computers and the internet in expanding access to opportunities, education, and knowledge is well established. Unfortunately, this benefit remains largely inaccessible to those who cannot read and write. In Pakistan, this represents over 42% of the adult population, amounting to more than 142 million individuals.

To address this challenge, we are developing foundational technology that allows individuals to interact with computers through human-like voice conversations in their local languages.

Our state-of-the-art technology will be available to Pakistani enterprises and startups, empowering them to significantly extend their reach and impact on the Pakistani economy.

A teenager having a voice conversation with a computer in Urdu on the topic of antibiotics.

Existing Voice Technology

Advancements in machine learning have made it possible for computers to engage in human-like conversations through voice. These voice technologies, already deployed at scale, work with remarkable accuracy in high-resource languages such as English.

Thanks to recent breakthroughs in self-supervised learning and transfer learning, it is now feasible to approach the same level of accuracy and robustness in lower-resource languages such as Urdu, Punjabi, Pashto, Sindhi, and Balochi.

Below, we showcase a selection of Urdu voice technologies currently in development at different companies. Our motivation here is to demonstrate that a future where every Pakistani can effortlessly interact with technology through voice is not a distant dream but a very achievable reality.

Text to Speech

Computers can speak formal Urdu text in a human-like voice. However, current solutions can struggle with the nuances of informal, conversational Urdu. For other Pakistani languages, such as Balochi, there are no reasonable text-to-speech solutions.

We are committed to enhancing the state-of-the-art in text-to-speech for all Pakistani languages. This commitment involves dedicated efforts in data gathering and model training, in collaboration with regional communities and organizations, to ensure that our technology meets the diverse linguistic needs of Pakistan.

Speech Recognition

Leading English speech recognition systems achieve over 95% accuracy in many practical settings, but speech recognition for Pakistani languages significantly lags behind. This shortfall is mainly due to the limited diversity in existing training data, which is largely composed of formal speech and does not reflect the everyday speech of the Pakistani population.

We believe we can quickly get to 90% accuracy or better in Pakistani speech recognition — making this technology usable in day-to-day settings. We're building the world’s largest speech dataset for Pakistani languages, comprising 54,000 hours of weakly supervised speech data per language — representing voices from all segments and regions of Pakistan.
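For readers who want to see how an accuracy figure like the ones above can be measured, here is a minimal sketch. It assumes that "accuracy" means one minus the word error rate (WER), a common convention in speech recognition; this page does not define its metric, so treat that reading as an assumption. The function names and the romanized Urdu sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: accuracy = 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# Hypothetical example: the recognizer drops one word out of eight.
reference = "mujhe bukhar hai aur sar mein dard hai"
hypothesis = "mujhe bukhar hai sar mein dard hai"
print(word_error_rate(reference, hypothesis))  # 0.125
print(word_accuracy(reference, hypothesis))    # 0.875
```

Under this definition, reaching 90% accuracy means that, on average, no more than one word in ten of everyday speech is transcribed incorrectly, dropped, or inserted.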

Translation

Leading translation systems for various Pakistani languages have already demonstrated remarkable results. However, enhancing their effectiveness in technical and educational settings, such as mathematics, physics, health, and agriculture, remains a challenge. These settings often involve multilingual scripts, non-standard terminology, and extensive contextual information, making it harder to achieve accurate and fluent translations into the target language.

Large language models show promise in solving this challenge, provided that we can generate substantial digital data demonstrating how these topics are taught in non-English Pakistani schools.

Conversational AI

Existing AI systems have revolutionized access to information through intuitive dialog-based interactions. They are used daily by millions of people for a variety of topics, ranging from personal health and school homework to agriculture, tax laws, coding, and creative writing. Currently, these systems are optimized for English.

Integrating these AI systems with localized voice technology will expand their reach and make knowledge easily accessible to those who lack literacy skills. Developing workflows to regularly update these systems with the latest regional information — such as updates on disease outbreaks, navigation of welfare programs, and customized expert farming advice — will empower users to make informed decisions in critical matters and improve their lives.

Our Work

With the right technical investments, the diverse population of Pakistan will be able to access knowledge, education, healthcare, e-commerce, job opportunities and more through intuitive voice interfaces. We are committed to making this a reality at an accelerated pace.

Leading global software companies are making impressive advancements in voice technologies. However, we believe that true progress in this field hinges on a nuanced understanding of the local culture and on fostering close partnerships with local organizations and communities. Therefore, the most significant advancements will be made when these technologies are developed from within the communities they serve, rather than relying solely on external companies.

Our work is focused on the following key verticals:

Training and Evaluation Corpora

We are creating the world's largest corpora of training and evaluation data for Pakistani languages. These corpora cover a wide range of tasks, including speech recognition, translation, and language comprehension. To achieve this, we are assembling a workforce of tens of thousands of contributors.

Our approach, focused on curation, counters the limitations of automated data gathering methods like large-scale data mining. These methods often fail to recognize the significant underrepresentation of Pakistani knowledge, culture, and language online — a consequence of limited digital literacy and internet access in many regions. We believe our dedicated effort in data curation is essential and will play a pivotal role in accelerating the development of robust voice technology tailored for Pakistan.

Community and Cultural Adaptation

This vertical emphasizes understanding and integrating the cultural nuances and societal norms of Pakistani communities into our voice technology applications. It involves partnering with local organizations, social scientists, and cultural experts to develop AI systems that are not only linguistically accurate but also culturally resonant and contextually relevant.

This commitment to cultural alignment enhances our technology's market acceptance and lays the groundwork for sustainable adoption across diverse Pakistani communities.

Application Development for Key Sectors

This vertical focuses on building sector-specific, voice-enabled applications for critical areas such as education, healthcare, e-commerce, and employment.

It involves collaborating closely with experts and leaders in these sectors to build new solutions or to extend the reach and accessibility of their existing solutions using voice technology.

A new, inexperienced farmer asking for advice

A kid learning how to bowl reverse swing

A person wondering if there are easy ways to reduce motorbike fuel costs