Artificial Intelligence is beginning to have a profound impact on the documentation and analysis of indigenous languages, with Google Translate now taking on a wide range of African languages and dialects.
More African languages are finding their way to Google’s online translation service, as the giant search engine integrates Artificial Intelligence to learn closely related languages.
In 2024, the search engine made its largest-ever foray into translating African languages, in what was also its biggest expansion to date in the number of new languages added to the service.
“We’re using AI to expand the variety of languages we support. Thanks to our PaLM 2 large language model, we’re rolling out 110 new languages to Google Translate, our largest expansion ever,” said Google Translate Senior Software Engineer Isaac Caswell.
This development marks a pivotal moment in popularising indigenous languages and in building comprehensive local linguistic resources.
Nearly a quarter of the newly added languages are African, bringing the number of African languages on the translation service to more than 50.
The new additions include Dholuo, the language of the Luo, Kenya’s fourth-largest ethnic group. It has more than 4.2 million speakers, and the Luo belong to a wider cluster of Nilotic peoples found in Egypt, Sudan, South Sudan, Ethiopia, northern Uganda, eastern DRC, and Tanzania.
Another is Afar, a tonal language spoken by 2.3 million people in Djibouti, Eritrea, and Ethiopia. Google noted that of all the languages in this launch, Afar had the most volunteer community contributions.
Another addition is N’Ko, a standardized form of the West African Manding languages that unifies many dialects into a common language. It has its own alphabet, invented in 1949, and an active research community that develops resources and technology for it today.
Tamazight (Amazigh), a Berber language spoken across North Africa, is another important addition. Although it has many dialects, the written form is generally mutually intelligible. It is written in both Latin and Tifinagh scripts, and Google Translate supports both.
“Google Translate breaks down language barriers to help people connect and better understand the world around them. We’re always applying the latest technologies so more people can access this tool,” Caswell explained.
Other African languages added this year include Fon, Kikongo, Ga, Swati, Venda and Wolof.
In 2022, Google added 24 new languages worldwide using Zero-Shot Machine Translation, in which a machine learning model learns to translate into a language having seen only monolingual text in it, without any example translations.
Google acknowledged that languages vary enormously, from regional varieties and dialects to different spelling standards, which makes it almost impossible to pick the “right” variety. Its approach was to prioritize the most commonly used variety of each language.
“PaLM 2 was a key piece to the puzzle, helping Translate learn languages closely related to each other more efficiently. As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time,” explained Caswell.
According to Google, these new languages represent more than 614 million speakers, opening up translation for around 8% of the world’s population. Some are major world languages with over 100 million speakers, while others are spoken by small Indigenous communities. A few have almost no native speakers but are undergoing active revitalization efforts.
Swahili (Kiswahili) is the most widely spoken African language, with the United Nations putting the number of speakers at over 200 million. In 2021, the UN designated July 7 as World Kiswahili Language Day. Kenya hosts this year’s event under the theme “Kiswahili, Multilingual Education and the Enhancement of Peace.”
The event’s organisers, the East African Community (EAC) and the Kenyan government, said the annual event offers a platform for Kiswahili stakeholders to share knowledge, research-based evidence, best practices, experiences, and worldviews on the role of Kiswahili education in promoting a culture of peace.
The East African Community’s Deputy Secretary General (DSG), Andrea Aguer Ariik, who oversees the Infrastructure, Productive, Social and Political sectors, emphasized the significance of language diversity and unity in the EAC.
“Kiswahili, as a widely spoken language in East Africa, not only bridges communication gaps but also represents a common identity among the member states of the EAC,” said Ariik in a statement.
Google is not alone in this field. Young African scholars studying abroad are also rising to the challenge with similar AI-driven initiatives.
Ife Adebara, a programmer and scholar in the University of British Columbia’s linguistics department, is among those leading efforts to use AI to preserve local languages, with a focus on African languages.
Her project, Afrocentric Natural Language Processing, aims to raise awareness and develop tools and programs accessible to speakers of African languages such as Swahili and Zulu.
The project has already produced two tools available online: SERENGETI, a set of massively multilingual language models for Africa, and AfroLID, a neural language identification toolkit covering 517 African languages and varieties. AfroLID draws on a multi-domain web dataset manually curated from 14 language families and five orthographic systems.
There are over 2,000 living languages in Africa. Nigeria is home to the most, with 522 languages, according to research firm Statista.
The firm ranks Cameroon (275 languages) and the Democratic Republic of the Congo (217) second and third on the continent by number of languages spoken.
Credit: Conrad Onyango, Bird Story Agency