Cyprus Builds Its First AI That Understands Cypriot Greek

A three-person team develops the island’s first speech-recognition system for a dialect ignored by big tech

Cyprus has reached a major technological milestone: for the first time, artificial intelligence can accurately understand and transcribe Cypriot Greek, a dialect that global AI systems have consistently failed to recognise.

The achievement comes from a small local team led by AI product manager Igor Akimov, who worked with two university interns over the summer to build something that did not yet exist anywhere: there was no dataset, no model, no tooling, and not even ten hours of publicly available Cypriot Greek audio. The entire project was completed on a budget of just $150, in contrast with the billions invested by major technology companies that still do not support the dialect.

Their system, called Voice of Cyprus, marks a breakthrough for a linguistic community that until now has been invisible to artificial intelligence.

A Dialect Missing From the AI World

Cypriot Greek is widely spoken across the island yet has remained out of reach for AI voice assistants, automated phone systems and translation tools. Siri, Alexa and most commercial speech-to-text engines handle the dialect so poorly that Cypriots routinely switch to standard Greek or English just to be understood.

“Imagine calling a social service to ask for an appointment,” Akimov explained. “These systems simply couldn’t help Cypriots unless they changed the way they naturally speak. Voice-based AI services were not built for the local population.”

Akimov felt this challenge personally. As a non-native speaker learning Greek, he often found it difficult to follow the Cypriot dialect at conferences, cultural performances and even parent-teacher meetings. His attempts to combine existing transcription and translation tools got him nowhere. “All of them completely failed with Cypriot Greek,” he said. “It was a disaster.”

He also heard similar frustrations from locals whose voice assistants could not recognise their speech. That is when the idea became a mission: build the missing technology.

Building From Zero: No Data, No Models, No Precedent

The team of Akimov, Hussein Khadra and Nikita Markov came together through a Research and Innovation Foundation initiative that gave Akimov access to two interns for the summer. None of the three is Cypriot, but local speakers volunteered their time to validate transcriptions and guide linguistic accuracy.

The biggest obstacle was immediate and daunting: there was simply no data.

The team searched across universities, libraries, broadcasters and research organisations but found little usable material. Some researchers reported losing older data. Others requested significant fees. Many declined to share what they had. Even Meta, which collected data for 1,600 languages, had zero hours of Cypriot speech.

“So, we had nothing to start with,” Akimov recalled. “We decided to gather everything available, TV shows, radio broadcasts, podcasts, audiobooks, and slowly built the largest Cypriot Greek speech collection ever assembled.”

After the model’s release, Mozilla confirmed that it had ten hours of Cypriot Greek recorded with academic involvement, but the dataset was not available to the team during development as it was being prepared for a global release.

Despite the scarcity, Akimov’s team managed to collect around 300 hours of audio and launch an open platform, voiceofcyprus.org, where anyone can help refine the model by validating transcripts.

Training the First Cypriot Greek AI

With limited high-quality paired data (audio plus verified transcription), the team designed a multi-phase training process:

  • First, the model absorbed natural Cypriot speech patterns: tone, rhythm, accent, and unique phonetics.

  • Then it was refined with clear recordings from news and radio programmes.

  • A language model, KenLM, acted as an intelligent reading assistant, suggesting the most likely words and improving accuracy.

  • Finally, human corrections from volunteers fed back into the training loop, helping the system gradually learn the dialect more faithfully.
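
The language-model step above can be illustrated with a small sketch. It is not the team's code and does not use KenLM itself: it assumes a toy add-one-smoothed bigram model written in plain Python, and the names `BigramLM`, `rescore` and `lm_weight`, the English placeholder sentences and the acoustic scores are all hypothetical. The idea it shows is the general one: the recogniser proposes several candidate transcripts, and the language model nudges the choice towards the word sequence that looks most like real text.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one-smoothed bigram model (a stand-in for a real n-gram LM)."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word starts a bigram
        self.bigram_counts = Counter()   # how often each word pair occurs
        self.vocab = {"</s>"}
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens[1:])
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    def log_prob(self, sentence):
        """Natural-log probability of a sentence under the bigram model."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab_size = len(self.vocab) + 1  # +1 slot for unseen words
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(hypotheses, lm, lm_weight=0.5):
    """Return the transcript with the best combined acoustic + LM score.

    hypotheses: list of (transcript, acoustic_log_score) pairs.
    lm_weight:  hypothetical tuning knob for how much the LM counts.
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * lm.log_prob(h[0]))
    return best[0]


# Placeholder text corpus (English, purely illustrative).
lm = BigramLM([
    "the meeting starts at nine",
    "the meeting ends at noon",
])

# Two candidate transcripts with made-up acoustic scores; the second one
# sounds slightly better to the recogniser but is unlikely as text.
hypotheses = [
    ("the meeting starts at nine", -4.2),
    ("the meat in starts at nine", -4.0),
]

best_transcript = rescore(hypotheses, lm)
print(best_transcript)
```

In this toy run the language model overrules the slightly higher acoustic score of the second candidate, which is the effect the bullet describes as suggesting the most likely words.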

The entire operation cost just $150, made possible by affordable cloud GPUs that can now be rented for as little as $2 per hour. “What once required a supercomputer can now be done in days,” Akimov said.
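
As a rough sanity check on those figures, the arithmetic can be written out. This is only a back-of-the-envelope sketch: it assumes the whole $150 went on GPU rental at the quoted $2 hourly rate and that a single GPU ran continuously, neither of which the team has stated.

```python
budget_usd = 150.0        # total project cost quoted in the article
rate_usd_per_hour = 2.0   # quoted cloud GPU price (assumed to apply throughout)

gpu_hours = budget_usd / rate_usd_per_hour  # rented GPU time the budget covers
gpu_days = gpu_hours / 24                   # if one GPU ran around the clock

print(f"{gpu_hours:.0f} GPU-hours, about {gpu_days:.1f} days on one GPU")
```

Under those assumptions the budget covers 75 GPU-hours, or roughly three days of continuous training on one GPU.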

While the current version is not production-ready, it proves the technology works. With more validated data, the team says a world-class model is within reach.

Transformative Applications Across Cyprus

The implications of a fully developed Cypriot Greek AI are far-reaching:

  • Healthcare – Automatic transcription of patient speech directly into medical systems without manual typing, which is especially valuable for older adults.

  • Education and culture – Digitisation of oral archives, preservation of the dialect, and support for linguistic research.

  • Business and customer service – Voice agents and automated phone systems that can finally understand Cypriots speaking naturally.

  • Government services – More accessible public hotlines and digital services that no longer require standard Greek or English.

  • Conferences and events – Real-time transcription and translation for speakers, especially where simultaneous interpretation is unavailable or costly.

Akimov plans to showcase the system at the Cyprus AI Forum in Limassol, offering live speech transcription with multilingual translation—a demonstration of how quickly the technology could enhance local events.

A Vision for Cyprus and Beyond

If resources were unlimited, Akimov knows exactly where he would start.

“I would record dialect speech from every corner of Cyprus. Collect poems, audio recordings, TV shows—everything that hasn’t been digitised yet,” he said. With a small development team, he believes Cyprus could soon have:

  • a state-of-the-art speech-to-text engine,

  • a Cypriot Greek text-to-speech system, and

  • a fine-tuned language model that understands the country, its culture and its identity.

“This would unlock any AI service—from voice assistants to chatbots and AI agents that truly serve the country,” he added. “Without good data, we’re just copying what exists elsewhere, without Cypriot identity.”

The methodology also serves as a blueprint for other underrepresented languages. “We wanted to understand how to work with dialects that don’t have data. This can be replicated anywhere.”

Call for Public Participation

The Voice of Cyprus platform now depends on community involvement. Validating even a few minutes of audio can meaningfully improve the model.

“Please visit voiceofcyprus.org,” Akimov urged. “Ten or fifteen minutes really does make a difference. We want every Cypriot to speak naturally, and still be understood by technology.”

The team is also open to collaborations with AI researchers, universities, companies and enthusiasts who want to help transform this foundational work into fully fledged national technology.

What the Team Has Released So Far

Video explaining how it was built and why it matters:
🇬🇧 https://youtu.be/zN_FMIWRSLA
🇬🇷 https://youtu.be/hcoXFNVP6L4
