Omniscien Technologies
Industry | Localisation, eCommerce, Travel, Enterprise and Government |
---|---|
Founder | Gregory Binger, Dion Wiggins, Bob Hayward |
Headquarters | Singapore |
Number of locations | Singapore, Thailand, The Netherlands |
Key people | Gregory Binger, Dion Wiggins, Philipp Koehn, Andrew Rufener |
Products | Language Studio Machine Translation and Language Processing Platform |
Services | Automated translation, custom machine translation engines, language processing |
Website | http://www.omniscien.com, http://www.languagestudio.com |
Omniscien Technologies (formerly Asia Online) is a privately owned company delivering machine translation and language processing software and services. The company is backed by individual investors and institutional venture capital. Omniscien Technologies is headquartered in Singapore, with R&D operations in Bangkok, Thailand, and European operations based in The Hague, The Netherlands. The firm was founded in 2007 by Prof. Dr. Philipp Koehn, a leading machine translation researcher; Gregory Binger, a technologist and IT/IP lawyer; and former Gartner senior analysts Bob Hayward and Dion Wiggins.[1]
The firm delivers professional machine translation solutions for the localisation industry as well as government, eCommerce and large enterprise customers, based on statistical machine translation (SMT) technology and the emerging neural machine translation (NMT) technology. Omniscien Technologies supports more than 540 language pairs in 12 industry domains.
The firm's statistical and neural translation software employs recent advances in automated translation as well as extensive data manufacturing technologies. Until the early 1990s, almost all production-level machine translation technology relied on collections of linguistic rules to analyze the source sentence and then map its syntactic and semantic structure into the target language. The firm's current approach uses statistical and/or neural techniques from cryptography, applying machine learning algorithms that automatically acquire statistical models from existing parallel collections of human translations, in the same way as Google Translate and the systems built with Koehn's own open-source Moses toolkit for SMT.
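As a rough illustration of what "acquiring a statistical model from parallel translations" can mean, the sketch below trains IBM Model 1, a classic textbook word-translation model, on a toy corpus using expectation-maximisation. It is not the firm's software and not the Moses implementation; the function name `train_ibm_model1`, the toy German-English corpus and the iteration count are all assumptions made for the example.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(f, e)] = P(f | e) with EM.

    `pairs` is a list of (foreign_tokens, english_tokens) sentence pairs.
    This is a textbook sketch without a NULL word or smoothing.
    """
    foreign_vocab = {f for foreign, _ in pairs for f in foreign}
    # Start from a uniform distribution over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned
        for foreign, english in pairs:
            for f in foreign:
                # E-step: share one unit of alignment mass for f
                # among the English words in the same sentence pair.
                norm = sum(t[(f, e)] for e in english)
                for e in english:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate probabilities from the expected counts.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


# Hypothetical toy corpus, already tokenised and lower-cased.
toy_corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
model = train_ibm_model1(toy_corpus)
best_for_house = max(["das", "haus", "buch", "ein"],
                     key=lambda f: model[(f, "house")])
```

Although "house" only ever co-occurs with both "das" and "haus", the other sentence pairs let the model attribute "das" to "the", so `best_for_house` ends up as "haus". Production SMT systems build phrase tables, language models and reordering models on top of alignments like these.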
Differences from other approaches
Google, Microsoft, SDL Language Weaver and others have also created SMT and, more recently, NMT systems, some of them publicly accessible. The specific differences in Omniscien Technologies' approach are:
- Clean data: The traditional approach leveraged content found on the web in corporate sites, news articles and other similar sources where the same content was available in multiple languages; this gives low-quality data. The company, as Asia Online, focused machine and human resources in this area to ensure that the data is as clean and as accurate as possible. The company's data is sourced from high-quality translations provided by book publishers and translation companies, aligned at the segment level (usually sentences) and converted into a consistent format so that it can be processed by the learning software. This step includes extracting segments from files and documents that are not in TMX format. The extracted segments are then aligned and processed by machines, with humans validating the accuracy. The data is converted to a base UTF-8 encoding for training the SMT system, small subsets are extracted to guide training, and finally the data is reviewed, cleaned, and analyzed.
- Multiple domains: The system allows for training in many domains by extending a base set of information with multiple additional learning sources, including tuning for a specific writing style.
- Real-time corrections
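The data preparation steps described under "Clean data" can be sketched roughly as follows. This is an illustrative outline only, not the company's actual pipeline: the function names, the length-ratio threshold and the tuning-set size are assumptions, and the whitespace-based length check would need replacing for languages written without spaces, such as Chinese or Thai.

```python
import random
import unicodedata


def decode_segment(raw: bytes, source_encoding: str) -> str:
    """Decode a raw segment so that it can later be written out as UTF-8."""
    return raw.decode(source_encoding)


def normalise(segment: str) -> str:
    """Apply Unicode NFC normalisation and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", segment).split())


def clean_parallel(pairs, max_ratio=3.0):
    """Drop empty, badly mismatched and duplicate segment pairs.

    `pairs` is a list of already aligned (source, target) strings.
    `max_ratio` is an assumed heuristic: pairs whose word counts differ
    by more than this factor are treated as probable misalignments.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = normalise(src), normalise(tgt)
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too different to be a plausible translation
        if (src, tgt) in seen:
            continue  # exact duplicate after normalisation
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def split_tuning(pairs, tuning_size, seed=0):
    """Hold out a small random subset to guide (tune) training."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[tuning_size:], shuffled[:tuning_size]


# Hypothetical input: one good pair, a duplicate, an empty source,
# and a pair whose lengths are badly mismatched.
raw_pairs = [
    ("Hello   world", "Bonjour le monde"),
    ("Hello world", "Bonjour le monde"),
    ("", "vide"),
    ("Yes", "Oui bien sûr je suis tout à fait d'accord"),
]
cleaned = clean_parallel(raw_pairs)
```

Here only the first pair survives in `cleaned`. In a real workflow the automatic filters would be followed by the human validation step that the article describes.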
The firm currently has more than 540 language pairs available in a baseline form and is progressively deploying 12 domains across each language pair. In addition, Omniscien Technologies offers more than 100 Industry Engines that can be used "off the shelf". Currently supported languages are the Asian languages Arabic, Chinese, Hindi, Japanese, Bahasa Indonesian, Bahasa Malay, Korean, Thai, Tagalog, Burmese, and Vietnamese; and the European languages Bulgarian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, Swedish, Russian, and Ukrainian. The additional Asian languages Bengali, Gujarati, Punjabi, Tamil, and Urdu are under development.
The company's systems are currently used to build customized translation systems for corporate and language service provider (LSP) customers, who add their own bilingual parallel corpora to the existing data to create higher-quality translation systems.
The company characterizes its products as a "platform": a suite of tools and products that can work independently or together. Some are installed locally and some are available only through the company's SaaS offering. This is described in the CSA blog entry.
The Language Studio product suite was reviewed by Common Sense Advisory, a translation industry market research firm, in its Global Watchtower blog (see External links).
References
- [1] https://www.omniscien.com
External links
- Omniscien Technologies Homepage
- Omniscien Technologies Corporate Profile
- Language Studio Platform Overview
- Omniscien Technologies changes its name
- Omniscien Technologies releases Industry Engines
- CSA Global Watchtower Blog entry on Language Studio Platform
- CSA Global Watchtower Blog entry - The Largest Translation Project…So Far
- TAUS Technology Review of Language Studio
- GizMag Article on Asia Online