Our new spellchecking framework, HULQ (which stands for Harmonized Universal Language Quality), has been launched and updated to include support for 195 locales. Diversity and inclusivity for low-diffusion languages are the main focus of this initiative, including languages which are traditionally left out of the digital conversation such as Guaraní, Navajo, Palauan and Tshiluba. Many more are in the pipeline, with our roadmap aiming to complete support for 300 locales by the end of 2025.
lexiQA’s brand new Terminology Assistant (TerAs) is an AI-powered tool that helps you identify and fix all those issues that prevent your glossaries from being reliable resources. Whether you have bilingual or multilingual glossaries, or repositories of terms from disparate sources that you don’t have the time or the expertise to deal with, we can help you get them in shape so that you can make the most of them both in CAT tools and in stand-alone terminology applications.
A not uncommon issue when translating out of English into languages that use transliteration is how to treat proper names. Transliteration often means that a proper name needs to be phonetically transcribed using the target language’s script, and when you can’t be sure how the name sounds in English this can be a problem. The new feature in our Chrome extension for XTM users helps you deal with this problem: highlight in the source text area the words that you would like to hear and then, with a single click on a headphone icon, play a recording of what those words sound like in English.
One-to-one terminology checks are officially a thing of the past for all our supported locales. After last year’s introduction of our new morphological terminology engine, another major makeover of this proprietary hybrid mechanism means that we are now combining its input with a privately-hosted LLM for increased accuracy and better performance across the board. Moving forward, language support will be expanding in parallel with the rest of our portfolio for QA and spellchecking. The new engine will be on our production servers on the 1st of November of this year.
Major dictionary updates across all our supported locales, including retraining of all our models.
A very common problem in morphologically challenging languages is the extensive use of transliteration for all sorts of proper names but also common names. The lack of standardisation in how transliteration is done means that in Cyrillic, Indic and SE Asian languages spellcheckers will produce an abundance of warnings which will be hard to decipher; this is the problem our new transliteration engine aims to address.
Three more languages now have morphology support for terminology checks: German, Danish and Dutch. The next ones in line are Norwegian, Swedish and Finnish.
Major dictionary updates across all our supported locales.
After more than 18 months of R&D, lexiQA’s glossary checks get a complete makeover with a new terminology mechanism that respects each language’s grammatical and terminological principles, taking accuracy and performance to the next level. This first release provides support for English, French, Italian, Spanish, and Portuguese - with more languages to be added to the portfolio in the coming releases.
lexiQA’s checks are now available to users of Gridly, a multi-language content management and localization platform focusing mainly on games and other digital products. A major new challenge was addressed with this integration: how to run QA in a dynamic environment with multiple language pairs simultaneously. The lexiQA add-on is now available to Professional and Enterprise clients in Gridly. More details here
Fine-tuned our NER model for all supported entities.
Updated dictionaries and retrained spellcheckers for most of our supported locales.
Support for another five new locales has been added: Armenian (hy-AM), Haitian Creole (ht-HT), Kazakh (kk-KZ), Kirundi (rn-BI) and Sorani (cb-IQ). As with the locales added in our last two releases, QA support for these locales comes bundled with brand new neural spellcheckers using our proprietary technology. We now provide locale-specific checks in a landmark 150 locales (in 68 languages).
Various improvements made for an array of locale-specific checks, including word repetitions in Bengali, spacing around foreign characters in Chinese locales, and handling exceptions for file extensions in non-Latin scripts.
Major dictionary updates across all our supported locales.
We have added support for five new locales: Azerbaijani (az-AZ), Belarusian (be-BY), Kurmanji (ku-TR), Kyrgyz (ky-KG) and Nepali (ne-NP). Our QA support for these locales comes bundled with brand new neural spellcheckers using our proprietary technology. The total number of supported locales now comes to 145 (in 63 languages).
Our API now also supports JWT (JSON Web Token) as an authentication method. This approach safeguards the exchange of JSON files between lexiQA and its users and provides a simple framework for securely transmitting information.
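For readers unfamiliar with the format, the sketch below shows how a compact JWT is assembled and verified using only the Python standard library. It is a generic illustration, not lexiQA’s implementation: the HS256 algorithm, the claim names and the shared secret are all assumptions, and the API documentation remains the authority on what the service actually expects.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256-signed token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    try:
        signing_input, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no dot at all, so this is not a compact JWT
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected).encode("utf-8"), signature.encode("utf-8"))


# Hypothetical claims and secret, for illustration only.
token = make_jwt({"sub": "example-user", "exp": 1700000000}, b"demo-secret")
```

The signature only proves that the header and payload were not altered by anyone who lacks the secret; the payload itself is encoded, not encrypted, so the token should still travel over HTTPS.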
lexiQA’s Quality Evaluation API is now connected to the Phrase API and our scoring mechanism can also be applied on projects shared via Phrase workflows.
A new integration in Crowdin’s translation editor now allows users to benefit from lexiQA’s checks in real time. The lexiQA app is now available in the Crowdin app store.
Updated dictionaries and retrained spellcheckers for many of our supported locales.
Support for six new locales has been added: Assamese (as-IN), Georgian (ka-GE), Javanese (jv-ID), Oromo (om-ET), Sinhala (si-LK) and Uzbek (uz-UZ). Our QA support for these locales comes bundled with brand new neural spellcheckers using our proprietary technology. This brings the total number of supported locales to 140 (in 58 languages), and we will continue to expand our portfolio with other low-resource locales for the rest of 2022.
In light of the upcoming consolidation of Memsource with Phrase, we have updated our integration with the latest of Memsource’s API updates.
Updated dictionaries and retrained spellcheckers for many of our supported locales.
We have developed a permanent exclusion list for Serbian vocabulary that should not be used in Croatian translations. From now on, if one of those blacklisted words shows up in a Croatian target text, a warning will alert the user to take the necessary steps to amend it.
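In its simplest form, a check of this kind can be pictured as a lookup against a word list. The sketch below is only an illustration under that assumption: the three entries are well-known Serbian/Croatian pairs chosen as examples, not the contents of the actual exclusion list, and the function name is hypothetical.

```python
import re

# Tiny illustrative sample, not the real exclusion list.
# Each key is a Serbian form, each value the usual Croatian counterpart.
EXCLUDED = {"hleb": "kruh", "vazduh": "zrak", "pozorište": "kazalište"}


def flag_excluded(target: str, excluded: dict[str, str] = EXCLUDED) -> list[tuple[str, str]]:
    """Return (found word, suggested Croatian form) pairs for a target text."""
    words = re.findall(r"\w+", target.lower())
    return [(word, excluded[word]) for word in words if word in excluded]
```

This version matches exact word forms only, so inflected forms would go unnoticed; a production check would need morphological awareness on top of the list.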
You can now set attributes for Content Type and Client Name when creating a new project via API. These labels can be used to filter processed content in order to refine and more accurately track analytics for clients who manage multiple content streams.
Our integration with Transifex has been updated to reflect the changes recently implemented with Transifex’s API 3.0, on which you can find more details here
Major dictionary updates across all our supported locales.
Our range of Chrome extensions now includes versions available for Lokalise and XTM users. We have also extended the authentication system in our extension for Memsource, which now allows TWB Kató users to use our extension with the same credentials.
We have taken our error typology a step further and now our automated QA process is directly connected to the process of LQA. This new mechanism is available exclusively via API and it can be used independently of our QA checks to produce scorecards, using predetermined weighted-severity scales for each lexiQA-supported error type. The weighting can be calibrated for different content types - and this is the only manual step in the whole process. This is another first for lexiQA, bringing together quality assurance and quality assessment into a single, holistic quality management workflow.
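To make the idea of a weighted-severity scorecard concrete, here is a minimal sketch. The error type names, the weights and the per-1,000-words formula are assumptions chosen for illustration; the scales and the scoring formula actually used by the API may differ.

```python
from collections import Counter

# Hypothetical severity weights per error type. In practice these would be
# calibrated for each content type.
WEIGHTS = {"spelling": 1.0, "punctuation": 0.5, "numerals": 2.0, "terminology": 3.0}


def score(errors: list[str], word_count: int, weights: dict[str, float] = WEIGHTS) -> dict:
    """Return error counts, weighted penalty points and a 0-100 score.

    The score subtracts the penalty per 1,000 words from 100 (an assumed formula).
    """
    counts = Counter(errors)
    penalty = sum(weights.get(kind, 1.0) * n for kind, n in counts.items())
    per_thousand = penalty * 1000 / word_count if word_count else 0.0
    return {
        "counts": dict(counts),
        "penalty": penalty,
        "score": max(0.0, 100.0 - per_thousand),
    }


card = score(["spelling", "spelling", "terminology", "punctuation"], word_count=500)
```

With these example weights, the four errors add up to 5.5 penalty points, which over 500 words gives a score of 89.0. Changing the weights is the only manual decision in the sketch, mirroring the calibration step described above.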
Major dictionary updates across all our supported locales.
We took a new approach to our existing integration, and created a browser extension that is easier to incorporate into different workflows.
Our custom Named Entity Recognition (NER) model is now in production on demand. lexiQA’s NER is a sophisticated NLP algorithm that will learn and discover acronyms and named entities, like placenames and names of organisations, and remove them from spellcheck. This leads to a significant reduction in the noise produced by spellcheckers that can’t recognise such entities in the translation, even though they also exist in the source text.
Number handling has been redesigned in lexiQA’s conversion checks, yielding higher accuracy rates and fewer false positives.
Hybrid spellchecking mechanisms are now available for Bengali (RC1), Hindi, Marathi, Gujarati, Burmese, and Tamil. The brand new architecture is orders of magnitude more effective in detecting actual misspelled words in those challenging locales. More locales to be covered in the coming months.
We have also done major dictionary updates across all our supported locales.
We have extended the work that we reported in our previous release, by updating the spellchecking dictionaries in all supported locales with variants, inflected forms and common loan words.
We have introduced a number of improvements to our standard checks, especially for camelcase brandnames and capitalisation, abbreviations, and capitalised dates.
We have started developing (currently in alpha version) a sophisticated NLP algorithm that will learn and discover acronyms and named entities, like placenames and names of organisations, and remove them from spellcheck. This leads to a significant reduction in the noise produced by spellcheckers that can’t recognise such entities in the translation, even though they also exist in the source text.
Following many months of work in the background, we have put in production an innovative spellchecker for Thai which was built on the foundation of a new segmenter to address Thai’s continuous script and lack of punctuation. This will be continuously improved on and refined in the coming months, with the aim of eventually developing a similar system for Burmese.
We have updated the dictionaries in a number of supported languages to increase accuracy and address more edge cases, including inflected forms and common loan words from English.
We have refined our checks in a number of error classes, particularly for issues that have to do with months, quotation marks and non-breaking spaces.
Support for three new locales has been added: Persian (fa-IR) and Urdu (ur-IN and ur-PK), bringing the total number of supported locales to 134 (in 52 languages). Work continues for a few more challenging locales, like Basque (eu-ES), Punjabi (pa-IN), and Telugu (te-IN).
We have introduced a new granular check for Thai to reduce the number of false positives for punctuation marks which are not matched in this language (for example question marks don’t have a corresponding mark).
The dictionaries available in all the languages we support have been overhauled and a number of new languages have been added, thus increasing accuracy and coverage. Work is under way to extend coverage for more exotic locales which are currently lacking in “traditional” spellcheckers.
We have introduced a number of improvements to our checks, especially for numeral conversions, times and localised indexes.
The following new locales have been added to our portfolio: Bosnian (bs-BA), Catalan (ca-ES) and Macedonian (mk-MK), together with two more English locales (en-IN and en-ZA). This brings the total number of locales we support to 131 (in 50 different languages – another landmark for us). In the coming weeks we aim to conclude the necessary work for Basque (eu-ES), Persian (fa-IR), Punjabi (pa-IN), Telugu (te-IN) and Urdu (ur-IN and ur-PK).
We are now checking for unpaired/unmatched quotation marks in all the locales we support. API users can specify their preferred style of quotation marks which are used to check for correctly matched locale-specific marks. More information on how to use the API addition here
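The logic behind an unpaired-quotation-mark check can be sketched with a small stack-based scan. The style names and the pairs below are illustrative assumptions and do not reflect the values the API accepts; see the API documentation for the real parameter.

```python
# Illustrative opening/closing pairs; the styles the API accepts may differ.
QUOTE_STYLES = {
    "curly": ("“", "”"),
    "guillemets": ("«", "»"),
    "german": ("„", "“"),
}


def unmatched_quotes(text: str, style: str) -> list[int]:
    """Return character offsets of quotation marks that have no partner."""
    opener, closer = QUOTE_STYLES[style]
    open_positions: list[int] = []
    unmatched: list[int] = []
    for i, ch in enumerate(text):
        if ch == opener:
            open_positions.append(i)
        elif ch == closer:
            if open_positions:
                open_positions.pop()  # closes the most recent opener
            else:
                unmatched.append(i)  # closing mark with nothing to close
    # Openers still on the stack were never closed.
    return sorted(unmatched + open_positions)
```

Choosing the style up front matters because the same character can play different roles: “ opens a quotation in the curly style but closes one in the German style.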
We have introduced a wide range of granular improvements to our checks involving month and day names (in order to address exceptions where a name might also have a different meaning in the same locale) and also inflected forms for our checks which target numerical conversions and indexes.
Support has been added for the following new locales: Afrikaans (af-ZA, af-NA), Bulgarian (bg-BG), Croatian (hr-BA, hr-HR), Gujarati (gu-IN), Hindi (hi-IN), Marathi (mr-IN) and Serbian for both Latin and Cyrillic scripts (sr-RS, sr-Cyrl-RS, sr-Latn-RS, sr-BA, sr-ME, sr-XK, sr-Latn-BA, sr-Latn-ME, sr-Latn-XK, sr-Cyrl-BA, sr-Cyrl-ME, sr-Cyrl-XK). This brings the total number of locales we support to 126 (in 47 different languages). Work is ongoing for a few other challenging locales, like Persian (fa-IR), Punjabi (pa-IN) and Telugu (te-IN) and from now on we will be turning our attention almost entirely to low-resource locales, especially from Asia and Africa.
Two new locale-specific checks have been added for the names of months and days of the week, covering long-date formats, consistency and inflections between any of the supported locales. This will now complement our existing checks for dates which already cover a wide range of potential errors.
Following the “pull-push” integration that we’ve had in place for over two years already with Transifex, we have completed work on a deeper level integration which now allows Transifex users to access lexiQA’s reports directly from the Transifex translation editor. This integration eliminates the need to pull and push files from within lexiQA’s UI, creating a seamless user experience and allowing multiple collaborators to review a resource with minimal set-up.
We have applied another round of stylistic changes across all areas of our UI in order to improve stylistic consistency. These changes are also now reflected in our documentation.
We have introduced a consistency check that matches segments where all words are fully capitalised. This is an issue that comes up regularly in legal and technical texts, where this style normally needs to be maintained in the translation as well, so we hope that this check will help our users find and correct such inconsistencies.
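As a rough sketch of what such a check compares, the snippet below flags segment pairs where only one side is fully capitalised. The function names are hypothetical, and the handling of scripts without letter case is an assumption added so that, for example, Chinese targets are not flagged.

```python
def has_case(text: str) -> bool:
    """True when the text contains at least one letter that has upper/lower forms."""
    return text.lower() != text.upper()


def caps_mismatch(source: str, target: str) -> bool:
    """Flag pairs where one side is fully capitalised and the other is not."""
    if not (has_case(source) and has_case(target)):
        return False  # nothing to compare for caseless scripts or number-only text
    # str.isupper() is True when every cased character is upper case.
    return source.isupper() != target.isupper()
```

For instance, `caps_mismatch("LIMITATION OF LIABILITY", "Haftungsbeschränkung")` is flagged, while the same source paired with `"HAFTUNGSBESCHRÄNKUNG"` is not.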
We have applied some minor stylistic changes to enhance stylistic consistency across all areas of our UI.
A great number of major improvements have been made in our locale-specific checks for Japanese and all locales of Chinese, especially for issues relating to numerals, indexes and conversions (numbers-to-words, dates, times, etc.).
The following new locales are now being supported: Burmese (my-MM), Hausa (ha-GH, ha-NE and ha-NG), Hebrew (he-IL) and Tamil (ta-IN, ta-LK, ta-MY and ta-SG). With the additional inclusion of Flemish (nl-BE) and Swiss Italian (it-CH), the total number of supported locales now comes to 106 (in 40 different languages). Work is progressing well for the development of Afrikaans (af-ZA), Gujarati (gu-IN), Hindi (hi-IN), Marathi (mr-IN) and Persian (fa-IR), among others, so our next release will include a number of challenging new locales.
A new mode has been introduced to our UI, which now allows users to batch-process (archive or delete) multiple projects in their account. This can be particularly useful if you’re managing numerous projects for different clients and locale combinations.
We have applied further stylistic and functional changes to ensure styling consistency across our UI. The main ones are the following:
A lot of additional work has been done to improve on the output of checks supported, particularly for right-to-left languages and Japanese. Improvements have been made on various error classes, with a focus on times and currencies.
We have now added support for Japanese (ja-JP) and three new locales of Chinese (zh-TW, zh-HK and zh-SG). We have also expanded our range to include all locales of Arabic (ar-AE, ar-BH, ar-DZ, ar-EG, ar-IQ, ar-JO, ar-KW, ar-LB, ar-LY, ar-MA, ar-OM, ar-QA, ar-TN, ar-YE) and Spanish (es-AR, es-BO, es-CL, es-CO, es-CR, es-DO, es-EC, es-GT, es-HN, es-NI, es-PA, es-PE, es-PR, es-PY, es-SV, es-UY, es-VE). The total number of locales supported now comes to 92 (in 36 different languages). We are currently working on Bulgarian (bg-BG), Burmese (my-MM), Hausa (ha-NG) and Hebrew (he-IL), among others, to be included in our next release.
As an exclusive feature for our users, we now have an online user guide, featuring all the locale-specific rules we apply in every locale we support and all the checks we have in place for punctuation, conversions, special characters, etc. The guide can be accessed from our Editor and QA report and it provides an inside look at how lexiQA’s engine runs.
Our API has been augmented to support appending segments to existing projects. More information can be found at https://api.lexiqa.net
We have applied further stylistic and functional changes to consolidate the work we started in the previous release and to ensure consistency across the board for user experience and convenience in our UI. We have also applied a few changes which were suggested to us by users (in our Editor and the Inconsistencies report). User suggestions are always welcome!
A lot of additional work has been done to improve on the output of checks supported, particularly for Asian locales such as Bengali, Chinese, Japanese, Korean and Thai. Improvements have been made on various error classes, with a focus on locale-specific conversions.
Support has been added for the following new locales: Bengali (bn-BD, bn-IN), Czech (cs-CZ), Hungarian (hu-HU), Slovak (sk-SK), Slovene (sl-SI), Tagalog (tl-PH), Vietnamese (vi-VN) and Swahili (sw-CD, sw-KE, sw-TZ, sw-UG). Along with the addition of another French locale (fr-BE), the total number of locales we support now comes to 55. Preliminary work has already begun for more, amongst which we are looking at Afrikaans (af-ZA), Burmese (my-MM), Croatian (hr-HR), Hebrew (he-IL) and Hindi (hi-IN), and we are currently testing alpha versions for ja-JP (Japanese) and zh-TW (Traditional Chinese).
We have published a comprehensive knowledge base for lexiQA, which includes detailed process descriptions and instructional videos for our UI, proprietary Editor, QA report, productivity features, supported locales and API integrations, and also a refined search engine to make the experience of our users even smoother. The documentation is accessible here
We have taken the stylistic and functional changes we introduced in our previous release a step further to ensure that navigation and project management are now even easier for any user, and particularly for those users who have dozens or hundreds of projects active at any time.
We have introduced a new section in our QA report which now features error statistics in an easily accessible dashboard with bar graphs for total, corrected and ignored errors. All statistics are updated in real time as your revision work progresses in lexiQA’s Editor, thus giving you the full picture every step of the way.
The QA report associated with projects in MateCat and our documentation guide have been revamped with all the information about our checks and supported languages.
When a Smartcat project is revised in our Editor and the revision is marked as completed, all updated segments in lexiQA are now automatically confirmed after being pushed back into Smartcat.
We have introduced a range of improvements to our algorithms supporting various error classes, especially numerical conversions and localization. These cover our full range of supported locales and this work will continue for future releases.
Now that the new GDPR has come into effect (at the end of May), we have consolidated our security processes, and all aspects of your work in lexiQA’s online environment are secure and protected. See here for more details.
A feature that has been in the works for a while has now been tested exhaustively and is available in this release. In projects consisting of multiple files, you can now bulk review all files in a single Editor instance. This feature allows you to move from one file to the next without ever leaving the Editor and thus have better control of the corrections made during review in any of the project files. You can access the glued file in the Editor with a link available from the individual page of any project that has multiple files.
The following new locales have been added to our portfolio: ar-SA (Arabic), ko-KR (Korean) and th-TH (Thai), as well as a number of locales for English, French and German (such as en-AU, fr-CH and de-AT). The total number of locales we support has now gone up to 42. A number of new ones are already in the works, including cs-CZ (Czech), ja-JP (Japanese), sk-SK (Slovak), sl-SI (Slovene) and zh-TW (Traditional Chinese).
Apart from now being able to support the mxliff bilingual format (which is native to Memsource), we have also revamped the interface which we provide for users to access their Memsource account and locate the files they would like to check in lexiQA. Especially for accounts with a large number of projects, it is now easier to look for specific projects with our filtered search and also organise and display your projects with our new paginated navigation.
A new feature has been added in the Inconsistencies report, making it easier to know exactly what part of a segment causes the inconsistency. A raw/diff button allows you to hide or visualise (highlighted) parts of the segment which trigger the inconsistency to begin with. These visual cues can speed up the process of deciding whether something needs to be changed and where in the segment.
Every menu and every button in our UI is now accessible through the keyboard, thus improving the experience of keyboard-only users.
A number of stylistic and functional changes have been applied on every page of our UI, to improve UX and make information more easily accessible. The most important ones can be found on lexiQA’s homepage, where you can now also archive older projects and use toggle filters (‘in progress’, ‘completed’, ‘archived’, ‘pagination’) to streamline the content of your project table according to your preferences.
It is now easier to create a new project in lexiQA. All you need to do is drag and drop your files into the drag box and the system will automatically detect and analyse them to give you an overview of the basic information before you configure the other settings of your project.
If you have duplicate entries in a project glossary (where a different target language term corresponds to each of the duplicate source language entries), our terminology tooltip will now display all the available options in the Editor. This way you can see straight away what options you have for the translation you are reviewing.
We have taken batch-processing a step further with this new feature that allows you to check for inconsistent translations across multiple project files. Listed and filtered as source and target inconsistencies, inconsistent segments are grouped together (even when they come from different files) and you can see where each segment is coming from (with “Segment info”) or see its surrounding text in the file (with “See context”). More importantly, you can act on any of these segments without even having to access the Editor, by using “Apply selection”; this function allows you to apply a specific translation to all inconsistent segments in that group. Once again, no need for manual corrections!
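The grouping and the “Apply selection” action can be pictured with a small sketch. It covers only the case where identical source text has several translations, and the dictionary keys (`file`, `source`, `target`) are hypothetical names chosen for the example rather than anything taken from the product.

```python
from collections import defaultdict


def group_inconsistent(segments: list[dict]) -> dict[str, list[dict]]:
    """Group segments whose identical source text has more than one translation.

    Each segment is a dict with the hypothetical keys "file", "source", "target".
    Segments from different files end up in the same group.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for seg in segments:
        groups[seg["source"]].append(seg)
    return {
        source: segs
        for source, segs in groups.items()
        if len({s["target"] for s in segs}) > 1
    }


def apply_selection(group: list[dict], chosen_target: str) -> None:
    """Apply one chosen translation to every segment in an inconsistent group."""
    for seg in group:
        seg["target"] = chosen_target
```

Because each segment keeps its `file` key, the origin of every entry in a group stays visible, and once one translation has been applied to the whole group it no longer shows up as inconsistent.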
The moment a reviewer completes a job in the Editor (by marking the review as “Complete”), the project owner receives an email notification telling them that the status of the job has been updated. This way, a project manager can always keep track of updates and manage the flow of information more easily.
This is a new feature which helps contain user actions to their area of responsibility. When a reviewer marks a review as “Complete” in the Editor, this file gets locked and can no longer be edited further. A modal will in fact redirect the user to the QA report for this file. At the other end, the project owner is the only user who can unlock this file (in the Project page). This way, project managers have full control of when a job is completed and no changes can be made to a file without them knowing it.
Added the following locales to our portfolio: sq-AL (Albanian), id-ID (Indonesian), ms-MY (Malay) and ro-RO (Romanian), bringing the total count of currently supported locales to 29. More to come in our next release, including ja-JP (Japanese) and ko-KR (Korean).
Following up on a client suggestion, we have added a new punctuation check. It is now possible to detect spaces after an opening tag or before a closing tag – these are of course redundant and should be eliminated. This is apparently a common issue with machine translation engines which arbitrarily add spaces in or around tags, so this check can be very useful for MT users.
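A check of this kind can be approximated with two regular expressions. The sketch below assumes simple XML-like inline tags such as `<b>...</b>`; the patterns and the function name are illustrative and are not the rules the product actually applies.

```python
import re

# An opening tag immediately followed by whitespace. Tags containing "/" are
# skipped on purpose, which excludes self-closing tags such as <br/>.
SPACE_AFTER_OPEN = re.compile(r"<[A-Za-z][^<>/]*>(?=\s)")
# A closing tag immediately preceded by whitespace.
SPACE_BEFORE_CLOSE = re.compile(r"(?<=\s)</[A-Za-z][^<>]*>")


def tag_spacing_issues(segment: str) -> list[tuple[str, int]]:
    """Return (issue, offset of the redundant space) pairs, sorted by offset."""
    issues = [
        ("space after opening tag", m.end())
        for m in SPACE_AFTER_OPEN.finditer(segment)
    ]
    issues += [
        ("space before closing tag", m.start() - 1)
        for m in SPACE_BEFORE_CLOSE.finditer(segment)
    ]
    return sorted(issues, key=lambda item: item[1])
```

Running it on `"Click <b> here </b> now"` reports both problems, while `"Click <b>here</b> now"` comes back clean.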
Using the API link with your Memsource account, you can now also use the glossaries associated with your Memsource project and load them directly to your lexiQA project. Less time spent on file management, more time for actual QA.
We have changed the underlying technology of our Editor in order to handle much bigger files in a fraction of the time. It is now significantly quicker to analyse and process thousands of segments at a time, while still providing the same level of scrutiny with all the QA checks we support.
We have added a way to update your project’s parameters after creation. You can visit api.lexiqa.net and check /project/v1/update for more information.
Tags which are normally contracted in a CAT tool editor are now expanded in lexiQA’s Editor, making them fully visible and editable for all those cases when you want to make sure every little detail in your localization project is done right.
We have redesigned the way you can create a default profile with your preferred user settings and you can now have in your account multiple profiles with the same locale pair.
Connect through a modal to your account with any of these online translation platforms, select an active project and revise it in lexiQA’s Editor. Once you’re done revising, push all the updates back into the original file without ever leaving lexiQA’s environment.
Added the following locales to our portfolio: el-GR (Greek), et-EE (Estonian), lt-LT (Lithuanian), lv-LV (Latvian), nl-NL (Dutch), pl-PL (Polish), tr-TR (Turkish) and zh-CN (Simplified Chinese), bringing our supported locales to a total of 25.
Shareable online error report, featuring detailed project statistics and error classification which can also be downloaded for offline processing.
In every project page you now get segment statistics (total, initial with errors, current with errors, corrected) which help you keep track of how much work needs to be done before you start working on a revision.
When creating a new project, you can now get lexiQA to analyse your input files and look for potential untranslatables which can be selected in one of two modes (automatic or manual) to help keep the noise down – that is, the spellchecking false positives you might get in a project.
Two exciting additions to the Editor’s functionality. Have you found an error that repeats itself multiple times in the same file? You can choose to fix all instances of this error using the same correction, or you can choose to ignore all instances in one go if you think it’s not really an error. A real time-saver in long documents.
After completing a revision on a project you created by using a MateCat link, you can push all the updates directly back to your live MateCat project. Remember to refresh your MateCat project page in order for the updates to take effect.
When creating a new project you can now select multiple input files in one go, either from your computer or even across multiple project folders from one of our integration modals. Just remember that for the input to work correctly all files need to have the same locale settings.
Projects can be sorted by name, date or project status (when applicable) and you can also search by project or file name to get what you need more quickly.