By: Shannon Zimmerman
CEO, Sajan
The storage, portability and retrieval of translation memory (TM) are key to lowering costs, increasing quality and consistency, and shortening cycle time when obtaining translations. Language service providers (LSPs) must be able to assure their clients that they provide the technology to store translations for future re-use, the tools to import and cleanse existing translation memory, and direct access to that memory. In this issue of Forms Talk we will examine the different elements that make up advanced data management in translation memory and why these elements are important to the corporate buyer.
Translation Memory
Most LSPs now use some form of translation memory (TM). A translation memory is a type of database used in software programs designed to aid human translators: it houses previously translated content with the intention of reusing that content. Some software programs that use translation memories are known as translation management systems (TMS). Translation memories are also commonly used in conjunction with dedicated computer-assisted translation (CAT) desktop tools.
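To make the concept concrete, here is a minimal Python sketch of a translation memory as a lookup from source segments to stored translations. The entries and the flat structure are illustrative assumptions for this article, not any vendor's actual implementation:

```python
# A minimal sketch of a translation memory as a store of
# source -> target segment pairs. Real TM systems are databases
# with metadata and fuzzy retrieval; the entries here are invented.
tm = {
    "Click Save to continue.": "Klicken Sie auf Speichern, um fortzufahren.",
    "Make a better impression.": "Hinterlassen Sie einen besseren Eindruck.",
}

def lookup(segment):
    """Return a previously translated segment, or None if it is new."""
    return tm.get(segment)

print(lookup("Click Save to continue."))      # reused: no new translation cost
print(lookup("A segment never seen before."))  # None: route to a linguist
```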
Translation memory was first created to serve the linguist. Database files were used by the linguists performing the translation tasks to select previously translated segments for re-use. From there, most LSPs began to manage translation memory on behalf of their clients: they stored translation memory, created context-based files, and exchanged translation memory files with their linguists, extracting the appropriate files from a file library and emailing them to the linguist on a project-by-project basis. This created version control issues, as the LSP housed multiple TMs while linguists also saved copies of those TMs on their personal desktops. LSPs simply needed to manage the TM better.
Translation memory, although it has served its purpose, has proven insufficient as a translation technology solution. It was created for translators to maximize productivity and increase revenues. Translation technology now has to be used at an enterprise level to be successful in the global market, which means that enterprise solutions, not desktop tools, must be employed.
Many LSPs who claim "centralized" translation memory are simply referring to storing many different (disparate) TM files in one location. True centralization is the storing of all translation memory data in one file, with everyone accessing that same file. Using the traditional TM method, a single source segment can still be stored in multiple translation memory files, which diminishes the integrity of the "context" of the segments.
Consider the probability of the same source segment residing in several of those different database files. The linguist is then burdened with determining, on the client's behalf, the intent of the source segment in any one instance. Then consider the likelihood of inconsistent translations of those same source segments. This traditional method of TM brings with it the following risks: linguists determining context (resulting in higher cost and extended turnaround times), inconsistent translations, lengthened cycle time and higher translation costs.
When one considers the use of words with multiple meanings, context becomes very important to accurately convey your intended meaning. For example:
"Make a better impression."
The word "impression" could refer to: an emotion or feeling, an image retained as a consequence of experience, the act or process of impressing, a humorous imitation of the voice and mannerisms of another person, an initial or single coat of paint, or an imprint of the teeth and surrounding tissues for use in making dentures.
With an enterprise-class translation management system that uses one central repository, this phrase can exist in each of its contexts without the worry of one translation being used inaccurately, and without the linguist taking the time to determine which translation best suits the context. Automated processes can determine that before the linguist receives the project.
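As a hedged illustration of this contextual indexing idea (the context labels and German renderings below are invented for the example), a central repository can key each stored segment by both its source text and its context, so the same source string safely carries distinct translations:

```python
# Sketch of contextual indexing: keying entries by (segment, context)
# lets one source string carry several translations without conflict.
# The context labels and translations are illustrative assumptions.
tm = {
    ("Make a better impression.", "marketing"):
        "Hinterlassen Sie einen besseren Eindruck.",
    ("Make a better impression.", "dental"):
        "Machen Sie einen besseren Abdruck.",
}

def lookup(segment, context):
    """Retrieve the translation recorded for this segment in this context."""
    return tm.get((segment, context))

print(lookup("Make a better impression.", "dental"))
print(lookup("Make a better impression.", "marketing"))
```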
Advanced translation memory data management serves to optimize translation memory in terms of re-use, accuracy and compliance. This type of data management is leading-edge in the industry, offering clients greater re-use through centralized storage and more accurate re-use through contextual indexing. The greater re-use dramatically decreases translation costs, reduces project cycle time and significantly improves brand consistency. Furthermore, this advanced data management gives clients the ability to conduct trend analysis and compliance analysis, and to track all iterations of each segment within their central repository of language translations. Many LSPs struggle with the latter due to inefficient data management capabilities. It is important to discuss this in detail with your LSP, as it will directly impact your costs and cycle times.
Portability
Because the different databases reside on different servers, a variety of file formats has emerged. Proprietary formats generally emerged from LSPs, so the ability to exchange between formats becomes extremely important. From this need, the Translation Memory eXchange (TMX) standard was established in 1998. The TMX standard allows clients flexibility and control over their TM asset, and also provides access to a broader pool of service providers, promoting competition.
Clients should be sure to confirm with their service providers that they are using the TMX standard within their TM. It is worth noting, however, that a vendor can technically comply with the TMX standard, but include proprietary tags that can degrade when passed to another vendor.
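To show what the standard looks like in practice, here is a sketch that reads translation units from a simplified TMX document using Python's standard library. The sample file is pared down to the core tu/tuv/seg structure; real TMX files carry far richer header and segment metadata:

```python
import xml.etree.ElementTree as ET

# A simplified TMX document; production files carry more metadata,
# but the tu/tuv/seg structure shown here is the standard layout.
TMX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          srclang="en" datatype="plaintext" adminlang="en"
          segtype="sentence" o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Make a better impression.</seg></tuv>
      <tuv xml:lang="de"><seg>Hinterlassen Sie einen besseren Eindruck.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# xml:lang lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(TMX_SAMPLE)
for tu in root.iter("tu"):
    pair = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
    print(pair)  # {'en': 'Make a better impression.', 'de': '...'}
```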
Multilingual Search
The key to maximizing the value of translation memory is aligning the data you search with the algorithms you use to search it.
Fundamentally, all algorithms in the translation space that mine translation memory for matches are trying to accomplish the same two things:
- Find exact matches: Matches that are 100% the same as the search parameter.
- Find approximate matches: Matches that are nearly the same and can be adapted easily by a linguist, at a lower rate, to say exactly the same thing as the search parameter (typically called "fuzzy matches").
Finding exact matches is relatively simple, and most databases and search algorithms find exact matches well. The differentiation is in the algorithm's ability to find linguistically useful approximate matches. Scientifically, approximate matching can be defined as "the task of searching for sub-strings of text which are within a predefined edit distance threshold of a given pattern."
In the translation industry, the most common approximate matching algorithm used is the Levenshtein edit distance, usually represented not as a raw calculation but as a percentage of the overall length of the string.
The Levenshtein distance is calculated by counting the minimum number of insertions, deletions or substitutions required to make one string (or sentence) the same as another.
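Here is a compact sketch of that calculation, along with the percentage-style score described above. Note that the exact normalization varies between tools; dividing by the longer string's length is one common convention, assumed here:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions or substitutions
    needed to turn string a into string b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def match_percent(source, candidate):
    """Express edit distance as a percentage of the longer string,
    the form in which match rates are usually quoted."""
    longest = max(len(source), len(candidate)) or 1
    return 100.0 * (1 - levenshtein(source, candidate) / longest)

print(match_percent("Click Save to continue.",
                    "Click Save to exit."))  # a typical "fuzzy" match
```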
In some cases the Levenshtein algorithm does not properly measure the value of a candidate as an approximate match. Specifically, the Levenshtein algorithm (especially when used as a percentage value):
- Over-penalizes some strings based on length (demonstrated in the sketch after this list)
- Doesn't consider non-translatable sub-strings
- Doesn't consider word order or linguistic relevance
- Doesn't consider context
- Raises scalability concerns at high volume
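The first of these limitations is easy to demonstrate. Continuing the match_percent sketch above (this fragment assumes those functions are already defined), the same one-word edit costs a short segment a large share of its score while barely registering on a long one:

```python
# The same single-word change, very different percentage scores
# (reuses match_percent from the sketch above).
short_a, short_b = "Save the file.", "Save the document."
long_a = "Click the Save button in the toolbar to save the current file."
long_b = "Click the Save button in the toolbar to save the current document."

print(match_percent(short_a, short_b))  # the edit dominates a short segment
print(match_percent(long_a, long_b))    # the same edit barely registers
```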
An advanced search and match engine optimized for the language translation industry and designed to out-perform traditional translation memory technologies, such as Sajan's TMate™ Search Technology, can increase accuracy, consistency and compliance, as well as decrease cost and increase capacity for high volume.
- TMate™ Search Technology has a learning agent which improves traditional search methods, making them relevant to the translation industry.
- The learning agent in TMate™ increases fuzzy match rates by 10-60% over traditional translation search and match methods.
- It is a more advanced algorithm designed specifically for the reuse of multilingual content.
- TMate™’s high performance multilingual index was built to support the enterprise.
Translation memory is a necessity in going global, but the old adage "garbage in, garbage out" still applies. There are better methods available for storing and retrieving translation memory; it is up to the client to make sure their preferred LSP is using the most innovative technology to provide best-of-breed (centralized) storage and open access to their TM.
Coming up in the next issue of Forms Talk, Sajan will examine transparency to the entire translation process through Business Intelligence (BI) tools.