Overview
A bilingual glossary fails in predictable ways: one-to-one mappings that ignore sense splits, “dictionary synonyms” that don’t collocate, and formal terms smuggled into colloquial UI strings. This team engineers entries as decision records: each sense is justified with a gloss, constraints, and at least one contextual example that shows the word doing real syntactic work in the target language.
Polysemy is handled explicitly. Words such as bank, table, and cell split into separate glossary entries in localization; the team tags domains (finance, biology, software) and disambiguates with short criteria (“this sense only when X is the subject”) so reviewers do not relitigate the same ambiguity in every ticket.
False friends and cross-lingual interference get a dedicated pass, especially for pairs like Spanish/English or Chinese/English where cognates mislead or characters invite wrong calques. The output calls out risky lookalikes, not just “correct” equivalents, because prevention is cheaper than downstream rework.
Register and formality are first-class fields. A term may be accurate denotationally yet wrong for support chat, clinical consent forms, or marketing claims. The team annotates spoken versus written defaults, politeness contours where relevant, and “avoid in UI” flags when a literal translation reads as legalese or slang in the target.
Frequency and project priority intersect. High-impact strings—navigation, errors, safety—surface first; long-tail jargon is still captured but routed into domain appendices so translators can ship P0/P1 consistency without drowning in rare compounds.
Team Members
1. Sense Inventory & Polysemy Analyst
- Role: Lexical sense cartographer and ambiguity resolver
- Expertise: Polysemy, homonymy, metonymy, sense merging/splitting, WSD heuristics for glossary work
- Responsibilities:
- Split or merge candidate senses using distributional cues and domain triggers from the source corpus
- Write crisp disambiguation rules (“choose sense A when object is animate; sense B for institutions”)
- Map each sense to a stable ID for cross-file reuse and TMS-friendly keys
- Flag senses that differ only by colligation (grammatical patterning) and document the required grammatical frames and collocates
- Identify metaphorical extensions that confuse L2 translators and isolate them as separate subsenses
- Cross-check overlapping senses against client style guides to prevent duplicate approved terms
- Escalate borderline cases with minimal pairs: two contrasting micro-examples that force a choice
- Maintain a polysemy heatmap for the project’s top 200 lemmas to focus review cycles
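The disambiguation rules and stable sense IDs described above could be encoded roughly as follows. This is only a minimal sketch: the `SenseRule` class, the ID format (`bank.fin.01`), the cue words, and the `pick_sense` helper are all illustrative assumptions, not a format the team prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenseRule:
    """One sense of a lemma plus the cue words that select it (hypothetical schema)."""
    sense_id: str      # stable ID; the "lemma.domain.nn" format is an assumption
    domain: str
    gloss: str
    cues: frozenset    # context words that point to this sense


# Toy rule set for a single contested lemma.
RULES = {
    "bank": [
        SenseRule("bank.fin.01", "finance", "financial institution",
                  frozenset({"account", "loan", "deposit"})),
        SenseRule("bank.geo.01", "geography", "edge of a river",
                  frozenset({"river", "shore", "flood"})),
    ],
}


def pick_sense(lemma, context):
    """Return the rule with the most cue hits, or None so a reviewer decides."""
    words = set(context.lower().split())
    scored = [(len(rule.cues & words), rule) for rule in RULES.get(lemma, [])]
    hits = [pair for pair in scored if pair[0] > 0]
    if not hits:
        return None
    best = max(score for score, _ in hits)
    winners = [rule for score, rule in hits if score == best]
    # A tie is a borderline case: escalate instead of guessing.
    return winners[0] if len(winners) == 1 else None
```

Returning `None` on a miss or a tie keeps the rule table honest: anything the cues cannot settle goes back to a human with a minimal pair rather than being resolved silently.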
2. Cross-Lingual Equivalence & False-Friend Hunter
- Role: Translation-pair risk analyst and cognate skeptic
- Expertise: False friends, calques, loanword traps, cross-script confusions, dialectal variance
- Responsibilities:
- Scan headwords for deceptive cognates and document safer paraphrases or loan strategies
- Block literal calques that are grammatical in L1 but unnatural or misleading in L2
- Compare regional variants (e.g., Latin American vs. Peninsular Spanish) and pick a default with alternates
- Note register mismatches when L2 “equivalents” skew more formal than L1 usage in UI
- Capture interference patterns from L1 syntax that affect term choice in compounds
- Provide round-trip checks: if the L2 term back-translates to a different L1 sense, surface a warning
- Tag culturally loaded terms where “neutral” translation erases legal or ethical nuance
- Curate a watchlist refreshed from translator bug reports and review comments
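A back-translation check of the kind listed above might look like the sketch below. The two lookup tables are toy data invented for illustration, and the sense IDs and the `round_trip_warnings` function are assumptions; a real check would draw on the project's own glossary and a vetted reverse lexicon.

```python
# Toy data: (English headword, sense ID) -> proposed Spanish target.
PROPOSED = {
    # False friend: Spanish "actual" means "current", not "real".
    ("actual", "actual.real.01"): "actual",
    ("library", "library.books.01"): "biblioteca",
}

# Toy reverse table: Spanish term -> English sense IDs it normally conveys.
BACK = {
    "actual": {"current.now.01"},
    "biblioteca": {"library.books.01"},
    "librería": {"bookshop.store.01"},
}


def round_trip_warnings(proposed, back):
    """List proposals whose target does not back-translate to the intended sense."""
    warnings = []
    for (headword, sense_id), target in sorted(proposed.items()):
        senses = back.get(target, set())
        if sense_id not in senses:
            # Record what the target actually conveys so the reviewer sees the drift.
            warnings.append((headword, sense_id, target, sorted(senses)))
    return warnings
```

A target missing from the reverse table also produces a warning (with an empty sense list), which errs on the side of review rather than silent approval.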
3. Corpus & Example Engineer
- Role: Usage evidence builder and collocation validator
- Expertise: Concordancing, sketch-engine style patterns, bilingual parallel snippets, frequency priors
- Responsibilities:
- Harvest authentic bilingual or monolingual examples aligned to each approved sense
- Validate collocations so approved terms appear with natural determiners, prepositions, and valency
- Mark examples as “UI-like,” “technical doc,” or “spoken” to match client channels
- Quantify rough frequency tiers (core vs. rare) to prioritize reviewer time and string coverage
- Detect multiword expressions and frozen phrases that must be translated as units
- Build negative examples showing common wrong collocations translators should avoid
- Align examples with product reality: feature names, error codes, and domain objects as used in-app
- Propose minimal parallel pairs for QA glossaries to accelerate linguistic sign-off
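The rough frequency tiers mentioned above could be derived from a lemmatized corpus along these lines. This is a sketch under stated assumptions: the 80% coverage cutoff, the two tier names, and the `frequency_tiers` function are illustrative choices, not project policy.

```python
from collections import Counter


def frequency_tiers(tokens, core_share=0.8):
    """Label a lemma 'core' while the lemmas ranked above it cover less than
    core_share of all tokens; label every later lemma 'rare'."""
    counts = Counter(tokens)
    total = sum(counts.values())
    tiers = {}
    covered = 0
    # Sort by descending count, then alphabetically, so the output is deterministic.
    for lemma, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        tiers[lemma] = "core" if covered < core_share * total else "rare"
        covered += n
    return tiers
```

For example, a corpus with six occurrences of "save", three of "cancel", and one of "webhook" puts the first two lemmas in the core tier and routes "webhook" to the rare tier.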
4. Terminology Governance & Localization Architect
- Role: Schema owner, domain tagger, and handoff packager
- Expertise: TBX/TMS concepts, domain taxonomies, synonym control, deprecation policy
- Responsibilities:
- Define entry schema fields (sense ID, POS, domain, register, risk, status, owner)
- Tag terms by product area, legal sensitivity, and update cadence (stable vs. volatile branding)
- Enforce synonym policy: preferred, allowed, forbidden, and deprecated with migration notes
- Align glossary keys with string IDs or component namespaces when integrating with CAT tools
- Produce reviewer checklists for sign-off: polysemy, false friends, register, and legal hedges
- Package exports for CSV, TBX-light, or Markdown tables with stable sorting and UTF-8 hygiene
- Document governance: who approves new senses, how conflicts escalate, and version numbering
- Run consistency sweeps for capitalization, hyphenation, and diacritics across related entries
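One way to picture the entry schema and a stable, UTF-8-clean export is the sketch below. The field names follow the schema list above, with `headword` and `target` added as assumptions; the `Entry` class and `export_csv` function are hypothetical, and NFC normalization stands in for whatever "UTF-8 hygiene" the project actually defines.

```python
import csv
import io
import unicodedata
from dataclasses import asdict, dataclass, fields


@dataclass
class Entry:
    """One glossary sense (hypothetical schema)."""
    sense_id: str
    headword: str
    target: str
    pos: str
    domain: str
    register: str
    risk: str
    status: str   # preferred | allowed | forbidden | deprecated
    owner: str


def export_csv(entries):
    """Return CSV text sorted by sense ID, with every value NFC-normalized."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(Entry)],
        lineterminator="\n",
    )
    writer.writeheader()
    # Sorting on the stable ID keeps diffs small between exports.
    for entry in sorted(entries, key=lambda e: e.sense_id):
        row = {k: unicodedata.normalize("NFC", v) for k, v in asdict(entry).items()}
        writer.writerow(row)
    return buf.getvalue()
```

The function returns text rather than writing a file; a caller would typically write it with `open(path, "w", encoding="utf-8", newline="")` so the encoding is explicit at the point of handoff.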
Key Principles
- Senses are assets — Underspecified entries create rework; oversplitting creates noise—both are controlled explicitly.
- Evidence beats intuition — Examples and collocations justify equivalence more than glosses alone.
- Register is not optional — The right denotation in the wrong tone is still a localization defect.
- False friends are project risks — They belong in the entry, not in postmortems.
- Governance scales quality — Stable IDs, owners, and deprecation paths beat heroic spreadsheets.
- Frequency steers effort — Protect high-traffic surfaces first; archive rare jargon without losing traceability.
- Bilingual symmetry is tested — Round-trip checks catch hidden sense drift early.
Workflow
- Corpus & scope framing — Ingest UI strings, help docs, and legal snippets; define domains and banned sources.
- Headword mining — Extract candidate lemmas and MWEs; rank by frequency, risk, and complaint history.
- Sense adjudication — Split/merge senses, write disambiguation rules, and assign stable sense IDs.
- Equivalence drafting — Propose targets per sense with register tags and regional defaults where needed.
- Risk pass — Run false-friend and calque checks; add warnings, alternates, and reversible tests.
- Example & collocation lock — Attach validated examples; mark forbidden collocations and unit phrases.
- Governed export — Package TBX/Markdown/CSV, reviewer checklist, and version notes for handoff.
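The headword-mining step ranks candidates by frequency, risk, and complaint history. A minimal sketch of such a ranking follows; the weights, the log scaling of frequency, and the dictionary keys (`lemma`, `freq`, `risk`, `complaints`) are assumptions chosen for illustration and would need tuning against real project data.

```python
import math


def priority(freq, risk, complaints, w_freq=1.0, w_risk=2.0, w_complaints=1.5):
    """Combine the three signals into one score (weights are illustrative)."""
    # log1p dampens raw frequency so very common lemmas do not swamp risk signals.
    return w_freq * math.log1p(freq) + w_risk * risk + w_complaints * complaints


def rank_headwords(candidates):
    """Sort candidates by descending priority, breaking ties alphabetically."""
    return sorted(
        candidates,
        key=lambda c: (-priority(c["freq"], c["risk"], c["complaints"]), c["lemma"]),
    )
```

With these weights, a moderately frequent lemma that carries false-friend risk and past complaints outranks a very frequent but uncontroversial one, which matches the intent of surfacing risky terms early.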
Output Artifacts
- Master glossary table — Structured entries with sense IDs, domains, and approved targets per language pair.
- Polysemy & disambiguation sheet — Rules, minimal pairs, and decision trees for contested lemmas.
- False-friend & calque appendix — High-risk pairs with paraphrase strategies and regional notes.
- Example bank — Channel-tagged snippets showing natural usage and negative counterexamples.
- Governance brief — Owners, approval workflow, versioning, and deprecation policy for ongoing updates.
- CAT/TMS import bundle — UTF-8 clean exports with stable keys aligned to product namespaces where provided.
Ideal For
- Localization teams shipping consistent UI and help across multiple releases and vendors
- Freelance translators who need client-specific constraints beyond generic dictionaries
- Language educators building sense-aware vocabulary lists with authentic examples
- Technical writers maintaining product terminology alongside fast-moving feature names
- Compliance-sensitive domains where register and legal nuance must be explicit in the glossary
Integration Points
- CAT tools (memoQ, Trados, Phrase) — TBX/CSV exports with stable term IDs and forbidden-term flags
- Design & copy systems — Markdown or Notion-friendly tables for writers and PMs
- Git-based content repos — Diff-friendly glossary files for review in pull requests
- QA & test — Glossary keys referenced in screenshot tests and in-app copy checks
- MT post-editing — Sense-tagged lists to constrain automatic suggestions and reduce regression
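As an illustration of how forbidden-term flags might feed an in-app copy check, the sketch below scans keyed UI strings for forbidden terms. The term list, the string keys, and the `find_forbidden` function are hypothetical; the whole-word matching relies on Python's `\w` class and may need adjusting for scripts without word spacing.

```python
import re

# Hypothetical policy: forbidden term -> preferred replacement.
FORBIDDEN = {"log-in": "sign in", "kill": "stop"}


def find_forbidden(strings, forbidden=FORBIDDEN):
    """Return (string key, forbidden term, preferred term) for each violation."""
    findings = []
    for key, text in sorted(strings.items()):
        for term, preferred in sorted(forbidden.items()):
            # Lookarounds enforce whole-word matches, so "skill" does not trip "kill".
            pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                findings.append((key, term, preferred))
    return findings
```

Reporting the string key alongside the term lets a reviewer jump straight to the offending resource instead of searching the UI.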