WIPO is offering a Fellowship in Neural Machine Translation

Learn More and Apply

The Fellow shall work under the supervision of the Technical Support Officer, in close collaboration with the rest of the ATAC team.

2. Duties and responsibilities

The Fellow will be required to perform the following principal duties:

I) Neural Machine Translation (NMT)
Make the best use of state-of-the-art NMT frameworks such as Marian NMT. Run, maintain and improve pipelines for data processing, data augmentation, domain modelling, model training and evaluation. Use language- and domain-aware techniques, such as data augmentation, placeholders and factored models, to annotate, pre-process and post-process our data. Leverage both industry-standard and state-of-the-art tools, frameworks and architectures to create the best possible NMT models.

II) MT quality estimation
Compare different MT models using state-of-the-art metrics, use feedback from human evaluation to further improve our engines, and introduce new automatic metrics that capture this feedback. Help develop further quality estimation metrics for our MT models.

III) Integration and deployment of MT and NLP tools

Work on different tools and interfaces to consume MT: graphical user interfaces and APIs, targeting segment-, paragraph- and document-level translation, and supporting different formats such as plain text, XML and HTML. Deploy WIPO Translate in different environments, such as local servers, cloud servers and containers.

IV) Develop methods to collect and clean training data

Have knowledge of techniques and frameworks to filter, clean and align documents, both patent documents and other kinds of documents such as meeting proceedings. Work on data augmentation and on combining various sources of data, such as in-domain and out-of-domain corpora, synthetic data and human translations. Define workflows for updating NMT models with newly published documents, using techniques such as incremental training, online training and domain adaptation.

You should have extensive knowledge of deep learning architectures for text processing, covering both sequence-to-sequence and regression tasks, such as RNNs and transformers, along with popular implementations such as Marian NMT or the transformers library. You should possess strong programming skills, preferably in Python and Java, and be familiar with Unix/Linux environments.
