The Fellow shall work under the supervision of the Technical Support Officer, in close collaboration with the rest of the ATAC team.
2. Duties and responsibilities
The Fellow will be required to perform the following principal duties:
I)Neural Machine Translation (NMT)
Make the best use of state-of-the-art NMT frameworks such as Marian NMT and Run, maintain and improve pipelines for data processing, data augmentation, domain modelling, model training and evaluation Use language and domain aware techniques such as data augmentation, use of placeholders and factored models to annotate, pre-process, and post-process our data and leverage both industry-standard and state-of-the-art tools, frameworks, and architectures to create the best possible NMT models
II) MT quality estimation
Compare different MT models using state-of-the-art metrics and use feedback from human evaluation to further improve our engines, and introduce new automatic metrics to capture the feedback. Also help in developing further quality estimation metrics for our MT models
III) Integration and deployment of MT and NLP tools
Work on different tools and interfaces to consume MT: graphical user interfaces and APIs, targeting segment, paragraph, and document level translation, and supporting different formats such as plain text, XML and HTML. Deploy WIPO Translate on different environments, such as local servers, cloud servers and containers
IV) Develop methods to collect and clean training data
Have knowledge of techniques and frameworks to filter, clean and align documents, both for patent documents and other kinds of documents such as meeting proceedings. Work on data augmentation and combining various sources of data, such as in- and out-of-domain corpora, synthetic data, and human translations. Define workflows for updating NMT models using newly published documents using techniques such as incremental training, online training, and domain adaptation
You should have extensive knowledge of deep learning architectures focused on text processing, both sequence to sequence and regression, such as RNN and transformer, along with popular implementations such as Marian NMT or the transformers library. You should possess strong programming skills, preferably in Python and Java with familiarity with Unix/Linux environments.