Euro-BioImaging ERIC is a European research infrastructure providing life scientists open access to advanced imaging technologies, expertise, training and data services through almost 300 imaging facilities distributed among 40 Nodes across Europe. The Euro-BioImaging Hub is distributed across three sites: the legal Seat (Finland), the Med-Hub (Italy), and the Bio-Hub hosted by EMBL in Heidelberg.
The Euro-BioImaging Access Portal is the infrastructure interface that enables users to browse, request and access imaging and image data technologies from an extensive service portfolio. It is also the platform through which Euro-BioImaging and the service-providing facilities manage incoming access requests and user interactions. The AI4Access project will develop a “Research Navigator” (an LLM application) integrated into the Euro-BioImaging Access Portal to guide researchers to the right services, workflows and resources across the infrastructure. A key foundation is a structured, maintained, ontology-aligned knowledge base that the Navigator can query reliably.
Your role
We are looking for a technically strong colleague to design, set up and maintain the database and data ingestion pipeline underlying the AI4Access ontology-driven knowledge base. You will work with the Euro-BioImaging Bio-Hub team and the community on the development of relevant ontologies and knowledge graphs. Together with the Euro-BioImaging Seat team, you will also develop robust technical interfaces between the knowledge base and the Research Navigator, and between the knowledge base and the Euro-BioImaging Nodes’ own information and management systems.
Responsibilities include but are not limited to:
- Design and implement an ontology-aligned data model, its technical representation (database + schema, versioning, documentation) and the minimum viable service.
- Work with the Bio-Hub imaging specialists and Node-facing colleagues to translate real service descriptions into a maintainable structured form via community-agreed ontologies and/or controlled vocabularies.
- Support integration of semantic and metadata elements (controlled vocabularies, identifiers/PIDs, provenance) together with domain experts.
- Develop and maintain automated ETL workflows for data ingestion (data collection inputs, annotations, transformations, validation/QA rules, repeatable updates), adapted to a diverse range of data and data sources.
- Implement robust APIs and access layers optimized for LLM applications to query services, capabilities and constraints with high reliability and transparency.
- Collaborate with the Seat team on integration, testing, release planning, and technical documentation.
You have
- A motivated, structured and hands-on working style, and enjoyment in building reliable systems that will be used by a broad scientific community.
- Degree (MSc / PhD or equivalent experience) in a relevant field (data/computer science, bioinformatics, information systems, or similar).
- Familiarity with metadata, ontologies, controlled vocabularies and persistent identifiers.
- Advanced Python programming skills and proven experience designing and maintaining production-ready databases for structured content (e.g., Postgres/graph stores) and building APIs for downstream applications.
- Experience with CI/CD for data pipelines (ingestion, transformation, validation) and with data quality practices, including version control platforms (GitHub/GitLab).
- Ability to work effectively with multiple stakeholders (technical and non-technical) in an international setting.
- Fluency in written and spoken English.
You may also have
- Experience in bioimaging or life sciences data and metadata handling.
- Knowledge graph and Semantic Web experience, including Linked Data standards (RDF, SPARQL, SHACL) and/or hybrid retrieval approaches (GraphRAG) used in AI applications.
- Experience with graph databases (Neo4j).
- Experience with research infrastructures, FAIR data practices, or management of service catalogues in scientific environments.
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes).
Contract length: 3 years
Salary: Grade 5-6 depending on relevant experience; monthly salary from EUR 4,031 after tax and before the 13% EMBL social security deductions, plus financial allowances based on family circumstances.