- Company Name: Arc Institute
- Job Title: Infrastructure Engineer
- Job Description:
**Role Summary**
Design, implement, and maintain a hybrid cloud infrastructure platform for computational biology. Build scalable data pipelines, database systems, and data discovery tools to support AI and bioinformatics projects.
**Expectations**
- Deliver high‑availability compute, networking, and storage solutions across public cloud, private cloud, and on‑premises environments.
- Enable researchers to access and analyze large scientific datasets efficiently.
- Collaborate cross‑functionally to translate scientific requirements into technical architecture.
- Continuously optimize infrastructure performance, reliability, and cost.
**Key Responsibilities**
- Architect and deploy scalable data pipelines for single‑cell genomics and other experimental datasets using Nextflow, Prefect, or GCP Cloud Workflows.
- Build ExperimentDB from the ground up: design its schema, implement PostgreSQL storage, and expose REST/GraphQL APIs.
- Develop catalog, metadata, and governance systems to support data discovery and access.
- Automate bioinformatics workflows, reducing manual steps through orchestration and agentic design.
- Optimize query performance and troubleshoot distributed data systems.
- Establish best practices for data quality, versioning, documentation, and reproducibility.
- Partner with scientists and engineers to understand data requirements and deliver tailored solutions.
**Required Skills**
- Experience with workflow orchestration platforms (Nextflow, Prefect, Airflow, or similar).
- Proficiency in Python and SQL for ETL, data transformation, and API development.
- Strong background in distributed data technologies: analytical data warehouses, relational databases (PostgreSQL), object storage, and parallel/distributed file systems.
- Experience with database design and optimization for large scientific workloads.
- Familiarity with bioinformatics data formats and tools (e.g., FASTQ, BAM, Cell Ranger outputs) and with single‑cell analysis is highly desirable.
- Excellent troubleshooting, problem‑solving, and communication skills.
- Ability to work onsite under a hybrid arrangement.
**Required Education & Certifications**
- Bachelor’s degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
- (No specific certifications required.)