Biohub

Software Engineer

Biohub, Redwood City, California, United States, 94061

Biohub is leading a new era of AI‑powered biology to cure or prevent disease through its 501(c)(3) medical research organization, supported by the Chan Zuckerberg Initiative.

The Team

Biohub supports science and technology that will help scientists cure, prevent, or manage all diseases by the end of this century. The goal is ambitious, but biomedical science has made tremendous strides in the last 100 years in understanding biological systems, advancing human health, and treating disease.

Grand Challenges

Building an AI‑based virtual cell model to predict and understand cellular behavior

Developing state‑of‑the‑art imaging systems to observe living cells in action

Instrumenting tissues to better understand inflammation, a key driver of many diseases

Engineering and harnessing the immune system for early detection, prevention, and treatment of disease

The Opportunity

The Data Services team manages and processes scientific datasets designed to enable biological modeling. It handles over 89 million unique cells of single‑cell transcriptomic data, 15,000 cryoET tomograms, and large imaging datasets. Our resources provide structured open‑source data used by tens of thousands of scientists each month to query hypotheses on how genetic variants impact disease risk, define drug toxicities, and discover better therapies.

As a software engineer on the Data Engineering team, you will build out the data capabilities our platforms need (CELLxGENE Discover, CryoET, and a new AI‑focused platform) so scientists can interrogate our large and growing corpus without downloading data or requiring computational expertise. You will collaborate on multidisciplinary teams to accelerate workflows and scientific discovery.

What You’ll Do

Design, build, and maintain robust, scalable data pipelines for ingesting, processing, and storing large volumes of structured and unstructured data.

Develop and optimize ETL processes, ensuring data quality, validation, and consistency across diverse sources.

Implement and manage data storage solutions, including data warehouses, data lakes, and distributed databases, ensuring secure and performant handling of massive single‑cell transcriptomics and imaging data.

Monitor and troubleshoot data pipelines, building proactive exception handling and ensuring high reliability and uptime of production systems.

Document processes, maintain data models, and support data governance, lineage, and compliance initiatives.

Utilize modern tools and technologies such as Argo Workflows, Kubernetes, AWS, Docker, and CI/CD pipelines.

Actively contribute to team problem‑solving, project planning, and process improvements with a mindset for innovation and social impact.

Create user‑friendly APIs to enable researchers and scientists to access and explore curated data.

Develop scalable, maintainable, and testable software systems and participate in team conversations and efforts on engineering excellence.

Collaborate with data scientists, computational biologists, researchers, analysts, and other engineers to understand data requirements and deliver practical solutions that drive analytics, research, and AI/ML applications.

Learn about scientific data and technologies; no prior experience is required.

What You’ll Bring

2+ years of experience as a Software Engineer building data pipelines.

Proficiency in programming languages (Python, Java) and SQL.

Experience with big data and AWS (EC2, S3, EKS, IAM, SQS, etc.), Docker, and Argo Workflows.

Strong data modeling, database design, and data integration skills, including ETL and pipeline orchestration tools.

Strong fundamentals in systems design, data structures, algorithms, and object‑oriented programming principles.

Familiarity with CI/CD, data governance, and observability/monitoring tools.

Excellent communication, teamwork, and analytical problem‑solving abilities.

Passion for the CZI mission, innovation, and an open, collaborative culture.

Computer Science Engineering degree.

Strong problem‑solving and analytical skills.

Excellent written and verbal communication skills.

Enthusiasm to ramp up on technologies and learn a new science domain.

Self‑driven and comfortable supporting data needs of multiple systems and products.

Nice to Have

Experience working with biology, imaging, or sequencing data.

Experience with data formats related to biodata and solving challenges with that data.

Experience building AI agents related to data movement or ETL.

Compensation

The Redwood City, CA base pay range for a new hire in this role is $153,000–$230,000. New hires are typically hired into the lower portion of the range, enabling employee growth over time. Actual placement is based on job‑related skills and experience, as evaluated throughout the interview process.

Work Mode

This is a hybrid position requiring onsite presence at least 60% of the working month (~3 days a week), with specific in‑office days determined by the team manager. The exact schedule will be at the hiring manager’s discretion and communicated during the interview process.

Benefits For The Whole You

Generous 401(k) employer match to support future planning.

Paid time off to volunteer at an organization of your choice.

Funding for select family‑forming benefits.

Relocation support for employees who need assistance moving.

We still encourage you to apply if your previous experience doesn’t perfectly align with each qualification; you may be the perfect fit for this or another role.
