LangChain
Overview
LangChain is seeking Platform and Infrastructure Engineers with deep expertise in Kubernetes, cloud platforms, and modern deployment technologies to build and maintain the infrastructure that powers AI applications in our cloud and customer environments. You will architect and operate critical systems powering customers' AI observability and deployments, working with cutting-edge technologies at the intersection of AI and distributed systems.
Responsibilities
Design and Scale Infrastructure: Build and maintain scalable, high-throughput infrastructure solutions using Kubernetes, Helm, Docker, and multi-cloud environments (AWS, Azure, GCP) to support flagship SaaS products like LangSmith and LangGraph Platform.
Drive Reliability and Performance: Ensure platform reliability, security, and performance through robust monitoring, alerting, automated recovery systems, and proactive maintenance, including performance tuning and database optimization.
Contribute to Platform Strategy: Influence infrastructure strategy, tooling, and operational practices as the organization scales from startup to enterprise.
Enable Secure, Efficient Operations: Implement security best practices, compliance requirements, and infrastructure cost optimization strategies while architecting for high availability, disaster recovery, and resource efficiency.
Develop Automation and CI/CD Pipelines: Build and optimize CI/CD pipelines, infrastructure as code, and deployment automation strategies to streamline application delivery.
Support Customer Deployments: Create and maintain deployment solutions and monitoring tools for customer-hosted environments, and collaborate with engineering teams on application rollout and support.
Participate in Incident Response: Take part in the on-call rotation with a focus on learning, automation, and continuous improvement of incident response processes.
Document and Evolve Best Practices: Maintain comprehensive infrastructure documentation and stay up to date with emerging technologies and best practices in cloud-native systems.
Qualifications
Experience: 3+ years building and operating production systems at scale
Programming proficiency: Strong hands-on software engineering skills (Python, Go, Rust)
Infrastructure expertise: Deep knowledge of Kubernetes, containerized infrastructure, cloud platforms (AWS, Azure, GCP)
Observability mastery: Hands-on experience with observability stacks (Datadog, Prometheus/Grafana, OpenTelemetry or similar)
Proficiency in infrastructure as code tools (Terraform, CloudFormation, etc.)
Database expertise: Production experience with OSS datastores (PostgreSQL, Redis, Kafka)
Experience with CI/CD pipelines and automation tools
Strong communication skills for cross-functional collaboration with other engineers and customers
Nice to Have
Proficiency with analytical databases (e.g. ClickHouse)
Background in high-growth startups
Previous experience in AI/ML infrastructure
Compensation & Benefits
Competitive salary and equity stake, appropriate to the role and the company's stage and commensurate with experience.
Annual salary range: $145,000-$195,000 USD for Senior Engineers
Details
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Technology, Information and Internet