GEICO

Senior Staff Software Engineer, AI Agent Platform (Remote)

GEICO, Seattle, Washington, US, 98127

Overview

Senior Staff Software Engineer, AI Agent Platform (Remote), GEICO. Location: Remote. Base pay range: $115,000.00/yr - $300,000.00/yr.

This range is provided by GEICO. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more.

At GEICO, we offer a rewarding career where your ambitions are met with endless possibilities.

Every day we honor our iconic brand by offering quality coverage to millions of customers and being there when they need us most. We thrive through relentless innovation to exceed our customers’ expectations while making a real impact for our company through our shared purpose.

When you join our company, we want you to feel valued, supported and proud to work here. That’s why we offer The GEICO Pledge: Great Company, Great Culture, Great Rewards and Great Careers.

The GEICO AI Agent Platform team is seeking an exceptional Senior Staff Software Engineer to build the next-generation enterprise Agent OS and SDKs.

This role combines deep technical expertise in platform engineering, application design and agentic workflows with strong leadership and mentoring capabilities. You will be responsible for designing, implementing, and maintaining scalable, reliable frontend and backend systems that enable our business, product and engineering teams to build, test and deploy their AI agents and workflows. The candidate must have excellent communication skills and a proven track record of delivering business value via technical excellence.

Key Responsibilities

Platform Engineering: Architect and implement scalable multi-tenant backend systems for building AI agent workflows, including agent configuration, offline evaluation, synthetic data generation, workflow simulation, and an agent marketplace, using tools such as Azure Kubernetes Service (AKS) and FastAPI, to achieve economies of scale and control costs.

Frontend collaboration: Collaborate with the Design team to architect and implement frontend experiences and workflows for onboarding both technical and non-technical stakeholders, maximizing user adoption and successful AI agent development.

Observability and reliability: Develop observability frameworks to ensure 99.9%+ uptime for AI agent platforms through robust monitoring, alerting, and incident response procedures.

GenAI framework integration: Evaluate and, if desirable, integrate cutting-edge GenAI frameworks, libraries and vendors to maintain a state-of-the-art technology stack, including hybrid cloud solutions with AWS/GCP as backup or for specialized use cases.

ML platform: Architect and implement scalable, high-performance machine learning platforms and systems capable of processing large data volumes and supporting real-time decision making and workflows.

DevOps / Application Lifecycle Management

Oversee the end-to-end lifecycle of AI agent applications, ensuring robust testing, deployment, and ongoing monitoring.

Ensure adherence to production readiness standards, security protocols, and regulatory compliance throughout the development lifecycle.

Continuously optimize platform performance, reducing latency and improving throughput for AI agent workloads.

Design and implement backup, recovery, and business continuity plans for hosted platform applications and services.

Design and maintain robust CI/CD pipelines for ML model deployment using Azure DevOps, GitHub Actions, and MLOps tools.

Technical Leadership

Act as the tech lead across multiple sub-teams/initiatives, setting technical direction and ensuring consistency in design principles and best practices.

Provide hands-on mentorship and guidance during design reviews, code assessments, and performance tuning.

Lead by example in tackling complex technical challenges and driving system-wide architectural improvements.

Establish and champion engineering standards for ML infrastructure, deployment practices, and operational procedures.

Create technical documentation and runbooks, and deliver internal training sessions on platform capabilities.

Cross-Functional Collaboration

Work closely with data scientists, software engineers, and product teams to seamlessly deploy ML systems into production environments.

Translate complex technical concepts into actionable insights for both technical and non-technical stakeholders.

Foster a collaborative environment that encourages innovation and sharing of best practices across teams.

Present technical solutions and platform roadmaps to leadership and cross-functional stakeholders.

Qualifications

Educational background: Bachelor’s degree in computer science, engineering, mathematics, or related field; an advanced degree (master’s or Ph.D.) is highly desirable.

10+ years of hands-on experience designing, implementing, and maintaining multi-tenant AI/ML systems in production environments.

10+ years of experience with cloud platforms such as Azure and AWS.

Extensive expertise in designing and deploying large-scale data pipelines and real-time inference systems, and in managing the end-to-end development lifecycle of AI agent and/or AI/ML systems, including configuration, evaluation, monitoring, observability, and authentication/authorization (AuthN/AuthZ) considerations.

8+ years of experience with common backend systems and tools (Kubernetes, Temporal, OpenSearch, PostgreSQL, Redis, Neo4j, etc.). Deep understanding of Docker, container optimization, and multi-stage builds. Experience with Prometheus, Grafana, OpenTelemetry and distributed tracing.

4+ years of experience building front-end web applications using frameworks such as React and/or Next.js.

Proficiency in programming languages such as Python, Java, or Go, with a strong emphasis on coding excellence. Extra credit for effectively utilizing AI coding tools.

Proficiency in AI/ML frameworks such as TensorFlow, PyTorch, or LangGraph.

Leadership skills: Demonstrated track record of mentoring engineers and leading technical initiatives. Excellent verbal and written communication across diverse seniority levels.

Preferred specialized skills: Understanding of AI safety principles, model governance, and regulatory compliance; background in regulated industries with data privacy requirements and cybersecurity review processes; deep experience operating and/or building AI agent platforms and capabilities such as LangSmith/LangGraph, AutoGen, n8n, CrewAI, and Dify; experience building LLM-based AI agent workflows in both no-code/low-code and traditional high-code development environments; experience with both open-source LLMs (e.g., Llama, Qwen, Mistral) and proprietary LLMs (e.g., GPT, Claude).
