IFS
Staff Platform Engineer Agentic AI Systems IFS The Loops
IFS, San Francisco, California, United States, 94199
Job Description
We’re seeking a Staff Platform Engineer to help shape the future of agentic AI systems. In this role you will help design the backbone of our real-time, distributed systems. You’ll be at the forefront of building systems that orchestrate massive data flows, reactive services, and agentic workloads: systems that must adapt dynamically and operate reliably under heavy and unpredictable load. You’ll work with tools like Kafka, Akka, stream processing frameworks, and other core distributed technologies, and collaborate across engineering teams to deliver infrastructure that is elastic, fault-tolerant, and observable by design.

If you’re passionate about high-performance computing, resilient architecture, and enabling real-time intelligence at scale, this role is for you.

Responsibilities
Design and implement scalable, distributed platform components with technologies like Kafka, Akka (Typed), and gRPC.
Architect and optimize data pipelines capable of handling billions of messages/events per day with low latency and high reliability.
Lead efforts in agentic scaling: dynamically spawning, routing, and managing autonomous agents (services/functions) in response to workload or demand.
Build resilient systems that self-heal, auto-scale, and degrade gracefully under pressure.
Define and implement metrics, tracing, and observability for end-to-end system behavior and performance.
Collaborate closely with infrastructure, SRE, and product teams to ensure platform scalability aligns with growth and reliability goals.
Drive root-cause analysis of performance bottlenecks and propose long-term architectural improvements.
Participate in on-call rotations, architecture reviews, and deep technical design sessions.

Qualifications
5+ years of experience building distributed systems in a high-throughput production environment.
Deep expertise with Kafka (topics, partitions, consumers, tuning, schema registry, stream processing).
Strong experience with Akka or other actor-based concurrency models; familiarity with Akka Cluster, Sharding, Persistence, or Typed API.
Solid programming skills in Java.
Understanding of agentic workloads and dynamic system orchestration (e.g., microservices that represent intelligent agents).
Experience designing scalable APIs, message protocols (e.g., Protobuf, Avro), and event-driven architectures.
Familiarity with cloud-native environments (e.g., Kubernetes, service mesh, container orchestration).

Preferred Qualifications
Experience with serverless compute models or function-as-a-service scaling paradigms.
Contributions to open-source projects in the distributed systems ecosystem.
Experience with AI or ML-driven orchestration or agentic frameworks.
Familiarity with operational tooling: Prometheus, Grafana, OpenTelemetry, Kafka monitoring tools, etc.

Additional Information
What We’re Offering
Salary Range: $175,000-200,000 total comp
Flexible paid time off, including sick and holiday
Medical, dental, & vision insurance
401K with Company contribution
Flexible spending accounts
Life insurance and disability benefits
Tuition assistance
Community involvement and volunteering events

M/F/Disabled/Vet VEVRAA Federal Contractor. We are a Drug-Free Workplace.
Interested candidates should apply at: www.ifs.com/about/careers-at-ifs
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
VEVRAA Federal Contractor, Equal Opportunity Employer