Cisco
Meet the Team
Splunk, a Cisco company, is building a safer, more resilient digital world with an end‑to‑end, full‑stack platform designed for hybrid, multi‑cloud environments.
The Splunk AI Platform and Services team provides the core runtime and developer experience that power AI across Splunk and Cisco. We manage large‑scale, multi‑tenant LLM inference across major cloud providers and build platform services to support these workloads. We also provide VectorDB/RAG services and MCP services that make AI workloads secure, observable, and cost‑efficient for product teams.
On top of this foundation, we deliver agentic frameworks, SDKs, tools, and evaluation/guardrail capabilities that help teams quickly build reliable GenAI assistants and automation features. You’ll join a group that sits at the intersection of distributed systems, ML, and developer experience, grounded in operational excellence and a culture of impact‑driven, cross‑functional collaboration.
Your Impact
Implement features for GenAI services and APIs that power chat assistants and automation workflows across Splunk products.
Help build and maintain RAG pipelines: retrieval orchestration, hybrid search, chunking & embeddings, and grounding with logs/events/metrics.
Contribute to agentic and multi‑agent workflows using frameworks like LangChain or LangGraph, integrating with MCP tools, internal APIs, and external systems.
Develop and refine developer‑facing SDKs, templates, and reference apps (primarily Python/TypeScript) that make it simple for other teams to compose tools, chains, and agents on top of the platform.
Integrate with LangSmith or similar eval stacks to instrument prompts, capture traces, and run evaluations under the guidance of more senior engineers and scientists.
Collaborate with product managers and UX to turn user stories into GenAI experiences, iterate based on feedback, and ship features that move customer and business metrics.
Apply and advocate responsible AI practices in day‑to‑day work: grounding, guardrails, access controls, and human‑in‑the‑loop flows.
Minimum Qualifications
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
5+ years of hands‑on experience building and operating backend or distributed systems in production, or 2+ years of such experience with a Master's degree.
Proficiency in at least one modern programming language (e.g., Python, TypeScript/JavaScript, Go, or Java) and solid software design/debugging skills.
Some hands‑on experience with LLM APIs and ecosystems (e.g., OpenAI, Claude, Bedrock, or OSS models such as Llama) and related production features.
Familiarity with web APIs and microservices (REST/gRPC), including testing, deployment, and basic observability (logs/metrics).
Demonstrated ability to work end‑to‑end on features: collaborate on design, implement, write tests, help deploy, and iterate based on metrics or feedback.
Preferred Qualifications
Experience with, or strong interest in, RAG systems and vector databases (Weaviate, Qdrant, Milvus, FAISS, etc.).
Exposure to agentic frameworks (LangChain, LangGraph, LlamaIndex, Semantic Kernel, or similar) and tool/agent orchestration patterns.
Familiarity with LangSmith or similar evaluation platforms, or experience instrumenting prompts/pipelines for quality and debugging.
Background contributing to platform or developer experience capabilities: internal libraries, SDKs, templates, or shared components that other engineers use.
Experience with full‑stack development for GenAI interfaces (React/TypeScript), including prompt UX or conversation flows, is a plus.
Understanding of basic AI safety and governance concepts (guardrails, data privacy, RBAC) and how they apply in an enterprise environment.
Strong communication skills and a growth mindset: comfortable asking questions, giving and receiving feedback, and learning from more senior teammates.