Stealth Startup
Principal Engineer – AI/ML Infrastructure & Generative Systems
Stealth Startup, Cambridge, Massachusetts, US, 02140
We are a team out of MIT, incubated by UM6P Foundry, reinventing how organizations capture and leverage their institutional knowledge. Our platform transforms fragmented information into a trusted resource that powers faster decisions and long‑term innovation. We are now hiring an experienced engineer to lead the build‑out of our AI/ML infrastructure and generative systems. This is a hands‑on role at the cutting edge of LLM deployment, GPU optimization, and retrieval‑augmented generation (RAG). You’ll own core components of the platform and collaborate directly with the founding team to shape the technical roadmap.
In this role you will
- Design, build, and deploy retrieval‑augmented generation (RAG) pipelines using LLMs and vector databases (a minimal sketch of such a pipeline follows this list).
- Develop secure backend APIs for data ingestion, indexing, and semantic search across enterprise systems (e.g., SharePoint, Teams, SQL).
- Manage GPU‑based inference environments optimized for scalability, latency, and cost.
- Implement MLOps best practices for training, fine‑tuning, evaluation, and deployment of generative AI models.
- Collaborate with the founders on architecture and build‑vs‑buy decisions to accelerate the roadmap.
- Own the full lifecycle from prototype → MVP → production, ensuring security, compliance, and enterprise readiness.
- Support prototyping of lightweight front‑end interfaces to showcase platform capabilities.

This role is onsite and based in Cambridge, MA.
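To make the first responsibility concrete, here is a minimal sketch of the retrieval step in a RAG pipeline: embed a small corpus, index it in FAISS, and pull the top passages into an LLM prompt. The corpus, model name, and retrieve() helper are illustrative assumptions, not a description of our actual stack.

```python
# Minimal RAG retrieval sketch: embed documents, index them in FAISS,
# and retrieve the top passages to ground an LLM prompt.
# The corpus, embedder, and retrieve() helper are illustrative only.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Quarterly planning notes live in the SharePoint 'Strategy' site.",
    "The Teams #infra channel tracks GPU cluster maintenance windows.",
    "Customer contracts are stored in the SQL 'agreements' database.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedder
embeddings = encoder.encode(docs, normalize_embeddings=True).astype(np.float32)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(embeddings)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True).astype(np.float32)
    _, ids = index.search(q, k)
    return [docs[i] for i in ids[0]]

context = "\n".join(retrieve("Where do I find GPU maintenance schedules?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
# The prompt would then be sent to an LLM (e.g., via Azure OpenAI) for generation.
```

In production, this step would sit behind the secure ingestion and search APIs listed above, with a managed vector database replacing the in-memory index.

Required Qualifications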
- 5+ years of experience in ML infrastructure, backend engineering, or AI platform development.
- Experience deploying LLMs and generative AI models in production, with fluency across multiple frameworks such as PyTorch, TensorFlow, Hugging Face, and Azure OpenAI.
- Hands‑on expertise in LLM post‑training, alignment, fine‑tuning, and deployment.
- Strong backend development skills in Python (FastAPI, Flask, or Django) and REST/GraphQL APIs.
- Hands‑on experience with GPU inference and performance tuning (see the sketch after this list).
- Familiarity with vector databases (Pinecone, Weaviate, Milvus, or FAISS) and semantic search.
- Comfort working in an early‑stage startup environment and delivering under ambiguity.
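As a rough gauge of the GPU inference and performance-tuning work involved, here is a minimal sketch of batched, half-precision generation with Hugging Face Transformers. The model name is a placeholder assumption; any causal LM would do.

```python
# Sketch of GPU inference with two common latency/memory optimizations:
# half-precision weights and batched generation. Model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative choice
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(MODEL)
tok.padding_side = "left"          # decoder-only models generate past the prompt
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # reuse EOS so batches can be padded

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,  # halve GPU memory
).to(device).eval()

@torch.inference_mode()  # drop autograd bookkeeping for lower latency
def generate(prompts: list[str], max_new_tokens: int = 64) -> list[str]:
    """Batch prompts together to amortize per-call GPU overhead."""
    batch = tok(prompts, return_tensors="pt", padding=True).to(device)
    out = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tok.batch_decode(out, skip_special_tokens=True)
```

Further tuning in this role would go beyond this sketch: quantization, KV-cache management, and dedicated serving stacks chosen for cost and latency targets.

Preferred Qualifications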
- Master’s or PhD in Computer Science, ML, or a related field.
- Experience fine‑tuning and aligning LLMs (RLHF, LoRA, adapters, prompt tuning); a LoRA sketch follows this list.
- Experience with knowledge graphs, enterprise knowledge management, or large‑scale search systems.
- Familiarity with LLM orchestration frameworks (LangChain, LlamaIndex) or the MCP protocol.
- Prior experience as a founding/early engineer at a startup.
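For the LoRA item above, a minimal sketch of parameter-efficient fine-tuning with Hugging Face PEFT, using GPT-2 as a small stand-in base model; the hyperparameters are illustrative assumptions, not recommendations.

```python
# LoRA sketch with Hugging Face PEFT: wrap a base model so that only
# small low-rank adapter matrices are trained. Base model and
# hyperparameters here are illustrative stand-ins.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in base model

config = LoraConfig(
    r=8,                        # low-rank update dimension
    lora_alpha=16,              # scaling factor for the LoRA updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base weights
# From here, training proceeds with the standard transformers Trainer;
# only the adapters receive gradients, so fine-tuning fits on a single GPU.
```

Why Join Us?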
- Be part of an innovative startup at the intersection of AI and enterprise solutions.
- Work in a collaborative, fast‑paced, rewarding, and dynamic environment.
- Directly shape the future of the company and its products.
- Competitive salary, bonus, and benefits package, with strong opportunities for leadership growth.
- Continuous learning and career growth opportunities.