Fieldguide

Staff Machine Learning Engineer

Fieldguide, San Francisco, California, United States, 94102


Fieldguide is establishing a new state of trust for global commerce and capital markets by automating and streamlining the work of assurance and audit practitioners, specifically within cybersecurity, privacy, and ESG (Environmental, Social, Governance). Put simply, we build software for the people who enable trust between businesses. We're based in San Francisco, CA, but built as a remote-first company that enables you to do your best work from anywhere. We're backed by top investors including Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, DNX Ventures, Global Founders Capital, Justin Kan, Elad Gil, and more.

We value diversity in backgrounds and experiences. We need people from all backgrounds and walks of life to help build the future of audit and advisory. Fieldguide's team is inclusive, driven, humble, and supportive. We are deliberate and self-reflective about the kind of team and culture that we are building, seeking teammates who are not only strong in their own aptitudes but also care deeply about supporting each other's growth. As an early-stage startup employee, you'll have the opportunity to build out the future of business trust. We make audit practitioners' lives easier by eliminating up to 50% of their work and giving them better work-life balance. If you share our values and enthusiasm for building a great culture and product, you will find a home at Fieldguide.

About the Role

As a Staff Machine Learning Engineer at Fieldguide, you will lead the development of next-generation AI-driven features on our platform, transforming the audit and advisory industry through cutting-edge generative AI solutions. You'll focus on applying advanced Machine Learning (ML) and Large Language Models (LLMs) to solve complex problems for our customers, while guiding the technical direction of our ML team in a high-growth startup environment. This role is both strategic and hands-on: you will set best practices for our generative AI efforts and also dive into coding and architecture as needed to drive critical projects from concept to production.

In this role, you will be the go-to expert for generative AI at Fieldguide. You'll establish standards for prompt engineering, context management, and model evaluation, ensuring our use of LLMs is effective, safe, and scalable. As a Staff MLE, you will also act as a multiplier for the entire engineering team: reviewing architectures for AI features, mentoring other engineers, and fostering a culture of excellence in ML. You'll collaborate closely with cross-functional stakeholders, from product managers and designers to high-profile clients, to translate business needs into technical solutions and to communicate how our AI-driven approach creates value. This is a unique opportunity to shape the future of Fieldguide's AI capabilities and establish yourself as a technical leader in the burgeoning field of generative AI.

What You'll Do

Architect Generative AI Solutions: Design and oversee the architecture of systems that leverage LLMs and retrieval-augmented generation (RAG) techniques. You will make key decisions on how we integrate LLMs with our existing platform and data stores, including building agent-based frameworks where LLMs interact with tools and knowledge bases (e.g. creating AI "co-pilots" for auditors). You'll conduct rigorous architectural reviews and ensure our designs meet high standards for scalability, security, and reliability.
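For illustration only, here is a minimal sketch of the kind of retrieval-augmented generation flow this responsibility describes. The embed, vector_index, and llm_complete helpers are hypothetical placeholders, not a description of Fieldguide's actual stack.

```python
# Illustrative sketch only: a minimal retrieval-augmented generation (RAG) flow.
# embed(), vector_index, and llm_complete() are hypothetical placeholders.

def answer_with_context(question: str, vector_index, llm_complete, embed, top_k: int = 5) -> str:
    # 1. Embed the user's question and retrieve the most relevant documents.
    query_vector = embed(question)
    documents = vector_index.search(query_vector, top_k=top_k)

    # 2. Assemble the retrieved passages into the prompt as grounding context.
    context = "\n\n".join(doc.text for doc in documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 3. Ask the LLM for an answer grounded in the retrieved context.
    return llm_complete(prompt)
```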

Establish Prompt Engineering Best Practices: Develop and codify best practices for prompt engineering and context management in our AI applications. You will guide the team in crafting effective prompts, choosing model parameters, and managing conversation context to optimize LLM performance. This includes building internal libraries or templates for prompts and educating engineers on how to avoid common failure modes. By setting this technical quality bar, you'll ensure consistency and excellence in how we build GenAI features.
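As a hedged example of what an internal prompt library might contain, the template and function names below are hypothetical and exist only to illustrate the idea of codified, reusable prompts.

```python
# Illustrative sketch only: a tiny reusable prompt template of the kind an
# internal prompt-engineering library might provide. Names are hypothetical.

from string import Template

RISK_SUMMARY_TEMPLATE = Template(
    "You are an assistant for audit practitioners.\n"
    "Summarize the key risks in the following document excerpt.\n"
    "Respond with a bulleted list and cite the relevant sentence for each risk.\n\n"
    "Excerpt:\n$excerpt"
)

def build_risk_summary_prompt(excerpt: str, max_chars: int = 8000) -> str:
    # Truncate the excerpt to keep the prompt within the model's context window.
    return RISK_SUMMARY_TEMPLATE.substitute(excerpt=excerpt[:max_chars])
```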

Develop Evaluation Frameworks: Create and implement frameworks to evaluate generative AI outputs for quality, accuracy, bias, and safety. You will define ML performance metrics specific to generative models (e.g. factual correctness rates, relevance scores, user feedback loops) and possibly leverage tools or develop custom evaluators (such as automated prompts or human-in-the-loop reviews). These evaluation strategies will inform model improvements and help establish standards for GenAI system evaluation across the company.
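For a rough sense of what such an evaluation framework might look like, here is a minimal offline evaluation loop; the dataset format and the score_factuality judge are assumptions for illustration, not a prescribed design.

```python
# Illustrative sketch only: a minimal offline evaluation loop for generative outputs.
# The eval_cases format and score_factuality() judge are hypothetical placeholders.

from statistics import mean

def evaluate(model_answer_fn, eval_cases, score_factuality):
    """Run the model over labeled cases and aggregate a simple quality metric."""
    scores = []
    for case in eval_cases:  # each case: {"question": ..., "reference": ...}
        answer = model_answer_fn(case["question"])
        # score_factuality could be an automated LLM judge or a human rating in [0, 1].
        scores.append(score_factuality(answer, case["reference"]))
    return {"mean_factuality": mean(scores), "num_cases": len(scores)}
```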

Lead High-Impact ML Projects: Take ownership of our most critical AI projects from ideation to production. You will collaborate with stakeholders to identify high-impact opportunities where AI can solve business problems, then roadmap solutions and drive their execution. This could range from developing an NLP feature that auto-identifies risks in audit documents, to launching a new GPT-based analytics module for our platform. You'll coordinate across product and engineering teams to deliver these initiatives and clearly communicate their results and business impact.

Technical Leadership & Mentorship: Serve as a technical leader and mentor within the engineering org. You will guide more junior ML Engineers through code reviews, design discussions, and one-on-one mentorship, helping them level up their skills. You might lead an internal "ML Guild" or chapter, hosting knowledge-sharing sessions on topics like prompt tuning or vector databases. By instilling best practices and providing hands-on guidance, you'll raise the technical proficiency of the entire team.

Cross-Team and External Collaboration: Work closely with cross-functional teams and occasionally directly with customers to ensure our AI solutions meet real-world needs. You'll act as an expert liaison for high-profile or demanding clients when deep technical expertise is required to shape requirements or explain AI results. In these settings, you should be comfortable being "the most technical person in the room" and able to communicate complex ML concepts in a clear, business-aligned manner. Your ability to earn trust and align expectations with both internal and external stakeholders will be key.

Innovation and Thought Leadership: Stay at the forefront of ML and GenAI advancements, and bring new ideas into Fieldguide. You'll continuously research emerging techniques in LLMs, from fine-tuning methods to new open-source models, and assess how we can leverage them. You may also contribute to the broader tech community through publications, blog posts, or speaking at conferences, representing Fieldguide's technical work externally. While not required, we highly value this kind of thought leadership as it reinforces our credibility in the AI space.

ML Ops & Future Model Development: In addition to immediate project work, you will help shape our longer-term ML infrastructure. This includes guiding how we productionize models (monitoring, CI/CD for ML, data pipelines) and preparing for future needs such as custom model training. As we evolve toward fine-tuning or training domain-specific models, you'll provide direction on the initial pipeline setup and best practices. In short, you'll make sure our ML systems and team processes (data flywheels, feedback loops, and so on) scale effectively as product usage grows.
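As one small, hypothetical example of the data flywheel this refers to, prediction and feedback logging of the following shape could feed both monitoring dashboards and future fine-tuning datasets; the sink and field names are placeholders.

```python
# Illustrative sketch only: logging model outputs and user feedback so they can
# feed monitoring and future training data. The sink and fields are hypothetical.

import json
import time

def log_prediction(sink, request_id: str, prompt: str, completion: str, feedback=None) -> None:
    """Append one structured prediction record to an append-only log or event stream."""
    record = {
        "request_id": request_id,
        "timestamp": time.time(),
        "prompt": prompt,
        "completion": completion,
        "user_feedback": feedback,  # e.g. "thumbs_up", "thumbs_down", or None
    }
    sink.write(json.dumps(record) + "\n")
```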

Who You Are

8+ years of experience in applied machine learning, software engineering, or related fields, including 3+ years in technical leadership roles (such as leading project teams or architecting major systems). You have a track record of delivering significant ML projects that drove business or industry impact.

Deep expertise in Generative AI: Extensive experience working with generative AI technologies. You have hands-on knowledge of modern LLMs (GPT-style models, etc.), including deploying them in production and optimizing their performance. Experience with other NLP techniques is a plus.

Prompt Engineering & LLM Evaluation: Strong familiarity with prompt engineering concepts and strategies for large language models. You understand how to craft and refine prompts to achieve desired outcomes, and how to manage context and memory in LLM applications. Additionally, you have experience evaluating AI models, whether through quantitative metrics, user studies, or tools, and using those evaluations to iterate on solutions.

Strong ML Engineering and Coding Skills: Fluency in Python and the ML/PyData ecosystem (NumPy, pandas, scikit-learn, TensorFlow/PyTorch, etc.). You write clean, efficient code and are experienced in building and maintaining data pipelines and ETL processes for ML. You are comfortable with version control (Git) and CI/CD pipelines for deploying ML models.

Architectural Design & Systems Thinking: Demonstrated ability to design complex software systems. You've worked with cloud-based architectures (e.g. using AWS or similar) and understand how to integrate ML components into larger products. Experience with RAG architectures (combining LLMs with vector databases or search indices) and building agent-based systems is highly valuable. You approach design with scalability, maintainability, and security in mind, and you're adept at conducting technical reviews and providing guidance to ensure high-quality delivery.

Leadership & Mentorship: Proven experience mentoring engineers or leading teams. You elevate those around you, for example