WorkBoard

Prompt & Evaluation Engineer

WorkBoard, Redwood City, California, United States, 94061


WorkBoard's Strategy Execution Platform powers the digital operating rhythm for companies around the globe, providing clarity, alignment, and insights for growth. Enterprises like AstraZeneca, Ford, 3M, and Intel rely on our platform, playbook, and expertise to accelerate results by aligning OKRs, simplifying business reviews, and leveraging analytics – all with embedded AI. We're expanding our agentic AI capabilities and are looking for a Prompt & Evaluation Engineer to help design, optimize, evaluate, and scale the interaction layer between our AI models and enterprise users.

The Opportunity

As a Prompt & Evaluation Engineer, you'll specialize in crafting, testing, and evaluating prompts and structured workflows that make large language models (LLMs) reliable, accurate, and contextually aware. You'll collaborate with product, engineering, and AI research teams to turn business requirements into effective, repeatable agentic behaviors. If you thrive at the intersection of language, systems, and design, and you want to shape how enterprise users interact with AI agents, this role is for you.

What You'll Do

Design, test, and refine prompts for LLMs across a variety of enterprise use cases.
Develop structured multi-step reasoning flows and evaluation harnesses for reliability.
Integrate tool-calling strategies, safety checks, and contextual augmentation.

Evaluation & Reliability

Build and maintain eval frameworks to measure prompt effectiveness, accuracy, and safety.
Define success metrics and benchmarks for tool-augmented LLM workflows.
Run regression suites to detect degradation across model updates.
Partner with research and product to scale continuous evaluation and feedback loops.

Tool & Framework Integration

Work with frameworks like LangChain, LangGraph, or similar to implement stateful agents.
Collaborate with engineers to align prompts with backend APIs and MCP-based toolchains.
Ensure tool discovery and calling are consistent, robust, and observable in production.

Collaboration & Delivery

Partner with Product, UX, and Engineering to translate workflows into AI behaviors.
Deliver at least one end-to-end agent conversation (e.g., get_objective) as a template for scaling.
Contribute to experimentation frameworks, logging, and evaluation dashboards.

Knowledge Sharing

Define best practices and guidelines for enterprise prompt engineering and evals.
Mentor engineers and PMs in prompt design and evaluation techniques.

What You Bring

3–5 years in applied AI, NLP, or related software engineering roles.
Strong Python skills for building eval pipelines, data preprocessing, and experimentation.
Solid data experience: analyzing logs, designing benchmarks, and leveraging datasets to improve prompt quality.
Experience designing and iterating prompts for LLMs in production environments.
Hands-on experience with LLM evaluation frameworks (e.g., LangSmith, custom eval harnesses).
Familiarity with frameworks like LangChain, LangGraph, or semantic caching methods.
Plus: knowledge of MCP toolchains and API-to-agent integration.
Strong communication skills and the ability to collaborate across disciplines.
A builder's mindset: creative, iterative, and outcome-driven.

Within Your First 6 Months

1 Month: Ramp up on our agentic frameworks and learn existing prompt + eval libraries.
3 Months: Own and deliver new prompt-driven workflows with evaluation benchmarks.
6 Months: Establish yourself as a go-to expert in prompt engineering & evals, and publish internal best practices for scaling.

We are a no-ego bunch, and super excited to build an awesome team in a category-creating company together!

OUR VALUES - WE LIVE BY THE 4 Hs

Humble experts
Hungry for the opportunity
Intellectually honest
Operating as one happy team

A FEW OF OUR AWESOME BENEFITS

Discretionary time off & sick days
Paid holidays
Health insurance
401(k) with employer matching
Quarterly All-Hands Meetings
And much more!

We are proud to be an equal opportunity workplace committed to building a team culture that celebrates learning, diversity and inclusion. If you're hungry to grow your skills while growing a company, your sense of urgency matches the size of our market opportunity, and you value and enable teammates' contributions, then come join us!

The US base salary range for this full-time position is $140K annual salary. Our salary ranges are determined by role, level and geographic location. Within the range, individual pay is determined by additional factors, including job-related skills, experience, and relevant education or training.
