Diverse Lynx

Senior AI-ML LLM Quality Engineer

Diverse Lynx, Sunnyvale, California, United States, 94087

Must Have Skills:
1. Strong experience in Python scripting, REST APIs, and YAML
2. Good hands-on experience testing Gen AI / ML products and evaluating LLMs within a large-scale enterprise environment
3. Experience with LLM testing tools such as LangSmith and Promptfoo
4. Strong understanding of LLM behavior
5. Proficiency with PyTest, Selenium, or similar frameworks
6. Strong experience with test automation, including the ability to guide customers on relevant technology and automation strategy

Nice To Have Skills:
1. Experience with testing frameworks
2. Experience testing RAG and LLM agent systems
3. Familiarity with LangChain, LlamaIndex, or Haystack
4. Knowledge of AI/ML model evaluation metrics
5. Experience with red teaming is a plus but not mandatory
6. Familiarity with the AWS cloud platform and MLOps tooling (e.g., MLflow)

Technical/Functional Skills and Key Responsibilities:
• Support testing and validation of Large Language Model (LLM)-powered applications
• Help implement test strategies, execute evaluation workflows, and assist in model performance validation across diverse generative AI use cases
• Design and execute test cases for Gen AI / ML features and user workflows
• Develop automated test frameworks to evaluate LLM outputs for accuracy, bias, and safety
• Perform manual and automated test execution on APIs and LLM-integrated user interfaces
• Conduct end-to-end testing of integrated generative AI solutions, including APIs, data pipelines, and user interfaces
• Collaborate with ML engineers to validate fine-tuned models and optimize prompts for target scenarios
• Analyze model failures, edge cases, and adversarial inputs to identify risks and improvement areas
• Benchmark LLM performance against industry standards and product-specific KPIs
• Apply strong analytical skills to dissect model behavior, statistical performance, and failure modes
• Collaborate with product managers to convert requirements into test cases and test data
• Write automation scripts to simulate user behavior and backend interactions
• Document test plans, test reports, and AI evaluation metrics

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.