Scale AI

Lead Cyber Security Evaluation Expert

Scale AI, Washington, District Of Columbia, United States, 20599

Scale is at the frontier of the AI industry, improving the world's leading Generative AI and Large Language Models through model evaluations, human-powered supervised fine-tuning (SFT) datasets, world-class Reinforcement Learning from Human Feedback (RLHF), and more.

We are seeking a deeply experienced and cross-functional Lead Cybersecurity Evaluation Expert to advise on and oversee the technical quality and strategic scope of cutting-edge Cyber Test & Evaluation (T&E) projects assessing Large Language Models (LLMs). This internal expert will serve as the lead advisor across multiple cyber domains, guiding dataset development efforts, validating expert contributions from subcontractors, and ensuring that benchmarks reflect real-world complexity, domain authenticity, and technical rigor.

The ideal candidate will possess deep hands-on knowledge across multiple cybersecurity domains (such as network exploitation, cryptographic systems, LLM adversarial testing, APT analysis, and cyber ethics) and have prior experience in red teaming, incident response, or threat intelligence. This role is pivotal to ensuring that all T&E artifacts generated by subcontracted experts meet the highest standards of realism, fidelity, and relevance.

Key Responsibilities

Domain oversight: Provide strategic oversight across all cyber subdomains, including but not limited to malicious network traffic, cryptographic systems, adversarial LLM prompts, threat intelligence, and cyber ethics.
Scoping & strategy: Collaborate with the Program Manager to define project goals, deliverable scopes, evaluation frameworks, and technical benchmarks.
Expert vetting: Assess the technical credibility of cyber experts proposed by subcontractors; conduct interviews and review technical artifacts to validate expertise.
Quality control: Review and validate the accuracy, depth, and applicability of all datasets and question-answer pairs produced by subcontracted experts.
Standardization: Establish and enforce evaluation rubrics, scenario fidelity criteria, and documentation standards to ensure consistency across all workstreams.
Cross-domain bridging: Identify cross-domain gaps, propose integrated benchmark scenarios, and ensure logical alignment between adjacent domains (e.g., how network behavior supports APT identification).
Stakeholder communication: Provide subject-matter advice to internal and external stakeholders on technical feasibility, risks, and coverage completeness.

Required Skills

8+ years of hands-on experience in cybersecurity, with demonstrated proficiency across multiple domains (e.g., red teaming, cryptography, network forensics, cyber threat intelligence, adversarial ML).
Proven experience in one or more of the following: red-teaming LLMs, TTP identification using MITRE ATT&CK, cryptographic protocol evaluation, or creation of high-fidelity cyber scenarios.
Familiarity with cybersecurity testing methodologies (e.g., penetration testing, adversarial simulation, red team exercises).
Strong analytical, evaluative, and problem-solving abilities.
Excellent communication skills with a strong technical writing background.

Preferred Qualifications

Prior experience leading or advising multi-expert technical teams across multiple cybersecurity disciplines.
Understanding of LLM architectures and AI model evaluation processes.
Familiarity with T&E in government or defense settings (e.g., AFWERX, MITRE, DoD AI efforts).
Certifications such as CISSP, OSCP, GCIH, GCIA, GPEN, or equivalent.

The base salary range for this full-time position in Washington, DC is $180,000 - $200,000 USD.