SRS Consulting Inc

Staff Machine Learning Engineer / Principal ML Engineer (San Jose)

SRS Consulting Inc, San Jose, California, United States, 95199


Role: Staff Machine Learning Engineer
Location: San Jose, CA (onsite; local candidates)
Duration: Long-term

Mode of interview: Virtual rounds, with a final round in person

Why this role exists
We're building privacy-preserving LLM capabilities that help hardware design teams reason over Verilog/SystemVerilog and RTL artifacts: code generation, refactoring, lint explanation, constraint translation, and spec-to-RTL assistance. We're looking for a Staff-level engineer to technically lead a small, high-leverage team that fine-tunes and productizes LLMs for these workflows in a strict enterprise data-privacy environment. You don't need to be a Verilog/RTL expert to start; curiosity, drive, and deep LLM craftsmanship matter most. Any HDL/EDA fluency is a strong plus.

What you'll do (Responsibilities)
- Own the technical roadmap for Verilog/RTL-focused LLM capabilities: from model selection and adaptation to evaluation, deployment, and continuous improvement.
- Lead a hands-on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.
- Fine-tune and customize models using state-of-the-art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF) with robust HDL-specific evals (a minimal fine-tuning sketch follows this list):
  o Compile/lint/simulate-based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and does-it-synthesize checks.
- Design privacy-first ML pipelines on AWS:
  o Training/customization and hosting using Amazon Bedrock (including Anthropic models) where appropriate; SageMaker (or EKS + KServe/Triton/DJL) for bespoke training needs.
  o Artifacts in S3 with KMS CMKs; isolated VPC subnets and PrivateLink (including Bedrock VPC endpoints), IAM least privilege, CloudTrail auditing, and Secrets Manager for credentials.
  o Enforce encryption in transit/at rest, data minimization, and no public egress for customer/RTL corpora.
- Stand up dependable model serving: Bedrock model invocation where it fits, and/or low-latency self-hosted inference (vLLM/TensorRT-LLM), autoscaling, and canary/blue-green rollouts.
- Build an evaluation culture: automatic regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases).
- Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
- Drive productization: integrate LLMs with internal developer tools (IDEs/plugins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool-use/function-calling.
- Mentor and uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure-by-default systems.
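
To make the fine-tuning responsibility concrete, here is a minimal, hedged sketch of LoRA adapter training with Hugging Face Transformers and PEFT (both named above). The base model, the rtl_instructions.jsonl file with a "text" field, and all hyperparameters are illustrative placeholders, not choices prescribed by this role.

# Minimal LoRA fine-tuning sketch; model name, data file, and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "codellama/CodeLlama-7b-hf"  # placeholder base code model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# LoRA: train small low-rank adapter matrices instead of all base weights.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))
model.print_trainable_parameters()

# Hypothetical HDL instruction corpus: JSONL records with a "text" field.
ds = load_dataset("json", data_files="rtl_instructions.jsonl", split="train")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=2048),
            remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-rtl", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm=False),
).train()
model.save_pretrained("lora-rtl-adapter")  # saves adapter weights only

A QLoRA variant of the same sketch would load the base model in 4-bit (via bitsandbytes) before applying the identical adapter config.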

What you'll bring (Minimum qualifications)
- 10+ years total engineering experience with 5+ years in ML/AI or large-scale distributed systems; 3+ years working directly with transformers/LLMs.
- Proven track record shipping LLM-powered features in production and leading ambiguous, cross-functional initiatives at Staff level.
- Deep hands-on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), quantization-aware fine-tuning (LoRA/QLoRA), and constrained/grammar-guided decoding.
- AWS expertise to design and defend secure enterprise deployments (a Bedrock invocation sketch follows this list), including:
  o Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints)
  o SageMaker (Training, Inference, Pipelines), S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, Secrets Manager.
- Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C++).
- Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.
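
As an illustration of the Bedrock runtime APIs and VPC endpoints listed above, here is a hedged sketch of invoking an Anthropic model on Amazon Bedrock from a private VPC with boto3. The region, model ID, and prompt are examples only; with Private DNS enabled on the interface endpoint, the default endpoint resolves through PrivateLink and no explicit endpoint_url is needed.

# Hedged sketch: call a Bedrock-hosted Anthropic model from inside a private VPC.
import json
import boto3

client = boto3.client(
    "bedrock-runtime",
    region_name="us-west-2",  # placeholder region
    # endpoint_url can point at the VPC interface endpoint if Private DNS is disabled
)

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{"role": "user",
                  "content": "Explain this lint warning on an always_ff block."}],
}

resp = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    body=json.dumps(body),
)
print(json.loads(resp["body"].read())["content"][0]["text"])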

Nice to have (Preferred qualifications)
- Familiarity with Verilog/SystemVerilog/RTL workflows: lint, synthesis, timing closure, simulation, formal, test benches, and EDA tools (Synopsys/Cadence/Mentor).
- Experience integrating static analysis/AST-aware tokenization for code models or grammar-constrained decoding.
- RAG at scale over code/specs (vector stores, chunking strategies); tool-use/function-calling for code transformation.
- Inference optimization: TensorRT-LLM, KV-cache optimization, speculative decoding; throughput/latency trade-offs at batch and token levels.
- Model governance/safety in the enterprise: model cards, red-teaming, secure eval data handling; exposure to SOC 2/ISO 27001/NIST frameworks.
- Data anonymization, DLP scanning, and code de-identification to protect IP.

What success looks like
90 days
- Baseline an HDL-aware eval harness that compiles/simulates (a harness sketch follows this list); establish secure AWS training and serving environments (VPC-only, KMS-backed, no public egress).
- Ship an initial fine-tuned/customized model with measurable gains vs. base (e.g., +X% compile-pass rate, Y% lint findings per K LOC generated).
180 days
- Expand customization/training coverage (Bedrock for managed FMs including Anthropic; SageMaker/EKS for bespoke/open models).
- Add constrained decoding and retrieval over internal design specs; productionize inference with SLOs (p95 latency, availability) and audited rollout to pilot hardware teams.
12 months
- Demonstrably reduce review/iteration cycles for RTL tasks with clear metrics (defect reduction, time-to-lint-clean, % auto-fix suggestions accepted), and a stable MLOps path for continuous improvement.
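
One way to read the 90-day eval-harness goal: score each generated sample by whether it compiles, then report compile-pass rate and the standard unbiased pass@k estimator. The sketch below assumes Icarus Verilog (iverilog) purely as a stand-in compiler; the commercial toolchains named elsewhere in this posting would slot in the same way.

# Hedged sketch of a compile-based eval: compile-pass rate plus unbiased pass@k.
import math
import subprocess
import tempfile

def compiles(verilog_src: str) -> bool:
    """Return True if the sample compiles cleanly with iverilog (stand-in compiler)."""
    with tempfile.NamedTemporaryFile("w", suffix=".v", delete=False) as f:
        f.write(verilog_src)
        path = f.name
    result = subprocess.run(["iverilog", "-o", "/dev/null", path],
                            capture_output=True, text=True)
    return result.returncode == 0

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k draws from n samples
    (of which c pass) is a passing sample."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 20 generations for one task (placeholder strings), report pass@1 / pass@5.
samples = ["module t; endmodule"] * 20
c = sum(compiles(s) for s in samples)
print(f"compile-pass: {c}/{len(samples)}, "
      f"pass@1={pass_at_k(len(samples), c, 1):.2f}, "
      f"pass@5={pass_at_k(len(samples), c, 5):.2f}")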

How we work (Security & privacy by design)
- Customer and internal design data remain within private AWS VPCs; access is via IAM roles and audited by CloudTrail; all artifacts are encrypted with KMS (a sketch follows this list).
- No public internet calls for sensitive workloads; Bedrock access is via VPC interface endpoints/PrivateLink with endpoint policies; SageMaker and/or EKS run in private subnets.
- Data pipelines enforce minimization, tagging, retention windows, and reproducibility; DLP scanning and redaction are first-class steps.
- We produce model cards, data lineage, and evaluation artifacts for every release.
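
A small sketch of the "artifacts in S3 with KMS CMKs" pattern above, using boto3 to upload a training artifact with server-side encryption under a customer-managed key. The bucket name, object key, and key ARN are placeholders.

# Hedged sketch: upload an artifact to S3 encrypted with a customer-managed KMS key.
import boto3

s3 = boto3.client("s3")
with open("lora-rtl-adapter.tar.gz", "rb") as f:  # placeholder artifact
    s3.put_object(
        Bucket="rtl-llm-artifacts",                # placeholder private bucket
        Key="checkpoints/lora-rtl-adapter.tar.gz",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="arn:aws:kms:us-west-2:123456789012:key/EXAMPLE",  # placeholder CMK ARN
    )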

Tech you'll touch
- Modeling: PyTorch, HF Transformers/PEFT/TRL, DeepSpeed/FSDP, vLLM, TensorRT-LLM
- AWS & MLOps: Amazon Bedrock (Anthropic and other FMs, Guardrails, Knowledge Bases, Runtime APIs), SageMaker (Training/Inference/Pipelines), MLflow/W&B, ECR, EKS/KServe/Triton, Step Functions
- Platform/Security: S3 + KMS, IAM, VPC/PrivateLink (incl. Bedrock), CloudWatch/CloudTrail, Secrets Manager
- Tooling (nice to have): HDL toolchains for compile/simulate/lint, vector stores (pgvector/OpenSearch), GitHub/GitLab CI