GTN Technical Staffing

Staff Machine Learning Engineer, LLM Fine‑Tuning (Verilog/RTL Applications)

GTN Technical Staffing, Topeka, Kansas, United States


Highlights

Location: San Jose, CA (Onsite/Hybrid)

Schedule: Full Time

Position Type: Contract

Hourly: BOE (based on experience)

Overview

Our client is building privacy‑preserving LLM capabilities that help hardware design teams reason over Verilog/SystemVerilog and RTL artifacts: code generation, refactoring, lint explanation, constraint translation, and spec‑to‑RTL assistance. The role is for a Staff‑level engineer who technically leads a small, high‑leverage team that fine‑tunes and productizes LLMs for these workflows in a strict enterprise data‑privacy environment.

You don't need to be a Verilog/RTL expert to start; curiosity, drive, and deep LLM craftsmanship matter most. Any HDL/EDA fluency is a strong plus.

Responsibilities

Own the technical roadmap for Verilog/RTL‑focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.

Lead a hands‑on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.

Fine‑tune and customize models using state‑of‑the‑art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF) with robust HDL‑specific evals: compile/lint/simulate‑based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and "does‑it‑synthesize" checks. (A minimal adaptation sketch follows.)
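
For illustration only, a minimal LoRA setup with Hugging Face PEFT; the base checkpoint, rank, and target modules are placeholder choices, not the client's actual configuration:

```python
# Minimal LoRA sketch; assumes any open causal code LM can stand in here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-Coder-1.5B"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections; only these small
# matrices are trained, which keeps fine-tuning cheap and reversible.
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights
```

Instruction tuning and preference optimization (e.g., via TRL) would then train these adapters on curated Verilog pairs, gated by the compile/lint/simulate evals above.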

Design privacy‑first ML pipelines on AWS: training/customization and hosting using Amazon Bedrock (including Anthropic models), with SageMaker (or EKS + KServe/Triton/DJL) for bespoke training needs. Artifacts live in S3 encrypted with KMS CMKs; isolated VPC subnets and PrivateLink (including Bedrock VPC endpoints), least‑privilege IAM, CloudTrail auditing, and Secrets Manager for credentials. Enforce encryption in transit and at rest, data minimization, and no public egress for customer/RTL corpora. (A sketch of this posture follows.)
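
A hedged boto3 sketch of that posture: a KMS‑encrypted upload to a private bucket, then a Bedrock Converse call that, with a VPC interface endpoint in place, stays off the public internet. Bucket name, key ARN, and account ID are placeholders:

```python
# Assumptions: a pre-provisioned KMS CMK, a private S3 bucket, and a
# Bedrock VPC interface endpoint; all identifiers below are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="rtl-training-corpus",  # hypothetical private bucket
    Key="datasets/v1/train.jsonl",
    Body=open("train.jsonl", "rb"),
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="arn:aws:kms:us-west-2:123456789012:key/EXAMPLE",
)

# With a VPC endpoint for bedrock-runtime, this call resolves to a
# private IP and every invocation is auditable via CloudTrail.
bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
resp = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Explain this lint finding: ..."}]}],
)
print(resp["output"]["message"]["content"][0]["text"])
```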

Stand up dependable model serving: Bedrock model invocation where it fits, and/or low‑latency self‑hosted inference (vLLM/TensorRT‑LLM), autoscaling, and canary/blue‑green rollouts. (See the serving sketch below.)
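
A minimal self‑hosted serving sketch using vLLM's offline API; the checkpoint is a placeholder, and a production path would instead run vLLM's OpenAI‑compatible server behind autoscaling with canary rollouts:

```python
# Sketch only: load a (hypothetical) fine-tuned checkpoint and generate.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-Coder-1.5B")  # placeholder checkpoint
params = SamplingParams(temperature=0.2, max_tokens=256)

prompt = "// Write a parameterized synchronous FIFO in Verilog\n"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```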

Build an evaluation culture: automatic regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases). (A compile‑gate sketch follows.)
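
One plausible shape for the compile gate, sketched with the open‑source Icarus Verilog compiler (iverilog); any HDL toolchain with a CLI slots in the same way:

```python
# Sketch: score a batch of model generations by whether they compile.
import pathlib
import subprocess
import tempfile

def compile_pass(verilog_src: str) -> bool:
    """True if the generated Verilog compiles cleanly under iverilog."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "dut.v"
        src.write_text(verilog_src)
        proc = subprocess.run(
            ["iverilog", "-o", str(pathlib.Path(tmp) / "dut.vvp"), str(src)],
            capture_output=True, text=True,
        )
        return proc.returncode == 0

samples = ["module t; endmodule", "module broken(input a;"]  # toy outputs
rate = sum(compile_pass(s) for s in samples) / len(samples)
print(f"compile-pass rate: {rate:.0%}")
```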

Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.

Drive productization: integrate LLMs with internal developer tools (IDEs/plug‑ins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool‑use/function‑calling. (A retrieval sketch follows.)
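
A toy retrieval sketch over spec fragments; sentence-transformers is one plausible embedder, and production would swap in a real vector store (pgvector/OpenSearch) with chunking over HDL repos:

```python
# Sketch: embed spec chunks, retrieve the best match, ground the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedder
chunks = [
    "FIFO depth is set by the DEPTH parameter; full/empty are registered.",
    "The AXI4-Lite slave exposes CTRL at 0x00 and STATUS at 0x04.",
]  # hypothetical spec fragments
doc_vecs = embedder.encode(chunks, normalize_embeddings=True)

query = "Which register holds the status bits?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
best = int(np.argmax(doc_vecs @ q_vec))  # cosine similarity (unit vectors)
prompt = f"Answer using this spec excerpt:\n{chunks[best]}\n\nQ: {query}"
print(prompt)
```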

Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure‑by‑default systems.

Minimum qualifications

10+ years total engineering experience, with 5+ years in ML/AI or large‑scale distributed systems; 3+ years working directly with transformers/LLMs.

Proven track record shipping LLM‑powered features in production and leading ambiguous, cross‑functional initiatives at Staff level.

Deep hands‑on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), quantization‑aware fine‑tuning (LoRA/QLoRA), and constrained/grammar‑guided decoding.

AWS expertise to design and defend secure enterprise deployments, including: Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints), SageMaker (Training, Inference, Pipelines), S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, Secrets Manager.

Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C++).

Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication with both executives and engineers.

Preferred qualifications

Familiarity with Verilog/SystemVerilog/RTL workflows: lint, synthesis, timing closure, simulation, formal, test benches, and EDA tools (Synopsys/Cadence/Mentor).

Experience integrating static analysis/AST‑aware tokenization for code models, or grammar‑constrained decoding.

RAG at scale over code/specs (vector stores, chunking strategies), tool‑use/function‑calling for code transformation.

Inference optimization: TensorRT‑LLM, KV‑cache optimization, speculative decoding; throughput/latency trade‑offs at batch and token levels.

Model governance/safety in the enterprise: model cards, red‑teaming, secure eval data handling; exposure to SOC2/ISO 27001/NIST frameworks.

Data anonymization, DLP scanning, and code de‑identification to protect IP.

What success looks like

90 days

Baseline an HDL‑aware eval harness that compiles/simulates; establish secure AWS training & serving environments (VPC‑only, KMS‑backed, no public egress).

Ship an initial fine‑tuned/customized model with measurable gains over the base model (e.g., +X% compile‑pass rate, -Y% lint findings per K LOC generated).

180 days

Expand customization/training coverage (Bedrock for managed FMs including Anthropic; SageMaker/EKS for bespoke/open models).

Add constrained decoding + retrieval over internal design specs; productionize inference with SLOs (p95 latency, availability) and audited rollout to pilot hardware teams.

12 months

Demonstrably reduce review/iteration cycles for RTL tasks with clear metrics (defect reduction, time‑to‑lint‑clean, % auto‑fix suggestions accepted), and a stable MLOps path for continuous improvement.

Security & privacy by design

Customer and internal design data remain within private AWS VPCs; access goes through IAM roles and is audited by CloudTrail; all artifacts are encrypted with KMS.

No public internet calls for sensitive workloads; Bedrock access via VPC interface endpoints/PrivateLink with endpoint policies; SageMaker and/or EKS run in private subnets.

Data pipelines enforce minimization, tagging, retention windows, and reproducibility; DLP scanning and redaction are first‑class steps.

We produce model cards, data lineage, and evaluation artifacts for every release.

Tech you’ll touch

Modeling: PyTorch, HF Transformers/PEFT/TRL, DeepSpeed/FSDP, vLLM, TensorRT‑LLM

AWS & MLOps: Amazon Bedrock (Anthropic and other FMs, Guardrails, Knowledge Bases, Runtime APIs), SageMaker (Training/Inference/Pipelines), MLflow/Weights & Biases, ECR, EKS/KServe/Triton, Step Functions

Platform/Security: S3 + KMS, IAM, VPC/PrivateLink (incl. Bedrock), CloudWatch/CloudTrail, Secrets Manager

Tooling (nice to have)

HDL toolchains for compile/simulate/lint, vector stores (pgvector/OpenSearch), GitHub/GitLab CI

"We are GTN -The Go To Network"
