ValidMind Inc
ValidMind empowers financial services organizations to bring more trust and transparency to the world’s AI/ML/LLM models. With the rapid evolution of AI, increased regulatory scrutiny, and a lack of fit‑for‑purpose tooling, financial services’ Model Risk Management (MRM) and AI Governance functions are under enormous pressure to ensure compliance. We are passionate about helping these organizations seamlessly and confidently test, validate, and document their business’s AI models while ensuring compliance with domestic and international AI and model risk regulations.
Overview
We’re looking for a skilled Infrastructure Engineer to design, build, and maintain reliable, scalable infrastructure that supports our engineering teams and product delivery. You’ll be responsible for managing cloud environments, implementing infrastructure-as-code practices, and ensuring high availability and observability of our systems.
What You’ll Do & Your Impact:
Design, deploy, and manage infrastructure using Docker, Kubernetes, and Terraform to support production and development environments.
Manage cloud infrastructure on a major provider, preferably AWS (experience with GCP or Azure also considered).
Implement monitoring and observability solutions using tools such as Datadog, Splunk, Prometheus, or Grafana to ensure system reliability and performance.
Collaborate closely with backend and fullstack engineers to support continuous integration, delivery, and deployment pipelines.
Participate in the on‑call rotation, respond to incidents, and help drive post‑incident reviews and reliability improvements.
Automate operational tasks using scripting languages such as Bash and Python.
Maintain and improve security and compliance practices within infrastructure and deployment processes.
Document infrastructure designs, processes, and procedures to promote transparency and knowledge sharing across the team.
Who You Are & What Makes You Qualified:
3+ years of professional experience in infrastructure, DevOps, or SRE roles.
Strong experience with containerization (Docker) and orchestration (Kubernetes) in production environments.
Proven experience with Terraform or other infrastructure-as-code tools.
Hands‑on experience with AWS (EC2, ECS/EKS, S3, IAM, CloudWatch, etc.) or another major cloud platform.
Proficiency in monitoring and logging tools (e.g., Datadog, Splunk, Prometheus, ELK stack).
Comfortable writing automation scripts in Bash and Python.
Experience supporting CI/CD pipelines and deployment workflows.
Strong communication skills and ability to collaborate effectively with cross‑functional teams.
Willingness to participate in an on‑call rotation and help improve system reliability and response processes.
Nice‑to‑Have(s):
Familiarity with service mesh or networking within Kubernetes.
Experience with security best practices in cloud and containerized environments.
Understanding of GitOps workflows (e.g., ArgoCD, Flux).
Knowledge of performance tuning, capacity planning, and cost optimization in cloud environments.
Why Join Us
Opportunity to have a direct impact on the stability and scalability of core systems.
Collaborative engineering culture with strong ownership and autonomy.
Exposure to a modern tech stack and opportunities for professional growth.
At ValidMind, we create the most efficient solution for organizations to automate testing, documentation, and risk management for AI and statistical models. Working here means being at the forefront of AI risk management, but it’s also more personal than that: we promote an inclusive culture where we value your ideas and creativity. We want you to have a sense of ownership over your work, to build mutual trust with your peers, and to feel supported in everything you do. As a VC‑backed company in the early stages of growth, we offer ample room to grow.
As set forth in ValidMind’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.