Blankfactor

Site Reliability Engineer

Blankfactor, Trenton, New Jersey, United States

Blankfactor is a global technology company specializing in full-stack development, cloud engineering, and modern software solutions. We partner with some of the world’s most recognized brands in fintech, financial services, and enterprise technology—building next-generation systems that are scalable, secure, and high-performing. Join us and be part of a team that thrives on engineering excellence and innovation.

About the Role

As a Site Reliability Engineer (SRE) at Blankfactor, you will play a critical role in ensuring the reliability, availability, and performance of mission‑critical platforms. You will design scalable systems, develop robust automation, and leverage data‑driven operations to keep services resilient. This role works closely with development, cloud, infrastructure, and security teams to deliver high‑performing services that support how people live and work today.

What You’ll Do

Design and implement solutions that enhance application reliability, performance, scalability, and resilience.

Build and maintain monitoring, alerting, observability, and telemetry solutions for proactive issue detection and rapid incident response.

Lead incident management efforts, conduct root‑cause analysis, and drive actionable post‑mortem improvements.

Automate operational workflows using scripting, infrastructure‑as‑code, and configuration management tools.

Analyze capacity, performance, and usage to forecast demand and optimize cloud costs.

Collaborate across engineering teams to embed resilience, operability, and security into architectures and application design.

Support safe, reliable deployments through CI/CD pipelines, release governance, and change control.

Maintain operational documentation including runbooks, architecture diagrams, and production support guides.

Experience Required

Hands‑on experience managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and orchestration.

Strong background working with public cloud platforms (AWS, Azure, or GCP) across compute, storage, networking, IAM, and cost governance.

Experience with observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, or ExtraHop.

Experience implementing security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secrets management and vulnerability remediation.

Infrastructure‑as‑code experience (Terraform, CloudFormation, Ansible, or similar).

Experience designing and maintaining CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, or Azure DevOps).

Scripting and automation using Bash, PowerShell, or Python.

Bachelor’s degree, or an equivalent combination of experience, education, and/or military background.

Nice to Have

Professional certifications (AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, CKA).

Experience with Premier applications, IBM iSeries, and/or Unisys enterprise systems.

Hands‑on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).

Experience leading major incident command, stakeholder communication, and cross‑team coordination.

Knowledge of ITIL and ServiceNow (change, problem, and configuration management).

Why Join Blankfactor

Work with global brands on high‑impact engineering projects

Growth opportunities, mentorship, and certifications

A multicultural environment with talented engineers across the world
