Digital Technologies, LLC

Cloud Infrastructure Site Reliability Engineer

Digital Technologies, LLC, Berkeley Heights, New Jersey, US, 07922


Overview

Location:

Duration: 14 Months + Extension

Hourly Rate: Depending on Experience (DOE)

Position Summary:

As a Cloud Infrastructure Site Reliability Engineer (SRE) with expertise in multiple public cloud service provider platforms, you will be responsible for operating infrastructure solutions, following the principles and practices pioneered by Google’s SRE model. Your work will ensure our cloud services meet uptime, reliability, and performance targets, and you will drive automation and continuous improvement across our production environments. This role will involve collaborating with cross-functional teams to enhance our cloud reliability posture and streamline processes through automation.

Responsibilities

Operate infrastructure solutions following Google-inspired SRE practices to meet uptime, reliability, and performance targets.

Collaborate with cross-functional teams to enhance cloud reliability posture and drive automation across production environments.

Identify, implement, and improve automation to streamline processes and reduce toil.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, C++).

Experience administering cloud platforms (AWS, GCP, Azure), including networking, security, containerization, storage, data management, and serverless technologies.

Solid understanding of Linux systems, networking fundamentals, virtualized and distributed systems, file systems, system processes and configurations.

Deep understanding of observability (monitoring, alerting, and logging) tools in cloud environments; ability to set up and maintain monitoring dashboards, alerts, and logs.

Familiarity with CI/CD tools for automated testing, deployments, provisioning, and observability.

Ability to manage and respond to incidents, perform root cause analysis, and implement post-mortem reviews.

Understanding of setting, monitoring, and maintaining Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) for system reliability.

Additional Qualifications a Plus

Experience working with enterprise-scale financial services or other regulated industries

5+ years of experience in SRE, DevOps, infrastructure, or cloud engineering roles, preferably supporting large-scale, distributed systems.

Excellent problem-solving, troubleshooting, and communication skills.

Experience leading technical projects or mentoring junior engineers.

Certifications: Certified Engineer, DevOps, SRE, CSREF

DIGITAL TECHNOLOGIES LLC is an equal opportunity employer inclusive of female, minority, disability and veterans (M/F/D/V). Hiring, promotion, transfer, compensation, benefits, discipline, termination and all other employment decisions are made without regard to race, color, religion, sex, sexual orientation, gender identity, age, disability, national origin, citizenship/immigration status, veteran status or any other protected status. DIGITAL TECHNOLOGIES LLC will not make any posting or employment decision that does not comply with applicable laws relating to labor and employment, equal opportunity, employment eligibility requirements or related matters. Nor will DIGITAL TECHNOLOGIES LLC require in a posting or otherwise U.S. citizenship or lawful permanent residency in the U.S. as a condition of employment except as necessary to comply with law, regulation, executive order, or federal, state, or local government contract.
