Summit Human Capital

Sr. Platform Engineer

Summit Human Capital, Richmond, Virginia, United States, 23214

Job Description: Senior Platform Engineer

We are seeking a highly skilled Senior Platform Engineer (DevOps Engineer) to drive the automation, reliability, and scalability of our infrastructure at an enterprise level. In this role, you will take on a leadership position, providing strategic direction, mentoring mid-level and junior team members, and implementing best practices that improve operational excellence. You will work at the intersection of software development and systems operations, ensuring our applications are resilient, secure, and continuously improving.

Responsibilities

Technical Leadership & Strategy

Architectural Guidance: Provide strategic, high-level planning for infrastructure design and application architectures, ensuring alignment with business goals and industry standards.

Mentoring & Coaching: Offer oversight and guidance to mid-level and junior engineers, reviewing code, sharing best practices, and fostering a culture of learning and continuous improvement.

Technical Roadmapping: Collaborate with stakeholders (e.g., Development, Security, SRE) to prioritize projects and technology investments, focusing on scalability, cost optimization, and performance.

Infrastructure & Automation

Cloud Infrastructure: Design, build, and maintain secure, high-availability environments in AWS, Azure, or GCP.

Infrastructure as Code: Implement and maintain IaC solutions (e.g., Terraform, Ansible, CloudFormation) for consistent, scalable deployments.

Automated Workflows: Develop, refine, and optimize CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.) to improve deployment speed and reliability.

Orchestration: Define and implement containerization strategies (Docker Swarm, Kubernetes) to ensure consistent environments.

Reliability & Performance Optimization

Service Level Objectives (SLOs): Establish and refine SLOs and SLIs in collaboration with business and operational teams.

Monitoring & Observability: Use tools such as Prometheus, Datadog, and Grafana to monitor system health, identify performance bottlenecks, and resolve incidents proactively.

Incident Response & Postmortems: Lead root cause analysis (RCA) and post-incident reviews, implementing preventive measures to keep incidents from recurring.

Capacity Planning & Chaos Engineering: Conduct regular capacity assessments and apply chaos engineering principles to validate system resiliency.

Security & Compliance

Security Best Practices: Enforce least-privilege access controls, secure secrets management, and automated security checks throughout the CI/CD process.

Regulatory Compliance: Collaborate with security teams to ensure adherence to frameworks such as ISO 27001, SOC 2, or other industry standards.

Vulnerability Management: Implement vulnerability scanning, penetration testing, and remediation protocols, ensuring rapid resolution of security risks.

Collaboration & Culture

DevOps Evangelism: Champion DevOps principles (CI/CD, infrastructure as code, observability) to foster a shared culture of collaboration across teams.

Cross-Functional Alignment: Work closely with Product Management, QA, and Security to align technical execution with strategic objectives.

Documentation & Knowledge Sharing: Maintain comprehensive documentation on infrastructure and processes; create runbooks and lead training sessions for team members.

Requirements

5+ years of experience in DevOps, Site Reliability Engineering, or a related role with senior-level responsibilities.

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent professional experience.

Cloud Expertise: Advanced knowledge of AWS, Azure, or GCP, including designing complex, production-grade infrastructure.

Infrastructure as Code: Proficiency with Terraform, Ansible, or CloudFormation, plus automation scripting (Python, Go, Bash, etc.).

Containers & Orchestration: Experience with Docker and Kubernetes in large-scale environments.

CI/CD: Deep understanding of continuous integration and deployment pipelines (Jenkins, GitHub Actions, GitLab CI).

Observability Tools: Hands-on experience with Prometheus, ELK Stack, Datadog, Grafana, or equivalent.

Security & Networking: Strong knowledge of security best practices, compliance standards, and networking concepts (DNS, load balancing, VPNs).

Preferred Qualifications

Certifications such as AWS Certified DevOps Engineer, Kubernetes CKA/CKAD, or HashiCorp Certified: Terraform Associate.

Exposure to serverless technologies (AWS Lambda, Google Cloud Functions, Azure Functions).

Background in Site Reliability Engineering (SRE) methodologies, such as error budgets and advanced incident management.

Experience implementing and maintaining compliance standards (PCI DSS, HIPAA, FedRAMP).

By combining deep knowledge of cloud infrastructure and DevOps practices with strong leadership in a fast-paced environment, you will play a pivotal role in keeping our platform robust, secure, and adaptable to evolving needs. This position offers the opportunity to make a significant impact on both technology strategy and team development, driving success across the organization.