Mindlance
General Information
Job Description: Senior SRE position. The preferred location is Dallas, TX; the role can also be based in Westlake, TX.
Responsibilities include, but are not limited to:
• Practice a Site Reliability Engineering mindset and solve problems through automation, instrumentation, and simplicity
• Partner with the Architects, Development Leads, Business Partners, and other SREs on the team to ensure implementations are architected and designed for resiliency
• Identify application reliability and availability improvements, and establish and build solutions that continue to drive an improved experience
• Perform production support and application deployments, and provide a rapid response for critical trading applications
• Proactively perform system monitoring and review SLO/SLI metrics and runbooks
• Implement and collaborate on solutions that increase the monitoring and observability of systems at scale
• Work with development teams to provide recommendations about system health upgrades and toil reduction
• Advocate for Client's Reliability Engineering principles, guidelines, and standards
• Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools
• Participate in on-call escalations during market hours and off-hours
What you have
Required Qualifications
• 6+ years of experience with large-scale enterprise system administration, application support, or incident handling in an SRE role
• 6+ years of experience with RHEL Linux administration or Windows server administration
• 6+ years of experience with a proven track record of supporting enterprise production environments while adhering to various DevOps & SRE frameworks
• 6+ years of experience building application dashboards for proactive monitoring, setting up alerts, etc.
• 6+ years of experience with logging/application monitoring tools (AppDynamics, Splunk, Dynatrace, Thousand Eyes)
• 4+ years of experience supporting applications on cloud platforms such as GCP and Pivotal Cloud Foundry (PCF)
• 4+ years of experience using the Atlassian tools Jira, Confluence, and Bamboo
Preferred Qualifications
• Experience researching and building dashboards for the Grafana and Prometheus suite
• Experience with Google Cloud Anthos and Kubernetes
• Strong understanding of and experience with Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), such as Pivotal Cloud Foundry (PCF)
• Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines
• Understanding of High Availability Enterprise systems and of leveraging tools to automate proactive and, eventually, predictive availability solutions
• Receptive, approachable teammate with the ability to interact positively with business partners, technology teams, offshore, and professional services
• Strong advocate with excellent written and verbal communication skills
EEO: Mindlance is an Equal Opportunity Employer and does not discriminate in employment on the basis of Minority/Gender/Disability/Religion/LGBTQI/Age/Veterans.