General Information
Job Description: Senior SRE position. The preferred location is Dallas, TX; the role can also sit in Westlake, TX.
Responsibilities include, but are not limited to:
• Practice Site Reliability Engineering mindset and solve problems through automation, instrumentation, and simplicity
• Partner with Architects, Development Leads, Business Partners, and other SREs on the team to ensure implementations are architected and designed for resiliency
• Identify application reliability and availability improvements, and establish and build solutions that continue to drive an improved experience
• Perform production support and application deployments, and provide rapid response for critical trading applications
• Proactively monitor systems and review SLO/SLI metrics and runbooks
• Implement and collaborate on solutions that increase the monitoring and observability of systems at scale
• Work with development teams to provide recommendations about system health upgrades and toil reduction
• Advocate for Client's Reliability Engineering principles, guidelines, and standards
• Foster a culture of learning through education and knowledge sharing around reliability practices, processes, and tools
• Participate in on-call escalations during market hours and off-hours
What you have
Required Qualifications
• 6+ years of experience with large-scale enterprise system administration, application support or incident handling in an SRE role
• 6+ years of experience with RHEL Linux administration or Windows server administration
• 6+ years of experience, with a proven track record, supporting enterprise production environments while adhering to various DevOps & SRE frameworks
• 6+ years of experience building application dashboards for proactive monitoring and setting up alerts
• 6+ years of experience with logging/application monitoring tools (AppDynamics, Splunk, Dynatrace, ThousandEyes)
• 4+ years of experience supporting applications on cloud platforms such as GCP and Pivotal Cloud Foundry (PCF)
• 4+ years of experience using Atlassian tools (Jira, Confluence, Bamboo)
Preferred Qualifications
• Experience researching and building dashboards with the Grafana and Prometheus suite
• Experience with Google Cloud Anthos and Kubernetes
• Strong understanding of and experience with Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) offerings such as Pivotal Cloud Foundry (PCF)
• Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines
• Understanding of high-availability enterprise systems and of leveraging tools to automate proactive, and eventually predictive, availability solutions
• Receptive, approachable teammate with the ability to interact positively with business partners, technology teams, offshore teams, and professional services
• Strong advocate with excellent written and verbal communication skills
EEO:
Mindlance is an Equal Opportunity Employer and does not discriminate in employment on the basis of Minority/Gender/Disability/Religion/LGBTQI/Age/Veterans.