Cloud DevOps / Site Reliability Engineer, Applied Machine Learning
Apple’s Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. Our teams use the latest open source technology, and as committers on some of these projects, we are pushing the envelope. Working with multiple lines of business, we manage many streams of Apple-scale data, bring it all together, and unlock business value.
We do all this with an outstanding group of software engineers, data scientists, DevOps engineers, and managers. We are looking for talented, dedicated engineers with a passion for infrastructure and distributed systems to join our team and build world-class platforms and products at very large scale across cloud environments.
Responsibilities:
- Build & support CI/CD tools to port & manage applications on AWS & Kubernetes
- Understand application requirements (performance, security, scalability, etc.) and assess appropriate services and topologies on AWS & Kubernetes
- Deploy & support applications on Kubernetes environments (on-prem K8s and AWS EKS)
- Build automation for self-healing systems
- Create monitoring and alerting tools for high-performance, low-latency applications on AWS
- Troubleshoot application-specific, core network, system, & performance issues
- Support challenging, fast-paced projects that deliver innovative solutions for Apple’s business
- Monitor production, staging, test, and development environments for various applications in an agile organization
The ideal candidate is self-motivated, proactive, and solution-oriented, with a strong background in setting up and supporting infrastructure for large-scale big data applications in public cloud environments such as AWS.