GreyOrange Pte. Ltd.
Senior Site Reliability Engineer
GreyOrange Pte. Ltd., California, Missouri, United States, 65018
#### Senior Site Reliability Engineer

Full Time
Required Experience: 5 - 8 Years
Skills: Linux, Terraform, GCP, + 7 more

We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization.
The SRE team at GreyOrange is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing business-as-usual (BAU) operations. The team also manages and maintains internal tools and infrastructure that are used by other development teams.
The experienced SRE will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure and applications, including capacity planning. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.

**Requirements**

* 5 to 8 years of experience.
* Well-versed in scripting/programming languages (Python, Bash, PowerShell, etc.) to automate manual work, particularly within cloud environments.
* Well-versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging solutions that identify and address potential issues, especially in cloud infrastructure.
* Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes to streamline deployment, monitoring, and management of systems and applications in the cloud.
* Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments.
* Solid understanding of SLI, SLO, SLA, and Error Budget concepts and their implementations; able to provide on-call support and participate in incident management and response activities as needed.
* Expert at troubleshooting production issues and bugs.
* Good knowledge of Unix systems, networking, web technologies, and databases.
* Incident management experience coupled with effective communication skills for production workloads.
* Working knowledge of at least one cloud platform (AWS or GCP).

**What you'll do:**

* Lead reliability engineering projects and drive them to closure.
* Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues.
* Design, build, and maintain efficient, reliable, and scalable cloud-based infrastructure and services.
* Automate processes and find opportunities to improve the observability and availability of the platform to reduce toil.
* Implement and manage observability tools for comprehensive monitoring, alerting, and logging.
* Own end-to-end availability and performance of different services and tools.
* Practice sustainable incident response and blameless postmortems.
* Provide on-call support for incident management and participate actively in response activities.

**About GreyOrange**

GreyOrange is a global leader in AI-driven robotic automation software and hardware, transforming distribution and fulfillment centers worldwide. Our solutions increase productivity, empower growth and scale, mitigate labor challenges, reduce risk and time to market, and create better experiences for customers and employees. Founded in 2012, GreyOrange is headquartered in Atlanta, Georgia, with offices and partners across the Americas, Europe and Asia.

**Our Solutions**

The GreyMatter Multiagent Orchestration (MAO) platform provides vendor-agnostic fulfillment orchestration to continuously optimize performance in real time: the right order, with the right bot and agent, taking the right path and action. Currently operating more than 70 fulfillment sites across the globe (with deployments of 700+ robots at a single site), GreyMatter enables customers to decrease their fulfillment Cost Per Unit by 50%, reduce worker onboarding time by 90% and optimize peak season performance.

**EEO**

GreyOrange provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

In retail stores, our gStore end-to-end store execution and retail management solution supports omnichannel fulfillment, real-time replenishment, intelligent workforce tasking and more. Using real-time overhead RFID technology, the platform increases inventory accuracy up to 99%, doubles staff productivity, and enables an engaging, seamless in-store experience.