RIT Solutions, Inc.

SRE Engineer

RIT Solutions, Inc., Iselin, New Jersey, US, 08830

Title: SRE Engineer

Location: Hybrid, 3 days a week in Iselin, NJ

Needs: OpenShift, Kubernetes, development experience (Java, Python, Golang), SRE skills

Nice to Haves: Bare-metal cloud

Job Description: We are looking for a highly skilled Site Reliability and Operations Engineer (SRE) with extensive experience in Kubernetes-based distributed caching and compute grid solutions. This role requires a strong foundation in software development, infrastructure automation, and reliability engineering. You will be responsible for designing, implementing, and maintaining high-performance distributed systems, ensuring reliability, scalability, and efficiency.

Development & Implementation:
• Design, develop, and optimize distributed caching and compute grid solutions on Kubernetes/OpenShift.
• Understanding of microservices and containerized workloads using Kubernetes, Docker, and Helm.
• Implement high-throughput compute grid solutions using IBM Spectrum Symphony, TIBCO GridServer, or similar technologies.
• Optimize application performance by leveraging parallel compute strategies, load balancing, and efficient data distribution.

Site Reliability Engineering (SRE):
• Ensure high availability, scalability, and reliability of distributed systems.
• Implement observability, logging, and monitoring using tools like Prometheus, Grafana, ELK, or OpenTelemetry.
• Automate infrastructure provisioning and deployments using Ansible and Helm charts.
• Understanding of CI/CD pipelines for seamless software deployment.
• Troubleshoot and resolve incidents related to platform, infrastructure, and distributed compute systems, ensuring minimal downtime.

Required Skills & Qualifications:
• Strong experience in Kubernetes (OpenShift and on-prem/cloud clusters).
• Understanding of programming languages like Java, Go, or Python (this will be the deciding factor between L4 and L5).
• Experience with containerization technologies (Docker, Helm, etc.).
• Strong knowledge of CI/CD pipelines (Jenkins, ArgoCD, GitHub Actions).
• Hands-on experience with observability tools (Prometheus, Grafana, Loki, Jaeger).
• Understanding of networking, service meshes (Istio/Linkerd), and security best practices in Kubernetes.
• Experience with multi-cluster and hybrid cloud Kubernetes deployments.