Aerospike, Inc.

Senior Site Reliability Engineer, Bengaluru, India

Aerospike, Inc., Mountain View, California, US, 94039

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases.

Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.

Headquartered in Mountain View, California, Aerospike has a global presence with offices in London, Bangalore, and Tel Aviv.

In Bengaluru, we follow a hybrid model with two mandatory days of work from the office.

Senior Site Reliability Engineer

As a Senior Site Reliability Engineer (SRE) for Aerospike, you will be instrumental in designing, building, and optimizing a scalable, highly resilient cloud platform. You will focus on improving reliability, performance, and automation to ensure seamless delivery and operation of our cloud platform services. Your responsibilities will include developing robust infrastructure, implementing intelligent monitoring systems, and driving continuous improvement initiatives that enhance system efficiency, scalability, and overall platform stability.

Key Responsibilities

Designing, deploying, and optimizing large-scale Aerospike cloud platform infrastructure and services across multiple environments

Leading the development and enhancement of automation and infrastructure-as-code solutions to improve operational efficiency

Building and maintaining monitoring, alerting, and observability implementations to proactively detect and resolve system issues

Leading incident response activities, conducting post-mortems, and driving continuous improvement initiatives

Designing and enforcing security best practices for cloud infrastructure and access control

Collaborating with development teams to ensure reliable service delivery and alignment with SRE best practices

Participating in the on-call rotation, responding to critical incidents, and minimizing downtime through proactive mitigation strategies

Establishing documentation standards, runbooks, and system configurations for team knowledge sharing

Leading capacity planning and performance optimization efforts

Mentoring junior engineers and sharing knowledge to build team capabilities

Required Experience

6+ years of experience in Site Reliability Engineering (SRE), DevOps, or related fields, with a focus on building scalable, resilient, and automated cloud-based systems

Hands-on experience designing, deploying, and optimizing production-grade, business-critical systems in cloud environments

Expertise with at least one major public cloud provider (AWS, Google Cloud, or Azure), including cloud-native services and architectures

Strong proficiency in infrastructure-as-code (IaC) tools such as Terraform to enable automated and reproducible infrastructure

Experience in CI/CD pipeline design and implementation, enabling seamless, automated software delivery and infrastructure updates

Deep understanding of Linux/Unix systems, networking fundamentals, and distributed system architectures

Proficiency in scripting and software development using Python, Bash, or Go to build automation, tooling, and infrastructure enhancements

Experience with containerization and orchestration technologies such as Docker and Kubernetes for efficient service deployment and scaling

Hands-on experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Elasticsearch, Kibana) to drive data-driven system improvements

Strong problem-solving skills with an engineering-first mindset for improving system reliability, scalability, and performance

Experience implementing security best practices for cloud infrastructure, access control, and data protection

Excellent English communication skills (verbal and written) to collaborate effectively across teams and document key processes

Preferred Skills and Qualifications

Hands-on experience managing and optimizing database deployments and services in production environments, ensuring high availability and performance

Familiarity with Aerospike or other distributed NoSQL databases

Advanced understanding of security practices and implementation in cloud environments

Relevant industry certifications, such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, Google Professional Cloud DevOps Engineer, or equivalent

Kubernetes certifications such as Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Certified Kubernetes Security Specialist (CKS)

Proficiency with configuration management tools (Ansible, Terraform, or similar) in complex environments

Experience leading collaborative development practices and advanced version control workflows

Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.
