Cockroach Labs

Member of Technical Staff (SRE)

Cockroach Labs, New York, New York, US, 10261

Overview

CockroachDB provides the backbone for storing data on a global scale. Our core mission on the SRE team is to operate a secure and reliable Cockroach Cloud product at scale. We provide consultation, planning, architectural oversight, concrete designs, development, and implementation that improve the resilience, efficiency, performance, and availability of our Cloud Service. We also take pride in being good on-call engineers. We believe that regularly reflecting on the on-call experience can lead to short-, medium-, and long-term improvements to the core product, including CRDB itself.

As a Site Reliability Engineer, you’ll help manage and scale our CockroachCloud service, a fully managed global offering of CockroachDB spanning multiple cloud providers. You will oversee our production system, ensuring that we can provide stable and scalable infrastructure as we deliver CockroachDB to our customers.

Responsibilities

Manage the infrastructure for cloud services, including running internal production systems and hosting CockroachDB for our external customers.
Design, write, and deliver software and systems to increase product reliability and operational efficiency.
Develop custom tools as necessary.
Keep a complex system running and solve problems relating to mission-critical services.
Design, implement, operate, and troubleshoot the automation and monitoring of production clusters to maximize performance and availability.
Drive the company through disaster recovery tests, where we manually turn down pieces of CockroachDB to test its overall resilience to failures.
Participate in an on-call rotation for our production systems and hosted services.

The Expectations

In your first 30 days, you will onboard and be exposed to our current internal and customer-facing production systems. Working with our existing SRE and engineering teams, you will pair on production operations and build out runbooks for the operation of different systems. We believe that it is essential for you to take this first month to become familiar with our technology and our company.

After 3 months, you'll be fully integrated into the team. You will develop and own tooling for reliability, automation, and other issues related to CockroachCloud’s stability and scalability. You will identify new opportunities for automating processes, streamlining delivery, deploying new core functionality, and building great tools. You will help make CockroachCloud the best platform to host CockroachDB on by bringing your expertise to our database.

You Have

Expertise in analyzing, monitoring, and troubleshooting large-scale distributed systems.
Experience in software development using one or more of the following: Go, C, C++, Python, Java.
Proficiency working with algorithms, data structures, and production troubleshooting.
Expertise in working with major cloud providers (AWS, Azure, GCP, etc.) and Cloud APIs.
Experience debugging and optimizing code and automating routine tasks.
Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.).
Prior on-call experience, exhibiting a sense of ownership, attention to detail, and urgency.
Experience building collaborative relationships with your colleagues. You enjoy being part of the code review process and partnering with your teammates on challenging problems.

The Team

We are a group of software engineers first and foremost. The SRE team is currently distributed across North America (5) and India (4).

Reporting

Tom Schmidt - Sr. Manager, Engineering (Site Reliability Engineering)
Tom recently joined Cockroach Labs as manager of Site Reliability Engineering and has taken responsibility for Cockroach Cloud’s production operations. Tom joined Cockroach Labs after 15 years at IBM, where he contributed in a wide variety of technical leadership roles, focusing on quality and automation across compiler development, test frameworks, CI/CD, and more. He has been a key advocate of the SRE discipline and has helped establish formal SRE practices within IBM.

Jordan Lewis - Senior Director of Engineering
Jordan is the Head of Engineering for CockroachDB Cloud. He leads the teams that build and maintain CockroachDB Cloud and keep it reliably serving its demanding customer base.

Isaac Wong - EVP of Engineering
Isaac leads the health of the engineering organization at Cockroach Labs and partners with teams to pursue quality and innovation. When not working, he enjoys drawing, playing the piano, and exploring NYC with his family.

Cockroach Labs is proud to be an Equal Opportunity Employer building a diverse and inclusive workforce. If you need additional accommodations to feel comfortable during your interview process, please email accessibility@cockroachlabs.com.

Benefits

Stock Options
Medical Insurance
Vision Insurance
Dental Insurance
Life and Disability Insurance
Professional Development Funds
Flexible Time Off
Paid Holidays
Paid Sick Days
Paid Parental Leave
Retirement Benefits
Mental Wellbeing Benefits
And more!

This position will remain posted until filled. Applicants should apply via our Careers Page.

Salary: Annual Anticipated Base Salary Range (U.S.) $179,000-$236,900 USD. Salaries for candidates outside the U.S. will vary. We set standard ranges for all U.S.-based roles based on function, level, and geographic location, benchmarked against similar stage growth companies. Actual salaries may vary and fall outside of this range based on qualifications, location, skills, and experience.

Cockroach Labs has a hybrid work model with in-office days on Mon/Tue/Thu. We are committed to collaboration and enabling our team to do their best work.

Referrals increase your chances of interviewing at Cockroach Labs by 2x.
