Anyscale
Engineering Manager, Observability (TLM)
Anyscale, San Francisco, California, United States, 94199
About Anyscale:
At Anyscale, we’re on a mission to democratize distributed computing and make it accessible to software developers of all skill levels. We commercialize Ray, an open‑source project with an ecosystem of libraries for scalable machine learning. Companies such as OpenAI, Uber, Spotify, Instacart, Cruise, and many others use Ray to accelerate AI applications.
About the Role:
We are seeking a Manager to join our team, which builds user‑facing application features for the Anyscale AI platform. The role involves interacting with users, understanding their requirements, designing and implementing features, and maintaining and improving those features over time. The platform backend generally implements the core business logic of these features.
About the Team:
The Workspace & Observability Team is dedicated to empowering clients to create robust AI applications using our powerful platform built on Ray. We provide bespoke monitoring tools and integrations that enhance the development lifecycle.
A snapshot of projects you may work on:
The Ray Dashboard observability tool, which gives users insight into their Ray application, including what code is running on which machine, how much data is being moved between machines, and the hardware utilization of each machine.
Library‑specific observability tools, like the Ray Train dashboard or Ray Serve dashboard, which accelerate our users’ ability to develop distributed training or model serving applications.
Unified log viewer, a tool that ingests logs across a Ray cluster and lets users query those logs in meaningful ways, such as by function name, log level, timestamp, or machine.
Anomaly detection: the ability for the Anyscale platform to automatically detect performance bottlenecks or bugs in our users’ workloads and to suggest fixes or apply them automatically.
Work with a team of leading distributed systems and machine learning experts.
Communicate your work to a broader audience through talks, tutorials, and blog posts.
Help us build and shape a world‑class company.
We’d love to hear from you if you have:
Proficiency in backend or full stack development, including experience with web API frameworks and databases.
Proficiency in Python or an ability to quickly learn new programming languages.
Good understanding of AI and machine learning concepts.
Experience with observability tools and monitoring solutions (e.g., Datadog, Splunk, AWS CloudWatch).
Familiarity with Ray or similar distributed systems frameworks.
Solid background in debugging, architecture design, and coding.
Excellent problem‑solving skills and a collaborative mindset.
Passion for building tools that enhance user experience and optimize workflows.
Compensation: At Anyscale, we take a market‑based approach to compensation. This role is eligible to participate in Anyscale’s equity and benefits offerings, including Stock Options, Healthcare plans, 401k, Education & Wellbeing Stipend, Paid Parental Leave, Fertility Benefits, Paid Time Off, Commute reimbursement, and 100% in‑office meals covered.
Equal Opportunity Employer: Anyscale Inc. is an Equal Opportunity Employer. Candidates are evaluated without regard to age, race, color, religion, sex, disability, national origin, sexual orientation, veteran status, or any other characteristic protected by federal or state law. Anyscale Inc. is an E‑Verify company. Notice of E‑Verify participation and right to work posters are available.