Wellmark, Inc.
Overview
The Observability Platform Engineer designs, builds, and maintains the observability platform tools and frameworks that development and operations teams use to monitor and improve system performance, availability, and reliability. The role covers designing and implementing systems that monitor and analyze the performance and health of software applications and infrastructure. The engineer collaborates closely with development, site reliability engineering, DevOps, and infrastructure teams to deliver a seamless observability ecosystem. Key responsibilities include architecting observability platforms, integrating monitoring tools into software pipelines, ensuring visibility into system health, reducing mean time to detection (MTTD), and promoting a culture of proactive monitoring and reliability engineering.
What you will own:
Design, build, and maintain observability platforms with reusability across services in mind.
Develop scalable, automated pipelines for ingesting, transforming, and visualizing telemetry data.
Integrate observability tools with existing infrastructure and applications (e.g., Dynatrace, Splunk, Prometheus, Grafana, Datadog, New Relic, OpenTelemetry).
Enable root cause analysis through correlation of metrics, logs, and traces.
Analyze telemetry data to identify performance bottlenecks and optimize resource allocation for improved efficiency.
Define SLIs, SLOs, and error budgets with stakeholders for critical services.
Improve incident response by enhancing monitoring dashboards, alerts, and automated notifications.
Qualifications
Preferred:
3–5 years of experience in Site Reliability Engineering, DevOps, or Observability/Monitoring engineering roles.
Proven experience building or administering observability platforms in production environments.
Track record of improving system reliability and reducing mean time to resolution (MTTR).
Hands-on experience with one or more observability platforms: Dynatrace, Prometheus, Grafana, OpenTelemetry, Elastic Stack, Splunk, Datadog, New Relic, AppDynamics, Honeycomb.
Strong knowledge of observability concepts: metrics, logs, traces, SLOs/SLIs, error budgets.
Experience working within an Agile team environment.
Experience deploying and maintaining OpenTelemetry-based observability pipelines.
Prior experience working in highly regulated environments with compliance observability needs.
Contributions to observability open-source projects.
Familiarity with chaos engineering practices to validate monitoring and resilience.
Certifications from AWS, Microsoft Azure, or Google Cloud.
Demonstrated experience coaching/mentoring others by providing guidance and feedback to help individuals strengthen their knowledge and skills.
Excellent problem-solving skills with a strong analytical mindset.
Strong written and verbal communication skills, including the ability to explain complex technical topics to engineers and business stakeholders.
Proven experience designing technical architecture and keeping abreast of existing and emerging technologies.
Experience consulting with stakeholders to understand needs and guide action and consensus.
Proficiency in programming or scripting languages (Python, Go, Java, Bash, etc.) for observability automation.
Experience with containerization and orchestration platforms (Docker, Kubernetes).
Deep knowledge of cloud platforms (AWS, Azure, GCP), observability/monitoring services, operating systems (Windows/Linux), networking, and containerization.
Strong understanding of distributed systems, microservices, and cloud-native architectures.
Proficiency in CI/CD pipelines and how observability integrates into DevOps workflows.
Knowledge of incident management and on-call practices.
Experience supporting observability and monitoring for artificial intelligence (AI) agents.
Required:
Bachelor's Degree or direct and applicable work experience
Minimum of 7 years of experience, including any combination of the following: development experience (e.g., Angular 2+, NodeJS, TypeScript, C#, .NET, Java, SQL), and IT infrastructure, architecture design, and operations (minimum of 4 years)
Proven ability to adapt when experiencing major changes in work tasks or work environment
Informal leadership experience typically gained through leading projects
Experience coaching/mentoring others to strengthen knowledge and skills
Proven experience designing technical architecture and keeping abreast of new technologies
Experience consulting with stakeholders to provide advice and guidance
Demonstrated problem-solving and troubleshooting skills with the ability to identify root causes and propose effective solutions
Demonstrated communication skills, both verbal and written, for diverse stakeholders
Additional Information
Lead the technical designs for highly integrated complex application platforms to optimize security, information leverage and reuse, integration, performance, and availability; ensure solutions align with architecture standards and SLAs.
Consult with Solution Architects and project teams in the creation and documentation of design deliverables for application platforms.
Oversee planning, development, and estimation of technical solutions when appropriate.
Collaborate with stakeholders to provide direction regarding process improvements and architectural governance.
Provide training and mentorship on technical design and solution implementation.
Build strong relationships with business stakeholders to align technical designs with business needs.
Research and recommend new technologies and best practices to add value to the organization.
Other duties as assigned.
All your information will be kept confidential according to EEO guidelines.
An Equal Opportunity Employer
The policy of the employer is to recruit, hire, train and promote individuals in all job classifications without regard to race, color, religion, sex, national origin, age, veteran status, disability, sexual orientation, gender identity or any other characteristic protected by law.
Applicants requiring a reasonable accommodation due to a disability at any stage of the employment application process should contact us at careers@example.com.
Please inform us if you meet the definition of a Covered DoD official.
At this time, the employer is not considering applicants for this position who require immigration sponsorship now or in the future. For more information about work authorization, please refer to the employer resources.