Jobs via Dice

Lead DevOps Engineer - Capital One exp

Jobs via Dice, Richmond, Virginia, United States, 23214

Involgix is seeking an experienced Observability Engineer to design and implement end-to-end observability solutions across cloud and hybrid environments.

Key Skills & Tools

Proficiency with monitoring, logging, and tracing tools: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and AWS CloudWatch.
Programming expertise in Python and Go for scripting and automation.
Experience with cloud platforms (AWS, Google Cloud Platform, Azure) and Kubernetes orchestration.
Infrastructure as Code knowledge: Terraform and Ansible.
Experience building CI/CD pipelines and automation with Jenkins.
Strong background in system operations and software development.
Optimizing cloud agent instrumentation; cloud certifications are a plus.
Datadog foundation, APM & Distributed Tracing Fundamentals Datadog Demo Certification (Mandatory).
Deep understanding of observability concepts (logs, metrics, tracing).
Expertise in security & vulnerability management in observability.
Minimum of 2 years' experience in cloud-based observability solutions across AWS, Azure, and Google Cloud Platform.

Job Description

Design and implement comprehensive observability platforms that provide deep insights into complex systems through logs, metrics, and traces.
Instrument applications, infrastructure, and services to collect telemetry data using frameworks such as OpenTelemetry.
Develop dashboards, reports, and alerts with Prometheus, Grafana, and Splunk to visualize performance and detect issues.
Collaborate with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
Automate monitoring configurations and telemetry collection using IaC tools like Ansible and Terraform.
Manage full-stack observability with Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
Instrument agents for on-premise, cloud, and hybrid environments to enable comprehensive monitoring.
Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.
Configure and integrate Datadog with third-party services such as ServiceNow, SSO enablement, and other ITSM tools.
