Kanak Elite Services
Job Title: DevOps Engineer - Lead
Location: Plano, TX 75024 and Richmond, VA 23238 (Hybrid)
Duration: 12+ months contract with possible extension/conversion
Requirement: Former Capital One experience needed.
Key Skills & Tools
Observability Tools: Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch.
Programming Languages: Expertise in languages such as Python and Go for scripting and automation.
Infrastructure & Cloud Platforms: Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes.
Infrastructure as Code (IaC): Familiarity with Terraform and Ansible for managing infrastructure and configurations.
CI/CD & Automation: Experience with CI/CD pipelines and automation tools like Jenkins.
System & Software Engineering: A strong background in both system operations and software development.
Cloud Agent Instrumentation: Ability to optimize cloud agent instrumentation; cloud certifications are a plus.
Certifications (Mandatory): Datadog Fundamentals, APM and Distributed Tracing Fundamentals, and Datadog Demo Certification.
Observability Concepts: Strong understanding of logs, metrics, and tracing.
Security: Expertise in security and vulnerability management in observability.
Experience: 2 years of experience in cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments.
Job Description
Design & Implement Solutions: Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces.
System Instrumentation: Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry.
Data Analysis & Visualization: Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues.
Collaboration: Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals.
Automation: Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection.
Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services.
Instrument agents for on-premise, cloud, and hybrid environments to enable comprehensive monitoring.
Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications.
Configure and integrate Datadog with third-party services such as ServiceNow, SSO enablement, and other ITSM tools.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology Software Development