Chabez Tech LLC

Splunk Administration & Engineering

Chabez Tech LLC, Atlanta, Georgia, United States, 30383

Overview

Location: Atlanta, GA & Frisco, TX (3 Days)

Job Summary

We are looking for a highly skilled Splunk Subject Matter Expert (SME) and Enterprise Monitoring Engineer to lead the design, implementation, and optimization of our monitoring and observability ecosystem. The ideal candidate will be an expert in Splunk, with a strong background in enterprise IT infrastructure, system performance monitoring, and log analytics. You will play a pivotal role in ensuring end-to-end visibility across our systems, applications, and services.

Mandatory Areas

Must-have skills: Splunk Subject Matter Expert (SME) and Enterprise Monitoring Engineer

Skill 1: 7+ years of experience with Splunk
Skill 2: 7+ years of experience in Enterprise Monitoring, SPL
Skill 3: 5+ years of experience in DevOps, IT infrastructure, CI/CD

Key Responsibilities

Serve as the SME for Splunk architecture, deployment, and configuration across the enterprise.
Maintain and optimize Splunk infrastructure, including indexers, forwarders, search heads, and clusters.
Develop and manage custom dashboards, alerts, saved searches, and visualizations.
Implement and tune log ingestion pipelines using Splunk Universal Forwarders, HTTP Event Collector, and other data inputs.
Ensure high availability, scalability, and performance of the Splunk environment.
Create dashboards, reports, alerts, advanced Splunk searches, visualizations, log parsing, and external table lookups.
Apply expertise in SPL (Search Processing Language) and an understanding of Splunk architecture, including configuration files.
Draw on wide experience monitoring and troubleshooting applications with tools like AppDynamics, Splunk, Grafana, Argos, OTEL, etc. to build observability for large-scale microservice deployments.
Create dashboards for various applications to monitor health and network issues, and configure alerts.
Demonstrate excellent problem-solving, triaging, and debugging skills in large-scale distributed systems.
Establish and document runbooks and guidelines for using the multi-cloud infrastructure and microservices platform.
Optimize search queries using summary indexing.
Apply solid knowledge of and experience in monitoring the Splunk infrastructure.
Develop a long-term strategy and roadmap for AI/ML tooling to support the AI capabilities across the Splunk portfolio.
Diagnose and resolve network-related issues affecting CI/CD pipelines; debug DNS, firewall, proxy, and SSL/TLS problems; and use tools like tcpdump, curl, and netstat for proactive maintenance.

Enterprise Monitoring & Observability

Design and implement holistic enterprise monitoring solutions integrating Splunk with tools like AppDynamics, Dynatrace, Prometheus, Grafana, SolarWinds, or others.
Collaborate with application, infrastructure, and security teams to define monitoring KPIs, SLAs, and alert thresholds.
Build end-to-end visibility into application performance, system health, and user experience.
Integrate Splunk with ITSM platforms (e.g., ServiceNow) for event and incident management automation.

Operations, Troubleshooting & Optimization

Perform data onboarding, parsing, and field extraction for structured and unstructured data sources.
Support incident response and root cause analysis using Splunk for troubleshooting and forensics.
Regularly audit and optimize search performance, data retention policies, and index lifecycle management.
Create runbooks, documentation, and SOPs for Splunk and monitoring tool usage.

Required Qualifications

5 years of experience in IT infrastructure, DevOps, or monitoring roles.
3 years of hands-on experience with Splunk Enterprise as an admin, architect, or engineer.
Experience designing and managing large-scale, multi-site Splunk deployments.
Strong skills in SPL (Search Processing Language), dashboard design, and alerting strategies.
Familiarity with Linux systems, scripting (e.g., Bash, Python), and APIs.
Experience with enterprise monitoring tools and integration with Splunk (e.g., AppDynamics, Dynatrace, Nagios, Zabbix, etc.).
Understanding of logging, metrics, and tracing in modern environments (on-prem and cloud).
Strong understanding of network protocols, system logs, and application telemetry.

Preferred Qualifications

Splunk certifications (e.g., Splunk Certified Power User, Admin, Architect).
Experience with Splunk ITSI, Enterprise Security, or Observability Suite.
Knowledge of cloud-native environments (AWS, Azure, or Google Cloud Platform) and cloud monitoring integrations.
Experience with log aggregation, security event monitoring, or compliance (e.g., PCI, HIPAA, SOX).
Familiarity with CI/CD pipelines and GitOps practices.
