GTN
Cloud Engineer – Observability & Monitoring (Azure, Splunk, AKS)
Overview
A technology-focused organization is seeking a highly skilled Cloud Engineer with deep expertise in monitoring distributed microservices within Azure environments. This role will lead the design and implementation of comprehensive observability solutions using Splunk, ensuring robust performance, scalability, and reliability across a microservices-based architecture.
The ideal candidate will play a key role in shaping monitoring strategies, automating incident response, and driving operational excellence for mission‑critical applications.
Key Responsibilities
Design, implement, and maintain monitoring solutions for distributed microservices on Azure Kubernetes Service (AKS)
Utilize Splunk for log ingestion, custom dashboards, alerting, and advanced analytics
Integrate Istio service mesh observability (telemetry, tracing, logging) into monitoring frameworks
Apply Twistlock (Prisma Cloud) policies for container and workload security monitoring
Monitor and configure Azure-native services including API Management (APIM), Cosmos DB, SQL Server, and Azure Networking
Use Terraform to manage infrastructure as code, embedding observability into deployments
Build and optimize CI/CD pipelines in Azure DevOps (AzDO) with integrated monitoring hooks
Leverage Azure Chaos Studio to test system resilience and incorporate findings into monitoring improvements
Support automated API and performance testing using Karate Labs alongside observability tools
Collaborate with development, security, and operations teams to define and track SLAs, SLOs, and SLIs
Participate in incident response, root cause analysis, and continuous improvement initiatives
Required Qualifications
Experience in DevOps practices and methodologies
Background in Site Reliability Engineering or Cloud Operations roles
Hands‑on experience with Splunk in a microservices environment
Proficiency with Azure Kubernetes Service (AKS) and Istio
Strong understanding of Azure services and architecture
Experience implementing Twistlock, Terraform, and Azure DevOps (AzDO) pipelines
Familiarity with Azure Chaos Studio and Karate Labs for testing and validation
Strong scripting and automation capabilities
Preferred Qualifications
Azure certifications (e.g., AZ-400, AZ-104, AZ-305)
Experience with other observability tools such as Prometheus, Grafana, or OpenTelemetry
Knowledge of DevSecOps practices and secure CI/CD pipeline implementation