HCL SINGAPORE PTE. LTD.

Infrastructure Engineer

HCL SINGAPORE PTE. LTD., West Islip, New York, United States, 11795

Responsibilities

- Design, implement, and maintain observability solutions using Datadog tools, including Infrastructure Monitoring, Log Management, APM, RUM, Synthetic Monitoring, Network Monitoring, Dashboards, and Alerts
- Configure and maintain Datadog Agents, custom integrations, and support cloud-native (AWS, Azure, GCP) and on-premises workloads
- Develop custom dashboards and monitors to support SRE and DevOps teams
- Implement correlation of metrics, logs, traces, and events for end-to-end visibility
- Lead performance monitoring, outage analysis, and root cause investigations
- Work with application developers, infrastructure, and security teams to improve observability maturity
- Ensure optimal Kubernetes monitoring setup via Datadog Cluster Agent and integration with Helm, Prometheus, and custom metrics
- Automate deployment and configuration using IaC tools (Terraform preferred)
- Optimize usage and cost of Datadog through tagging strategy, metric hygiene, and data retention policies

Requirements

- Bachelor's degree in computer science/information technology or equivalent
- 3 years of hands-on experience with Datadog in a production environment
- Strong understanding of core observability concepts and distributed systems monitoring
- Cloud Platforms (AWS, GCP, or Azure)
- Kubernetes / EKS / GKE / AKS monitoring with Datadog
- Log aggregation and parsing (Groks, pipelines, enrichment)
- RUM and Synthetic Monitoring setup and tuning
- CI/CD and DevOps toolchains
- Proficiency with Datadog Agent, integrations, APIs, and CLI tools
- Scripting and automation skills (Python, Bash, or similar)
- Familiarity with Infrastructure as Code (Terraform, Ansible, or similar)
- Excellent problem-solving and communication skills
- Datadog certifications (e.g., Datadog Fundamentals, APM, Infrastructure)
- Experience working in regulated or high-compliance environments
- Background in SRE, DevOps, or Platform Engineering roles
- Knowledge of additional monitoring tools (e.g., Prometheus, Grafana, Splunk)