Highbrow Technology Inc
Grafana Architect – Multi-Cloud & On-Prem Observability and Monitoring
Highbrow Technology Inc, Basking Ridge, New Jersey, US, 07920
We are seeking a seasoned Grafana Architect with expertise in designing and implementing observability solutions across multi-cloud (AWS, Azure, GCP) and on-prem environments. The ideal candidate will have hands-on experience with Grafana, Prometheus, Loki, Tempo, and related telemetry sources. Responsibilities include developing end-to-end observability strategies, architecture governance, implementation, and promoting best practices across teams.

Key Responsibilities
Design and deploy scalable observability solutions using Grafana OSS/Enterprise across hybrid and on-prem environments.
Develop monitoring strategies, define SLOs/SLIs, dashboards, alerts, and reporting for infrastructure, applications, and services.
Integrate Grafana with Prometheus, Loki, Tempo, InfluxDB, Elasticsearch, and cloud-native tools like AWS CloudWatch, Azure Monitor, GCP Operations Suite, as well as on-prem systems.
Lead the creation of custom plugins, data sources, and dashboards for cross-platform observability.
Standardize templates, alerting rules, and RBAC models within Grafana Enterprise.
Collaborate with DevOps, SRE, Cloud, and Application teams to define observability needs and facilitate onboarding.
Implement monitoring as code practices using Terraform and Ansible for infrastructure automation.
Manage telemetry collection (logs, metrics, traces) for performance, cost-efficiency, and usability.
Lead capacity planning, high availability/disaster recovery design, performance tuning, and upgrades.
Provide thought leadership on OpenTelemetry, distributed tracing, log aggregation, and AIOps.
Conduct training sessions, create documentation, and engage with internal communities around observability tools.

Required Skills & Experience
5+ years of experience with Grafana, including dashboard creation, plugin development, and user management.
Strong knowledge of Prometheus, Loki, Tempo, Alertmanager, and OpenTelemetry.
Proven experience with multi-cloud observability frameworks (AWS, Azure, GCP).
Experience integrating with on-prem systems like vSphere, SNMP, and legacy tools.
Hands-on experience with Terraform, Helm, Ansible, and GitOps practices.
Scripting and automation skills in Python, Bash, or similar.
Deep understanding of telemetry formats such as Prometheus metrics, OTLP, and JSON logs.
Proficiency in SRE principles including SLOs, SLIs, error budgets, and alerting strategies.
Experience with RBAC, LDAP/SAML, and Grafana Enterprise features.
Strong troubleshooting skills in distributed systems and observability pipelines.
Excellent communication, stakeholder management, and leadership skills.

Nice to Have
Experience with AIOps and ML-based anomaly detection.
Knowledge of security and compliance standards like SOC2 and PCI.
Exposure to SIEM tools such as Splunk, Chronicle, or Elastic Security.
Experience with log forwarding pipelines like Kafka, Fluent Bit, or Vector.

Certifications (Preferred)
Grafana Certified Observability Professional

Interested candidates should share their resume with Ganapathikumar.m@highbrowtechnology.com