Digital Realty
Manager – Network Observability Platform and Automation
Digital Realty, Dallas, Texas, United States, 75215
Location: Austin, Boston, Dallas, Ashburn, Chicago
Base pay range: $120,000.00/yr - $130,000.00/yr
Overview
Position Title: Network Observability Platform and Automation
Your role: A Manager – Network Observability typically leads a team of engineers focused on maintaining and improving the reliability, performance, and availability of an organization’s systems and infrastructure. This role involves a mix of technical leadership, people management, and strategic planning, ensuring systems meet business and user needs. You will oversee Digital Realty’s Observability stack and build and maintain a global observability infrastructure.
What You’ll Do
Team Leadership:
Manage and mentor a team of SREs, fostering their growth and development.
Set team goals, prioritize projects, and ensure alignment with organizational objectives.
Conduct performance reviews and provide constructive feedback.
Build a positive and collaborative team environment.
Technical Oversight:
Oversee the design, implementation, and maintenance of reliable infrastructure and services.
Collaborate with other teams to define requirements, standards, and best practices.
Identify and address performance bottlenecks and ensure system stability.
Implement and improve monitoring and observability frameworks.
Operational Excellence:
Manage on-call rotations and incident response to minimize downtime and ensure swift resolution.
Drive automation efforts to reduce manual tasks and improve efficiency.
Implement structured engineering and operations processes.
Analyze and evaluate existing processes to identify opportunities for improvement.
Strategic Planning:
Develop and implement the long-term reliability strategy for the organization.
Make decisions about build vs. buy for tools and technologies.
Ensure alignment with business goals and customer expectations.
Manage relationships with vendors and other stakeholders.
Communication and Collaboration:
Act as a bridge between technical teams and other departments.
Represent the SRE team to stakeholders and communicate effectively.
Collaborate with other engineering teams to ensure efficient workflows.
Foster a culture of blameless postmortems and continuous learning.
What You’ll Need
Key Skills and Experience
Strong technical background in distributed systems, cloud computing, and related technologies.
Proven experience in managing and mentoring technical teams.
Excellent problem-solving and communication skills.
Experience with monitoring, automation, and incident management.
Understanding of SLOs, SLIs, and SLAs.
Familiarity with DevOps and Agile practices.
Qualifications
10+ years of operations and engineering experience
5+ years of team building and management
3+ years of network engineering in large-scale data center environments
Bachelor’s degree in computer science (or equivalent training) preferred
Expertise in Layer 3 routing (BGP, IS-IS) and Layer 2 switching (802.1Q, STP) protocols
Experience with virtual networking concepts such as EVPN, VXLAN, Open vSwitch
Experience working with automation tools (Ansible, Terraform, etc.)
Comfort with Python (or equivalent language)
Strong experience working with Linux systems and tools
Experience with virtual routing in Linux with FRR or similar software preferred
Experience with AWS preferred
A basic understanding of software development tools (GitHub, Jenkins, etc.) and software development practices
Ability to understand high-level network design and its impacts across the infrastructure
Ability to work independently on complex and unique enterprise engineering projects
Strong analytical and troubleshooting skills
Strong communication skills
About Digital Realty
Digital Realty brings companies and data together by delivering the full spectrum of data center, colocation and interconnection solutions. PlatformDIGITAL, the company’s global data center platform, provides customers with a secure data meeting place and a proven Pervasive Datacenter Architecture (PDx) solution methodology for powering innovation and efficiently managing Data Gravity challenges. Digital Realty gives its customers access to the connected data communities that matter to them with a global data center footprint of 300+ facilities in 50+ metros across 28 countries on six continents.
To learn more about Digital Realty, please visit digitalrealty.com or follow us on LinkedIn and X.
About Our Digital Team
Our IT team is at the heart of our business. We develop infrastructures, design and build networks, support servers and provide the first line of support by delivering rich connectivity for our customers. With new data centers coming online all the time, it’s a rapidly changing technical environment, so our team is always ready to innovate and take the lead on projects. We constantly develop, deploy and support vital networks and data services that drive business performance and improve life for customers around the globe.