Airspace Link
Job Summary
In a highly collaborative, distributed agile team environment, this role will ensure the scalability, reliability and performance of Airspace Link’s systems and applications. In collaboration with the platform and software engineers, this role will automate and improve operational processes.
Duties and Responsibilities
Reliability and Performance : Design, implement and maintain systems to ensure reliability, high availability and performance
Scalability : Optimize applications and infrastructure to handle growth
Monitoring and Alerting : Implement monitoring systems and performance metrics to proactively identify and address issues before they impact end users
Incident Response : Respond to incidents in a timely manner. Lead the efforts in resolving critical issues, prevent recurrence and run postmortems.
Automation : Develop tools and scripts to automate manual operational tasks, increasing efficiency
Infrastructure Management : Manage infrastructure using Infrastructure as Code (IaC) tools such as Terraform and Ansible
Capacity Planning : Analyze and forecast future infrastructure needs.
Change Management : Implement practices to safely release code (CI / CD, canary releases, feature flags). Reduce risk while increasing deployment velocity
Collaboration : Work closely with development teams to improve software quality and reliability
Disaster Recovery : Create disaster recovery plans to mitigate system failures
Security and Compliance : Implement security controls and conduct security audits and vulnerability assessments. Ensure systems adhere to industry standards and regulations through compliance audits and assessments
Position Type: Full-Time, 40 hours per week
Status: Exempt
Location: Hybrid
Requirements
B.S. in Computer Science or equivalent years of relevant experience or education
3+ years of professional experience in a similar SRE or DevOps role
Strong programming skills (Python, Go, Java)
Experience with cloud platforms (Azure, AWS, GCP)
Experience with containerization technologies (Docker, Kubernetes)
Experience with monitoring and logging tools (Prometheus, Grafana)
Knowledge of system administration
Strong problem-solving and analytical skills
Great teamwork skills
An eagerness to learn and adapt to the needs of a greenfield industry
Part 107 or another pilot’s license a plus