IC Defense

Sr. Software Engineer, AI Systems Infrastructure

IC Defense, Columbia, Maryland, United States, 21046

Description:

We are seeking a highly experienced and driven Sr. Software Engineer to join our full-stack LLM integration and delivery team. The ideal candidate will have a strong background in building scalable AI-powered applications and the infrastructure that supports them, with a focus on delivering exceptional user experiences. You will play a crucial role in architecting and developing the systems that power our cutting-edge LLM applications, ensuring they perform reliably at enterprise scale while enabling rapid iteration and deployment.

Responsibilities:
- Lead the design and development of scalable LLM-powered applications and services.
- Architect infrastructure solutions that support rapid iteration and deployment of AI features.
- Collaborate directly with product teams to translate user needs into technical solutions.
- Build and maintain the platforms that enable your team to ship AI features quickly and reliably.
- Develop and manage automation tools to improve system reliability and development efficiency.
- Implement and maintain monitoring, alerting, and logging systems.
- Conduct capacity planning and performance tuning for AI workloads.
- Lead and participate in incident response and post-mortem analyses.
- Mentor junior team members and contribute to the overall growth of the engineering team.
- Continuously identify and implement improvements to our systems and development processes.

Skills Requirements:
- Software engineering experience with AI, including LLMs, RAG, and MCP.
- Active and current TS/SCI clearance with Full Scope Polygraph (FSP).
- 12+ years of experience in software engineering with a focus on scalable systems.
- Strong full-stack development experience with user-facing applications.
- Strong programming skills in languages such as Python, Go, or Java.
- Extensive experience with cloud platforms (e.g., AWS, GCP, Azure) and their services.
- Proficiency in containerization technologies (Docker, Kubernetes).
- Experience with infrastructure-as-code tools (e.g., Terraform, Ansible, Puppet).
- Expertise in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
- Familiarity with CI/CD pipelines and practices.
- Strong problem-solving skills and the ability to troubleshoot complex systems.
- Excellent communication skills and the ability to work in a collaborative environment.
- Experience building products that prioritize user experience and product-market fit.

Nice to Haves:
- Experience working with Large Language Models (LLMs) and related infrastructure.
- Experience with AI/ML model serving and optimization.
- Background in product-focused engineering environments.
- Familiarity with machine learning operations (MLOps) practices.
- Experience with A/B testing and feature flagging for AI features.
- Contributions to open-source projects.
- Experience with distributed systems and microservices architectures.
- Knowledge of security best practices and compliance requirements.
- Experience with real-time data processing and streaming platforms (e.g., Apache Kafka, Apache Flink).
- Familiarity with chaos engineering principles and tools.

This position is 100% on-site.

For positions requiring a security clearance, applicants who do not meet the security clearance requirement will be automatically rejected.