Troveo AI

Lead Software Engineer

Troveo AI, San Francisco, California, United States, 94199

About Troveo

Troveo is building the next-generation data platform to train AI video models. We offer the world’s largest library of AI video training data—featuring millions of hours of licensed video content. Our end-to-end data pipeline connects creators, rights holders, and AI research labs, enabling scalable, compliant, and innovative uses of video for AI applications and model development.

We are an early-stage, high-growth venture backed by forward-thinking investors, and we’re seeking a Lead Software Engineer to help build the distributed systems powering Troveo’s data and AI infrastructure.

Role Overview

The Lead Software Engineer will design and scale the foundational backend systems that enable Troveo’s massive data delivery, compute, and model-training operations. You’ll collaborate closely with product, DevOps, and frontend teams to architect resilient, performant microservices that support large-scale AI video data processing.

This is a hands-on, high-impact role for an engineer who thrives on technical depth, balancing reliability, scalability, and efficiency in distributed environments. You’ll shape how data moves, transforms, and powers Troveo’s AI ecosystem.

You will lead the architecture of Troveo’s data pipelines, systems, and applications, working at the intersection of large-scale data, cloud infrastructure, and applied AI. The ideal candidate combines deep systems experience with strong communication, precision, and a startup mindset.

Key Responsibilities

Architecture & Systems Design

Lead the architecting of Troveo’s data pipelines, systems, and applications for scalability and reliability.

Partner with product, frontend, and DevOps teams to co-design scalable backend architectures.

Architect and deploy microservices in production environments, ensuring orchestration, auto-scaling, and fault tolerance across hybrid or multi-cloud setups.

Build resilient distributed systems addressing challenges like eventual consistency, service mesh (Istio), and event-driven architectures with Kafka or NATS.

Collaborate across teams as a player-coach, mentoring other engineers while delivering hands-on code and system design.

Data Infrastructure & Optimization

Design and optimize data pipelines that process massive video datasets for AI workloads.

Dive deep into database internals (execution and storage engines, sharding, replication, and vector search techniques) to ensure efficiency at scale.

Extensive experience with AWS, especially S3, for large-scale data processing and storage.

Strong knowledge of SQL (PostgreSQL preferred); Snowflake SQL experience is a plus.

Collaborate with ML and data engineering teams to embed AI/ML models directly into backend services, maintaining contextual awareness of video AI tradeoffs.

Reliability, Observability & Operations

Implement comprehensive monitoring, logging, and tracing frameworks (Prometheus, Grafana, Jaeger) to maintain 99.99% uptime.

Build and maintain CI/CD pipelines with GitHub Actions, ArgoCD, or Tekton, including security scans and automated testing, for zero-downtime deployments.

Profile and optimize backend services for low latency, cost efficiency, and high throughput under load.

Ensure operational excellence under pressure—especially during tight delivery windows—while maintaining clear communication with leadership.

Security & Compliance

Enforce zero-trust security principles, encryption at rest and in transit, and compliance with GDPR/CCPA.

Work with the platform team to ensure all deployments meet Troveo’s data protection and reliability standards.

Cross-Functional Collaboration & Soft Skills

Exhibit meticulous attention to detail, ensuring deliverables adhere precisely to contract terms and customer expectations.

Communicate effectively under pressure, providing updates and clarity during time-sensitive project deliveries.

Demonstrate strong lateral and technical communication, sharing customer delivery learnings across the engineering org to strengthen platforms and systems company-wide.

Partner directly with Product to translate requirements into scalable, reliable backend solutions.

Qualifications & Experience

8+ years of backend software engineering experience, including system architecture and distributed systems design.

Deep expertise in Go, Python, or Node.js, with production microservices experience.

Strong understanding of Kubernetes, container orchestration, and cloud-native architectures.

Hands-on experience with Kafka, NATS, or similar event-driven platforms.

Proven experience operating at scale with a startup mentality: fast-moving, adaptable, and pragmatic.

Familiarity with video AI/ML systems: not leading their development, but understanding the tradeoffs that impact system design and performance.

Experience implementing observability and CI/CD pipelines in production.

Excellent communicator and mentor; capable of leading by example and elevating team technical standards.

Nice to Have

Prior experience in AI/ML infrastructure or large-scale data processing.

Exposure to vector databases, Elasticsearch, or real-time analytics systems.

Contributions to open-source backend or infrastructure projects.

Experience in multi-cloud or hybrid environments.

Location & Compensation

Location: Strong preference for candidates based in the San Francisco Bay Area.

Compensation: $200,000 – $300,000 base salary + meaningful equity participation.

Why Join Troveo?

Build the distributed backbone that powers the world’s largest AI video dataset.

Tackle complex systems challenges at the intersection of data, AI, and infrastructure.

Collaborate with a world-class engineering and research team shaping the future of AI video.

High autonomy, high impact—your code will define the reliability and scale of Troveo’s platform.

Competitive compensation with significant equity upside.
