Sixtyfour
What you’ll do
Design and ship agentic systems (tool calling, multi-agent workflows, structured outputs) that reliably fetch, extract, and normalize data across the web and APIs.
Own robust web scraping: directory crawling, CAPTCHA handling, headless browsers, rotating proxies, anti-bot evasion, and backoff/retry policies.
Develop backend services in Python + FastAPI with clean contracts and strong observability.
Scale workloads on AWS + Docker (batch/queue workers, autoscaling, fault tolerance, cost control).
Parallelize external API requests safely (rate limits, idempotency, circuit breakers, retries, dedupe); see the sketch after this list.
Integrate third-party APIs for enrichment and search; model and cache responses; manage schema evolution.
Transform and analyze data using Pandas (or similar) for normalization, QA, and reporting.
Pitch in across the stack: billing (Stripe) and occasional front-end changes to ship end-to-end features.
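
To make the API-parallelization responsibility concrete, here is a minimal sketch (the endpoint, concurrency cap, and retry count are placeholders, not Sixtyfour's actual setup) of fanning out external requests with asyncio: a semaphore bounds concurrency, failures back off exponentially, and duplicate URLs are collapsed before fetching.

```python
# Hedged sketch: rate-limit-aware, retried, deduplicated parallel API calls.
# All constants and the example URL are hypothetical.
import asyncio
import httpx

MAX_CONCURRENCY = 10   # assumed provider concurrency limit
MAX_RETRIES = 3

async def fetch(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> dict:
    async with sem:                              # respect the concurrency cap
        for attempt in range(MAX_RETRIES):
            try:
                resp = await client.get(url, timeout=10.0)
                if resp.status_code == 429:      # rate limited: back off, then retry
                    await asyncio.sleep(2 ** attempt)
                    continue
                resp.raise_for_status()
                return resp.json()
            except httpx.HTTPError:              # network or HTTP error: back off, then retry
                await asyncio.sleep(2 ** attempt)
        return {"url": url, "error": "gave up after retries"}

async def fetch_all(urls: list[str]) -> list[dict]:
    unique = list(dict.fromkeys(urls))           # dedupe while preserving order
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(fetch(client, sem, u) for u in unique))

# asyncio.run(fetch_all(["https://api.example.com/v1/items?page=1"]))
```

Returning an error record instead of raising keeps one bad URL from failing the whole batch, which matters when a queue worker is processing thousands of requests per run.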
Minimum requirements
Hands-on experience with agentic architectures (tool calling, structured outputs/JSON, planning/execution loops) and prompt engineering; see the sketch after this list.
Proven web scraping expertise: solving CAPTCHAs, session/auth flows, proxy rotation, stealth techniques, and legal/ethical constraints.
AWS + Docker in production (at least two of: ECS/EKS, Lambda, SQS/SNS, Batch, Step Functions, CloudWatch).
Building high-throughput data/IO pipelines with concurrency (asyncio/multiprocessing), resilient retries, and rate-limit-aware scheduling.
Integrating diverse external APIs (auth patterns, pagination, webhooks); designing stable interfaces and backfills.
Strong data wrangling with Pandas or equivalent; comfort with large CSV/Parquet workflows and memory/perf tuning.
Excellent ownership, product sense, and pragmatic debugging.
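
As a purely illustrative example of the structured-output side of agentic work, the sketch below validates an agent's JSON output against a Pydantic contract before it enters a pipeline; the schema and field names are hypothetical, not the team's actual models.

```python
# Hedged sketch: enforcing a structured-output contract on agent/tool output.
import json
from pydantic import BaseModel, ValidationError

class CompanyRecord(BaseModel):          # hypothetical extraction schema
    name: str
    website: str
    employee_count: int | None = None

def parse_agent_output(raw: str) -> CompanyRecord | None:
    """Validate the agent's JSON output; reject anything off-contract."""
    try:
        return CompanyRecord.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        return None                      # caller can re-prompt or route to a repair step

print(parse_agent_output('{"name": "Acme", "website": "https://acme.dev"}'))
```

Rejecting off-contract output at the boundary lets downstream normalization and enrichment stay deterministic instead of absorbing malformed LLM responses.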
Nice to have
Entity resolution/record linkage at scale (probabilistic matching, blocking, deduping).
Experience with Langfuse, OpenTelemetry, or similar for tracing/evals; task queues (Celery/RQ), Redis, Postgres.
Search relevance (BM25/vector/hybrid), embeddings, and retrieval pipelines.
Playwright/Selenium, stealth browsers, anti-bot frameworks, CAPTCHA providers.
CI/CD, infrastructure as code (Terraform), and cost/perf observability.
Security & compliance basics for data handling and PII.