Stack @ Labrynth:
GCP · Python · Pydantic/PydanticAI · Docling · Django · Cloud Run · LLMs · GitHub · ClickUp · Selenium
About Labrynth:
At Labrynth, we’re a Silicon Valley startup building next-generation Hermeneutical-Agent systems: AI that can read, reason, and execute on the world’s most complex regulations. Our Application Validator is live, performing audit-grade, evidence-grounded compliance checks. Next, we’re expanding the Application Generator to create regulator-ready drafts backed by verified data and citations. You’ll help shape both, advancing safety, evaluation rigor, latency, and cost-efficiency across large-scale, production AI systems at the edge of applied research and real-world impact.
Our mission is to transform bureaucratic and complex processes using AI and automation, turning them into fast, transparent, and scalable pipelines. We are a spin-off from Invisible Technologies, the world’s largest AI model trainer, and are backed by the Infinity Constellation group. We already work with enterprise clients, governments, and large-scale projects, so you will have a real impact accelerating major developments.
About the Role
We are looking for a Senior Web Scraping Engineer to design, build, and operate large-scale data collection systems. You will be responsible for developing robust scrapers using tools such as Selenium, BeautifulSoup, Playwright, and Scrapy, and for creating automated workflows in the cloud that run reliably on a schedule, generate logs, and surface failures proactively. You will also experiment with and apply LLM-based techniques to improve scraping robustness and data extraction quality.
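For a flavor of the day-to-day work, here is a minimal, hypothetical scraper skeleton using Playwright’s sync API; the URL, CSS selector, and retry policy are illustrative placeholders, not a real Labrynth pipeline:

```python
# Hypothetical scraper skeleton: headless browsing with retries and logging.
# url, selector, and the retry policy are placeholders for illustration.
import logging

from playwright.sync_api import TimeoutError as PlaywrightTimeout, sync_playwright

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

def scrape_texts(url: str, selector: str, retries: int = 3) -> list[str]:
    """Render the page in a headless browser and return the text of matching nodes."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            for attempt in range(1, retries + 1):
                try:
                    page.goto(url, wait_until="networkidle")
                    items = [el.inner_text() for el in page.query_selector_all(selector)]
                    log.info("scraped %d items from %s", len(items), url)
                    return items
                except PlaywrightTimeout:
                    log.warning("attempt %d/%d timed out for %s", attempt, retries, url)
            return []  # all retries exhausted; callers can alert on empty results
        finally:
            browser.close()
```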
Key Responsibilities
Design, implement, and maintain web scraping pipelines for a wide variety of websites and data sources.
Build scrapers using tools and frameworks such as Selenium, Playwright, BeautifulSoup, and Scrapy (and similar libraries), with a focus on reliability, performance, and maintainability.
Create automated workflows for scraping and data processing:
Containerize scraping jobs (e.g., using Docker).
Deploy and orchestrate them in the cloud (e.g., AWS, GCP, Azure).
Configure scheduling (e.g., run daily/weekly/hourly) and dependency management.
Implement monitoring, alerting, and logging:
Capture detailed logs for each job run.
Track job statuses and failures.
Implement notifications/alerts when a scraper breaks or a website changes.
Handle anti-bot measures (proxies, CAPTCHAs, rate limits) and design scrapers that are resilient to layout and structure changes.
Work closely with data engineering / product / ML teams to understand data requirements and ensure data quality.
Utilize LLMs (Large Language Models) to:
Parse and extract structured information from messy HTML or semi-structured content.
Increase robustness of scrapers to frequent UI/DOM changes.
Prototype new scraping / extraction strategies using LLM APIs (a minimal sketch follows this list).
Write clean, well-tested, and well-documented code, and contribute to best practices, code reviews, and tooling for the team.
Continuously improve the scraping platform, including performance optimizations, standardization, and reusability of components.
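As a hedged illustration of the LLM-assisted extraction mentioned above, a minimal sketch assuming an OpenAI-style chat completions API and a Pydantic schema might look like the following; the model name, prompt, and Listing fields are illustrative assumptions, not Labrynth’s actual setup:

```python
# Hypothetical sketch: LLM-assisted extraction of structured fields from messy HTML.
# The model name, prompt, and Listing schema are illustrative assumptions.
from openai import OpenAI
from pydantic import BaseModel

class Listing(BaseModel):
    title: str
    agency: str | None = None
    deadline: str | None = None

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_listing(raw_html: str) -> Listing:
    """Ask the model for JSON matching the Listing schema, then validate it with Pydantic."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Extract title, agency, and deadline from the HTML. "
                        "Reply with a single JSON object using exactly those keys."},
            {"role": "user", "content": raw_html},
        ],
    )
    return Listing.model_validate_json(resp.choices[0].message.content)
```

Validating the model’s output against a Pydantic schema turns silent layout drift into a hard, loggable failure, which feeds directly into the monitoring and alerting responsibilities above.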
Requirements
3+ years of professional experience working with web scraping or data collection at scale.
Strong proficiency in Python and common scraping libraries/frameworks such as:
Selenium, Playwright, BeautifulSoup, Scrapy (or similar).
Solid understanding of HTML, CSS, JavaScript, HTTP, and browser behavior.
Experience building automated, production-grade workflows:
Orchestrators / schedulers (e.g., Airflow, Prefect, Dagster, or similar).
Building ETL/ELT pipelines and integrating with databases, data warehouses, or storage (e.g., PostgreSQL, BigQuery, S3, GCS).
Hands‑on experience with cloud platforms (AWS, GCP, or Azure), including:
Deploying and running scheduled jobs.
Managing infrastructure-as-code or similar deployment processes.
Strong experience with logging, monitoring, and alerting:
Ability to design logging for scraping jobs and to debug failures from logs (see the sketch after this list).
Familiarity with tools like CloudWatch, Google Cloud Logging (formerly Stackdriver), ELK, Prometheus, Grafana, or similar.
Experience with containers (Docker) and familiarity with CI/CD workflows.
Exposure to LLMs (e.g., OpenAI, Anthropic) for tasks like parsing, information extraction, or automation.
Strong problem‑solving skills and the ability to debug complex, dynamic websites.
Comfortable working in a fast‑paced environment, with good communication skills in English.
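As a rough sketch of the structured logging and alerting described in the requirements (the webhook endpoint and payload shape below are hypothetical placeholders), a per-run wrapper might emit one structured log line per job and fire an alert on failure:

```python
# Hypothetical per-run wrapper: one structured log line per job, alert on failure.
# ALERT_WEBHOOK is a placeholder; real alerts might go to Slack, PagerDuty, etc.
import json
import logging
import time
import urllib.request
from typing import Callable

ALERT_WEBHOOK = "https://example.com/hooks/scraper-alerts"  # placeholder URL
log = logging.getLogger("scrape-runner")

def run_job(job_name: str, job: Callable[[], int]) -> None:
    """Run a scraping job, log a structured status record, and alert on failure."""
    started = time.time()
    try:
        item_count = job()
        log.info(json.dumps({"job": job_name, "status": "ok", "items": item_count,
                             "duration_s": round(time.time() - started, 1)}))
    except Exception as exc:
        log.error(json.dumps({"job": job_name, "status": "failed", "error": str(exc)}))
        payload = json.dumps({"text": f"Scraper {job_name} failed: {exc}"}).encode()
        req = urllib.request.Request(ALERT_WEBHOOK, data=payload,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # fire-and-forget alert to a placeholder endpoint
        raise
```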
Nice‑to‑Have
Experience with Kubernetes or other container orchestration systems.
Experience dealing with large-scale crawling, distributed scraping, and high‑concurrency systems.
Familiarity with handling CAPTCHAs, rotating proxies, and headless browsers at scale.
Background in data engineering.
Contributions to open‑source web scraping tools or frameworks.
Working Model
Remote‑first; primary collaboration in Americas time zones with ~5 hours overlap.
Fully remote, flexible hours.
Payment in USD (contractor/freelance basis).
Budget: $5,000 USD/month
Work on a global team, with real‑world challenges and rapid growth opportunities.