Dovel Technologies, Inc
Software Developer (Backend – Integration)
Dovel Technologies, Inc, Huntsville, Alabama, United States, 35824
**Job Family:** Data Science & Analysis

**Travel Required:** Up to 10%

**Clearance Required:** Active Top Secret (TS)

Guidehouse is seeking a Software Developer to join our Technology / AI and Data team, supporting mission-critical initiatives for Defense and Security clients. In this role, you will lead the design and implementation of secure, scalable ingestion and data processing workflows that power advanced AI-driven platforms. You will architect solutions for transforming complex, high-volume data into structured outputs optimized for downstream AI/ML pipelines, while ensuring compliance with stringent federal security and regulatory standards. Collaborating with engineers, architects, and mission stakeholders, you will deliver innovative backend capabilities that enable accurate, efficient, and reliable decision-making in support of national security objectives.

**What You Will Do:**

* Serves as the lead backend integration engineer responsible for architecting and implementing ingestion, preprocessing, normalization, and transformation workflows for the FBI adjudication AI platform.
* Designs ingestion frameworks supporting SF-86 forms, investigative attachments, summaries, financial/criminal records, and continuous vetting alerts using both traditional OCR and VLM/LLM-based document understanding.
* Ensures ingestion workflows comply with FedRAMP High, RMF, CJIS, and FBI ATO requirements, including logging, auditability, encryption, and secure processing of PII and sensitive investigative information.
* Collaborates with AI/ML engineers, backend API developers, cloud engineers, and security engineers to ensure ingestion outputs are optimized for RAG workflows, SEAD-4 scoring, anomaly detection, and adjudicator review.

**Data Ingestion, Parsing & ETL Architecture**

* Design ingestion pipelines supporting LLMs and VLMs for OCR, document understanding, multimodal extraction, and parsing of complex investigative materials including forms, tables, handwritten elements, and embedded imagery.
* Build scalable ingestion and ETL workflows capable of processing hundreds of pages per case using OCR engines (Textract, Tesseract) and VLM-based parsing models such as LayoutLM, Qwen-VL, Donut, or LLaVA.
* Implement normalization and transformation workflows including deduplication, schema harmonization, field mapping, classification labeling, chunking, segmentation, and tokenization optimized for downstream LLM/RAG operations.
* Develop fault-tolerant ingestion systems with checkpointing, idempotency, retry frameworks, ingestion-state tracking, and structured error reporting.

**Backend Integration & System Connectivity**

* Build secure, compliant integrations with FBI systems, case repositories, identity/HR systems, and continuous vetting alert sources using APIs, ETL endpoints, SFTP, and message queues.
* Develop backend microservices that assemble case packages, correlate evidence across disparate sources, and produce structured adjudication-ready datasets.
* Integrate ingestion outputs with vector databases, embedding pipelines, and LLM inference services, ensuring data is structured, enriched, and optimized for reasoning workflows.
* Ensure all integrations enforce strict authentication, authorization, validation, and data-handling policies.

**RAG / LLM Data Preparation**

* Create ingestion workflows that prepare documents and extracted content for embeddings, retrieval indexing, semantic search, and long-context reasoning.
* Implement chunking, segmentation, labeling, and evidence-tagging strategies designed to maximize retrieval precision and reduce hallucination risk in LLM inference.
* Develop heuristics for filtering, prioritizing, and contextualizing extracted information to enable fact-grounded SEAD-4 scoring and memo generation.
* Support preparation of vector representations, metadata fields, and retrieval keys for large-scale evidence collections.

**Security, Compliance & Logging**

* Implement secure ingestion pipelines aligned with FedRAMP High, RMF, CJIS, and FBI security requirements including encryption, access control, PII-handling rules, and secure logging.
* Apply advanced PII-safe processing techniques including automated redaction, VLM-aided sensitive field detection, classification tagging, and compliance-driven filtering.
* Ensure ingestion systems generate detailed logs, lineage metadata, provenance trails, and audit events supporting adjudication oversight and accreditation documentation.
* Collaborate with Security Engineers to ensure ingestion controls map to SSP requirements and POA&M items are remediated promptly.

**Performance Optimization & Reliability**

* Optimize ingestion pipelines for parallelization, concurrency, batching, memory efficiency, and large-scale document processing throughput.
* Implement distributed ETL frameworks such as Step Functions, Airflow, Dagster, Glue, or Spark depending on workload and security constraints.
* Develop monitoring dashboards capturing ingestion throughput, VLM/LLM OCR accuracy metrics, error frequencies, latency patterns, and retry trends.
* Implement resilience features including dead-letter queues, backoff retry mechanisms, fault isolation, and disaster-recovery patterns.

**Collaboration, Leadership & Mission Enablement**

* Align ingestion outputs directly with AI/ML engineer requirements for long-context LLM inference, retrieval indexing, and SEAD-4 scoring workflows.
* Work with backend API developers to ensure ingestion flows integrate seamlessly with scoring engines, entity explorers, memo builders, and anomaly detection pipelines.
* Participate in sprint ceremonies, architecture reviews, backlog refinement, and cross-functional coordination with mission stakeholders.
* Mentor mid-level engineers in ETL design, multimodal OCR techniques, distributed system patterns, and secure ingestion best practices.

**What You Will Need:**

* An ACTIVE and MAINTAINED "TOP SECRET" Federal or DoD security clearance.
* Requires a University Degree and minimum 4-6 years of prior relevant experience (relevant experience may be substituted for formal education or advanced degree).
* 5 years of backend/integration engineering experience, including 3 years in large-scale ETL or ingestion workflows.
* Deep experience with Python, Java, or Scala; ingestion frameworks such as Airflow, Step Functions, Dagster, or Glue.
* Experience with ETL pipelines, large-scale document ingestion, OCR/VLM document understanding, unstructured data parsing.
* Experience developing secure data processing/normalization workflows.
* Experience with distributed processing frameworks.

**What Would Be Nice To Have:**

* An ACTIVE and MAINTAINED "TOP SECRET" Federal or DoD security clearance.
* Once onboard with Guidehouse, new hire MUST be able to OBTAIN and MAINTAIN a Federal or DoD "TOP SECRET/SCI (TS/SCI)" security clearance.
* 8+ years of backend/integration engineering experience, including 4+ years in large-scale ETL or ingestion workflows.
* Experience integrating FBI, DCSA, or NBIB systems or adjudication-related data sources.
* Experience designing ingestion workflows for RAG, embeddings, vector databases, or long-context LLM pipelines.
* Experience training or applying VLMs such as LayoutLM, Donut, Qwen-VL, or LLaVA for OCR replacement or enhancement.
* Experience with knowledge graphs, entity resolution, evidence-linking workflow development.
* Familiarity with SEAD-4, continuous vetting, or investigative case analysis processes.
* Airflow vs. Dagster vs. Step Functions.
* Textract, Tesseract, LayoutLM, Donut, Qwen-VL, LLaVA.
* Specific AWS ingestion tools (Glue, Batch, S3 eventing).

**What We Offer:**

Guidehouse offers a comprehensive, total rewards package that includes competitive compensation and a flexible benefits package that reflects our commitment to creating a diverse and supportive workplace.

Benefits include:

* Medical, Rx, Dental & Vision Insurance
* Personal and Family Sick Time & Company Paid