Ayass BioScience LLC
LLM Engineer - Transcriptome Analysis Platform
Ayass BioScience LLC, Frisco, Texas, United States, 75034
We are seeking an innovative LLM Engineer to develop and optimize large language model systems for our cutting-edge transcriptome differentially expressed gene (DEG) analysis platform. This role is critical in building the reasoning foundation that will transform biological data analysis from statistical correlation to mechanistic understanding. You will work at the intersection of advanced AI and precision medicine, creating systems that can reason about complex biological relationships and generate actionable insights from petabytes of genomic data.
Key Responsibilities
Core LLM Development
Design and implement specialized LLM architectures for biological reasoning and causal inference
Fine-tune foundation models (GPT-4, Claude, Gemma, etc.) for domain-specific transcriptome analysis tasks
Develop custom prompting strategies that enable complex reasoning about gene regulatory networks
Create RAG (Retrieval-Augmented Generation) pipelines integrating scientific literature with experimental data
Implement chain-of-thought (CoT) and tree-of-thoughts (ToT) prompting for multi-step biological reasoning
Model Optimization & Scaling
Optimize LLM inference for production environments handling 20,000+ gene analyses
Implement distributed processing using Ray Serve or similar frameworks for sub-second response times
Design context compression techniques for handling large-scale genomic datasets
Develop model ensembling strategies to reduce output variability from 30% to
Create efficient token management strategies for processing lengthy biological contexts
Biological Domain Integration
Build knowledge graphs connecting genes, pathways, diseases, and literature findings
Implement causal reasoning capabilities for identifying driver vs. passenger gene mutations
Develop specialized embeddings for biological entities (genes, proteins, pathways)
Create explanation generation systems that produce clinician-friendly interpretations
Design validation frameworks ensuring biological accuracy of LLM outputs
Quality & Reliability
Implement uncertainty quantification for model predictions
Develop robust evaluation metrics beyond traditional NLP measures
Create testing frameworks for biological reasoning accuracy
Design fallback mechanisms for handling edge cases in genomic data
Build monitoring systems for production model performance
Required Qualifications
Technical Expertise
MS/PhD in Computer Science, AI, Computational Biology, or related field
3+ years of experience with LLM development and deployment
Expert proficiency in Python and ML frameworks (PyTorch, TensorFlow, Hugging Face)
Proven experience with prompt engineering and fine-tuning techniques
Strong understanding of transformer architectures and attention mechanisms
Experience with distributed computing frameworks (Ray, Dask, or similar)
Domain Knowledge
Understanding of biological terminology and genomics concepts
Experience with scientific text processing and literature mining
Familiarity with causal inference and reasoning frameworks
Knowledge of medical/clinical NLP applications is a plus
Production Experience
Track record of deploying LLM systems at scale
Experience with model optimization techniques (quantization, pruning, distillation)
Knowledge of MLOps practices and model versioning
Experience with API design for AI services
Preferred Qualifications
Experience with biomedical language models (BioBERT, PubMedBERT, BioGPT)
Knowledge of transcriptomics and differential expression analysis
Familiarity with clinical regulatory requirements (FDA/EMA)
Publications in NLP, computational biology, or related fields
Experience with multi-modal AI systems
Understanding of graph neural networks for biological applications
Key Performance Metrics
Achieve
Reduce LLM output variability to
Improve biological reasoning accuracy to >90% on benchmark datasets
Successfully integrate 1M+ scientific papers into knowledge base
Deploy production systems handling 10,000+ analyses per day
What We Offer
Opportunity to work on transformative AI technology with direct patient impact
Collaboration with leading scientists and AI researchers
Access to state-of-the-art computational resources and datasets
Comprehensive benefits and equity participation
Professional development and conference attendance support
Remote-first culture with flexible working arrangements
Integration with Team
You will work closely with:
Agentic AI Engineers to enable autonomous biological discovery systems
Software Engineers to build scalable, production-ready platforms
Bioinformaticians to ensure biological accuracy and relevance
Clinical researchers to translate findings into therapeutic insights