Inflection AI, Inc
Member of Technical Staff – Inference
Inflection AI, Inc, Palo Alto, California, United States, 94306
Inflection AI is a public benefit corporation leveraging our world-class large language model to build the first AI platform focused on the needs of the enterprise.
Who we are:
Inflection AI was re-founded in March of 2024, and our leadership team has assembled a team of kind, innovative, and collaborative individuals focused on building enterprise AI solutions. We are passionate about what we are building, enjoy working together, and strive to hire people with diverse backgrounds and experience. Our first product, Pi, is an empathetic and conversational chatbot. Pi is a public instance built on our 350B+ frontier model and our sophisticated fine-tuning (10M+ examples), inference, and orchestration platform. We are now focusing on building new systems that directly support the needs of enterprise customers using this same approach. Want to work with us? Have questions? Learn more below.
About the Role
As an Inference Engineer, you will own the real-time performance, scalability, and reliability of our LLM-powered systems. You’ll optimize every layer, from GPU kernels to orchestration frameworks, to deliver sub-second latency, high throughput, and enterprise-grade uptime. Your work will also enable advanced capabilities such as tool usage, agentic flows, retrieval-augmented generation (RAG), and long-term memory.
This is a good role for you if you:
Have direct experience deploying and optimizing large transformer models for real-time inference across multi-GPU or multi-node environments
Are skilled with tools like Triton, TensorRT, TVM, ONNX Runtime, or custom CUDA kernels, and know when to use C++/Rust for critical performance gains
Understand the balance between latency, throughput, accuracy, and cost, and make smart choices around quantization, speculative decoding, and caching
Have developed or integrated agent-based orchestration systems, RAG pipelines, or memory architectures in production environments
Automate at every layer: CI/CD for model artifacts, load testing, canary rollouts, and auto-scaling
Communicate clearly with both infrastructure teams and product stakeholders
Responsibilities include:
Design and optimize high-performance inference pipelines using PyTorch, vLLM, Triton, TensorRT, and FSDP/DeepSpeed
Integrate agentic runtimes (tool calling, function execution, and multi-step planning) while meeting strict latency requirements
Build robust retrieval-augmented generation (RAG) stacks combining vector search, caching, and real-time context packing
Develop memory services to support conversation continuity and user personalization at scale
Monitor, instrument, and autotune GPU performance, kernel fusion, and batching strategies across clusters of NVIDIA H100 and Intel Gaudi accelerators
Partner with training, safety, and product teams to transform research into stable, production-grade systems
Contribute upstream to open-source performance libraries and share insights with the community
Employee Pay Disclosures
At Inflection AI, we aim to attract and retain the best employees and compensate them in a way that appropriately and fairly values their individual contributions to the company. For this role, Inflection AI estimates a starting annual base salary will fall in the range of approximately $175,000 - $350,000 depending on experience. This estimate can vary based on the factors described above, so the actual starting annual base salary may be above or below this range.
Interview Process
Apply: Please apply on LinkedIn or our website for a specific role. After speaking with one of our recruiters, you’ll enter our structured interview process, which includes the following stages:
Hiring Manager Conversation: An initial discussion with the hiring manager to assess fit and alignment.
Technical Interview: A deep dive with an Inflection Engineer to evaluate your technical expertise.
A domain-specific interview
A final conversation with the hiring manager
Depending on the role, we may also ask you to complete a take-home exercise or deliver a presentation. For non-technical roles, be prepared for a role-specific interview, such as a portfolio review.
Decision Timeline
We aim to provide feedback within one week of your final interview.
Apply for this job
* indicates a required field
First Name *
Last Name *
Email *
Phone
Resume/CV * (accepted file types: pdf, doc, docx, txt, rtf)
LinkedIn Profile *
Describe your most significant professional achievements and the impact they had on your team or organization.
Tell us about a particularly challenging problem you've faced in your career. How did you approach it, and what was the outcome?
What area of Artificial Intelligence are you most passionate about, and why?
Do you now or in the future require work authorization to work legally in the United States? *
Voluntary Self-Identification
For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file. As set forth in Inflection AI’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:
A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.
A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.
An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.
An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.
Voluntary Self-Identification of Disability
Form CC-305, OMB Control Number 1250-0005, Expires 04/30/2026
Why are you being asked to complete this form?
We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years. Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.
How do you know if you have a disability?
A disability is a condition that substantially limits one or more of your “major life activities.” If you have or have ever had such a condition, you are a person with a disability.
Disabilities include, but are not limited to:
Alcohol or other substance use disorder (not currently using drugs illegally)
Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
Blind or low vision
Cancer (past or present)
Cardiovascular or heart disease
Celiac disease
Cerebral palsy
Deaf or serious difficulty hearing
Diabetes
Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
Epilepsy or other seizure disorder
Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
Intellectual or developmental disability
Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
Missing limbs or partially missing limbs
Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
Nervous system condition, for example, migraine headaches, Parkinson’s disease, multiple sclerosis (MS)
Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
Partial or complete paralysis (any cause)
Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
Short stature (dwarfism)
Traumatic brain injury
PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.