Allen Institute for AI
Senior Software Engineer, Data
The Allen Institute for AI (Ai2) is hiring a Data Engineer to help integrate a large U.S. patent corpus into the Semantic Scholar platform. This NSF-funded role focuses on high-impact data engineering: linking patent and academic research data, resolving citations, disambiguating inventors and authors, applying topic models, and extending data products and APIs. You'll work in a high-performing engineering environment and own full-stack data tasks, including building pipelines, integrating or training practical ML models, and deploying production services. This is not a research role, but you should be confident implementing ML-driven solutions when off-the-shelf tools don't cut it. This is a fixed-term position scheduled for 2 years with the possibility of renewal.

The Semantic Scholar team builds open, production-grade systems that power scientific discovery and large-scale AI research. We focus on creating high-quality structured datasets, integrating diverse content types, and enabling downstream applications across search, citation analysis, and model training. The team combines strong engineering practices with close collaboration across Ai2's product and research orgs to deliver tools and infrastructure used by millions of researchers and developers worldwide.
Your next challenge includes:
- Build scalable data pipelines (Airflow) for citation resolution and corpus integration
- Develop and deploy lightweight ML models for inventor disambiguation and author linking
- Train or adapt a topic model to classify patents using titles, abstracts, claims, and specs
- Extend REST APIs to expose linked metadata and topic classifications
- Contribute to dashboards and tools for evaluating data quality and model precision
- Collaborate with Ai2 engineers to ensure maintainability, test coverage, and robust deployment
- Produce reliable, well-documented code and contribute technical designs that support long-term maintainability

What you'll need:

Required:
- Bachelor's degree and 8+ years of technical experience; relevant experience may substitute for education.
- Strong Python engineering skills, especially for building and maintaining data pipelines
- Experience with SQL and schema design in production settings (PostgreSQL preferred)
- Familiarity with common ML workflows (training classifiers, tuning models, and deploying for inference), particularly for large-scale or ambiguous structured datasets
- Comfortable working with structured datasets (XML/JSON/Parquet) and writing ETL code
- Experience with workflow orchestration tools (Airflow or similar) and cloud infrastructure (e.g., AWS, S3, Docker)
- Strong communicator and a strong sense of ownership for results

Preferred:
- Experience with author disambiguation, entity resolution, or record linkage problems
- Experience applying vector-based similarity or topic modeling techniques to real-world corpora at scale
- Exposure to citation networks or scholarly data systems (e.g., arXiv, OpenAlex, USPTO)
- Comfort building internal APIs and dashboards to support ML and data quality review

Physical demands and work environment:
- Must be able to remain in a stationary position for long periods of time.
- The ability to communicate information and ideas so others will understand.
- Must be able to exchange accurate information in these situations.
- The ability to observe details at close range.
- Can work under deadlines.

A little more about Ai2:

We are a learning organization: because everything Ai2 does is ground-breaking, we are learning every day. Similarly, through weekly Ai2 Academy lectures, a wide variety of world-class AI experts as guest speakers, and our commitment to your personal ongoing education, Ai2 is a place where you will have opportunities to continue learning alongside your coworkers.

We value diversity: we seek to hire, support, and promote people from all genders, ethnicities, and all levels of experience regardless of age. We particularly encourage applications from women, non-binary individuals, people of color, members of the LGBTQA+ community, and people with disabilities of any kind.

We value inclusion: we understand the value that people's individual experiences and perspectives can bring to an organization, and we are building a culture in which all voices are heard, respected, and considered.

We emphasize a healthy work/life balance: we believe our team members are happiest and most productive when their work/life balance is optimized. While we value powerful research results which drive our mission forward, we also value dinner with family, weekend time, and vacation time. We offer generous paid vacation and sick leave as well as family leave.

We are collaborative and transparent: we consider ourselves a team, all moving with a common purpose. We are quick to cheer our successes, and even quicker to share and jointly problem-solve our failures.

We are in Seattle and our office is on the water! We have mountains, we have lakes, we have four seasons, we bike to work, we have a vibrant theater scene, and we have so much else. We even have kayaks for you to paddle right outside our front door. We welcome interest from applicants from outside of the United States.
We are friendly: chances are you will like every one of the 200+ (and growing) people who work here. We do.

Ai2 is proud to be an Equal Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

This employer participates in E-Verify and will provide the federal government with your Form I-9 information to confirm that you are authorized to work in the U.S. If E-Verify cannot confirm that you are authorized to work, this employer is required to give you written instructions and an opportunity to contact the Department of Homeland Security (DHS) or Social Security Administration (SSA) so you can begin to resolve the issue before the employer can take any action against you, including terminating your employment. Employers can only use E-Verify once you have accepted a job offer and completed the Form I-9.

We are committed to providing reasonable accommodations to employees and applicants with disabilities to the full extent required by the Americans with Disabilities Act (ADA). If you feel you need a reasonable accommodation pursuant to the ADA, you are encouraged to contact us at recruiting@allenai.org.

Benefits:
- Team members and their families are covered by medical, dental, vision, and an employee assistance program.
- Team members are able to enroll in our health savings account plan, our healthcare reimbursement arrangement plan, and our health care and dependent care flexible spending account plans.
- Team members are able to enroll in our company's 401k plan.
- Team members will receive $125 per month to assist with commuting or internet expenses and will also receive $200 per month for fitness and wellbeing expenses.
- Team members will also receive up to ten sick days per year, up to seven personal days per year, up to 20 vacation days per year, and twelve paid holidays throughout the calendar year.
- Team members will be able to receive annual bonuses.