Upscale AI
Security Software Architect, Principal Engineer or Sr. Principal Engineer
Upscale AI, Santa Clara, California, US, 95053
We’re looking for a highly skilled and experienced security expert to join our team at Upscale AI, focusing on security features for AI networks. This role is critical for protecting multi-tenant inference workloads on AI networks from a wide range of adversaries, including those with physical access. You will be instrumental in defining and implementing security features for next-generation AI and machine learning applications, particularly in the domain of Confidential Computing.
Job Responsibilities
Security Architecture & Design:
Architect and define security features for AI network link protection, with a specific focus on meeting confidentiality and integrity objectives for data transmitted between accelerators and AI networking switches.
Threat Modeling & Mitigation:
Develop a comprehensive security model and threat analysis, including identifying and mitigating potential attacks such as transaction snooping, corruption, replay, deletion, injection, and endpoint spoofing.
Cryptographic Implementation:
Define the implementation of cryptographic schemes, such as AES-GCM-256, for data encryption and optional integrity protection. This includes specifying key sizes, tag sizes, and the Initialization Vector (IV) format.
Key Management:
Specify and design secure key management flows, including master key programming, stream-key derivation using a NIST-approved Key Derivation Function (KDF) like KMAC256, and master key swap procedures.
Hardware-Software Co-design:
Collaborate closely with hardware and firmware teams to ensure the proper implementation of security features at various layers of the AI networking stack, including the functional/protocol layer, transaction layer (TL), and data link (DL) layer.
Secure Configuration:
Define requirements for safeguarding switch configuration settings and registers to prevent malicious modification, ensuring that sensitive parameters like master keys are protected and that other configurations can be securely locked by trusted software/firmware.
Confidential Computing (CC) Enablement:
Work within a Confidential Computing framework, ensuring that tenant data is protected from the infrastructure provider and other tenants. You will be responsible for defining the security workflow for setting up virtual pods and programming keys within a Trusted Computing Base (TCB) involving a TVM (Trusted Execution Environment (TEE) VM) and TSM (TEE Security Manager).
Error Handling:
Define mechanisms for handling security failures, such as integrity check failures, including dropping malicious traffic and securely reporting errors to the accelerator’s security processor.
Cross-functional Collaboration:
Partner with standards organizations such as UALink and UEC, with external vendors providing security hardware IP, and with internal teams responsible for switch manageability, to ensure the security requirements are integrated seamlessly into the overall AI switch system framework.
Requirements
Deep expertise in security protocols and cryptography, with specific knowledge of Confidential Computing in a multi-tenant environment.
Experience in designing and implementing security features for hardware-based communication protocols.
Strong understanding of TEEs (Trusted Execution Environments) such as Intel TDX, SGX, AMD SEV, and ARM CCA, and their role in Confidential Computing.
Knowledge of industry standards like SPDM and TDISP, and of NIST recommendations for cryptography.
Experience with hardware design, including understanding of data paths, control flows, and on-chip interfaces like UALink Protocol Level Interface (UPLI) and Transaction Layer (TL).
Familiarity with the UALink stack and its components, including the role of accelerators, switches, and the interconnect layers.
Ability to perform detailed threat modeling and define robust mitigation strategies.
Excellent communication skills, with the ability to articulate complex security concepts to diverse technical and non-technical audiences, including customers and partners.
Preferred
10+ years of industry experience in security software design and development
Networking switch hardware experience
Understanding of OCP (Open Compute Project) network specifications
Network Operating System experience, e.g., SONiC, EOS, NX-OS, Junos
Participation in security-related standards, conferences, and print media