Join to apply for the System Test Software Engineer role at Etched
About Etched
Etched is building AI chips that are hard-coded for individual model architectures. Our first product (Sohu) only supports transformers, but has an order of magnitude more throughput and lower latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents.
Job Summary
We are seeking highly motivated and detail-oriented Software Engineers to join our Burn-in Testing team. This team plays a critical role in ensuring the reliability and stability of our highest-performance inference server hardware and software. As a Software Engineer on this team, you will design, develop, and execute comprehensive burn-in test suites, analyze test results, and collaborate with hardware and software engineering teams at Etched and our ODM partners to identify and resolve potential issues. You will be at the forefront of ensuring our server products meet the highest quality standards before they reach our customers.
Key Responsibilities
- Test Development: Design, develop, and implement automated burn-in test suites using common scripting languages (Python, Go, Bash) and test frameworks across all aspects of system operation, including boot sequences, root-of-trust, system management, workload deployment, and performance.
- Test Execution: Execute burn-in tests on server hardware, monitor system performance and health, and analyze test results.
- Failure Analysis: Investigate and debug hardware and software failures identified during testing, providing detailed reports and mitigation plans.
- Collaboration: Collaborate with internal and external hardware and software engineering teams to identify root causes of failures and implement corrective actions.
- Test Infrastructure: Contribute to the development and maintenance of the burn-in testing infrastructure, including portable test environments and automation tools runnable in any environment.
- Documentation: Create and maintain comprehensive documentation for test plans, test cases, and test results.
- Performance Analysis: Analyze system performance metrics to identify potential bottlenecks and areas for optimization.
- Continuous Improvement: Participate in continuous improvement efforts to enhance the efficiency and effectiveness of the burn-in testing process.
- Develop automated test suites to stress-test CPUs, memory, storage, and network subsystems under extreme workloads.
- Design and implement fault injection tests to simulate hardware and software failures.
- Create tools to monitor and analyze system performance metrics, such as CPU utilization, cross-socket memory performance and usage, and network latency.
- Build and maintain a scalable burn-in testing environment capable of handling multiple server configurations.
- Collaborate with hardware engineers to develop tests for new server features and components.
- Contribute to the creation of dashboards that show the current state of burn-in testing across the server farm.
- Proficiency in at least one scripting language (e.g., Python, Bash, Go).
- Experience with software testing methodologies and tools.
- Strong understanding of operating systems (Linux preferred) and server hardware architectures.
- Ability to analyze complex technical problems and provide effective solutions.
- Excellent communication and collaboration skills.
- Ability to work independently and as part of a team.
- Experience with version control systems (e.g., Git).
- Experience with reading and interpreting hardware logs.
- Experience with hardware burn-in testing or reliability testing.
- Knowledge of server virtualization and cloud computing concepts.
- Experience with performance testing and benchmarking tools.
- Familiarity with hardware diagnostic tools and techniques.
- Experience with containerization technologies (e.g., Docker, Kubernetes).
- Experience with CI/CD pipelines.
- Knowledge of low-level hardware communication protocols (I2C, etc.).
- Experience with data analysis tools and techniques.
- Candidates with experience in server hardware or software development, testing, or support.
- Individuals with a strong interest in hardware and software reliability.
- Professionals with a background in system administration or performance engineering.
- Individuals who enjoy working in a fast-paced and challenging environment.
- Those who have worked in a datacenter environment.
- Experience in the telecommunications or high-performance computing fields.
- Full medical, dental, and vision packages, with generous premium coverage
- Housing subsidy of $2,000/month for those living within walking distance of the office
- Daily lunch and dinner in our office
- Relocation support for those moving to West San Jose
- $150,000 - $275,000
Etched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in West San Jose, and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
Seniority level
Entry level
Employment type
Full-time
Job function
Quality Assurance
Industries
Computer Hardware Manufacturing