GPU Software Engineer

Job Type: Contract
Work Flexibility: On-site
Location: San Jose, CA
Required Skills: AMD GPU, CUDA, CXL, MI210, ROCm, SGLang

Role: GPU Software Engineer
Location: San Jose, CA – Onsite
Duration: 12+ Months/Contract-to-Hire
Pay Rate: $90 to $100

Overview:

We are seeking an experienced GPU Software Engineer for a 12-month milestone-based engagement supporting a cutting-edge GPU software integration project. The consultant will work on AMD GPU platforms, drive AI stack development, contribute to open-source projects, and deliver performance benchmarking and integration reports across a structured set of monthly deliverables. This is a highly technical, hands-on role requiring deep expertise in GPU software stacks, ROCm, AI frameworks, and systems-level integration.

Manager’s Note: The client has confirmed flexibility on AMD-specific experience and is open to strong GPU software engineers from the NVIDIA/CUDA ecosystem, provided they have solid GPU architecture fundamentals and can ramp up on ROCm. ROCm ecosystem exposure is considered a key factor for success, and onboarding support will be provided for the MI210 deliverables. Open-source contributions to SGLang will be coordinated through Samsung channels. Clarification is still pending on ramp-up timing before milestone tracking begins.

Position Details:

  • Project Title: GPU SW Integration for Samsung Cognos
  • Engagement Type: Contract / Milestone-Based (12 Months)
  • Client Environment: AMD MI210 GPU, CXL Memory, NVMe Gen6, ROCm Stack
  • Delivery Tools: Confluence, Jira, GitHub/GitLab (client-provided)

Key Responsibilities:

  • Design and develop GPU software modules aligned with project milestones.
  • Perform systems integration and end-to-end testing of AI stack SW modules.
  • Validate AMD Infinity Bridge and AIS on MI210 GPU hardware.
  • Conduct functional and performance benchmarking (pSLC Firmware, CXL, ROCm).
  • Implement and validate SGLang changes for L3 to L1 memory transfer optimization.
  • Develop and contribute CaMa module changes to the ROCm software stack.
  • Collaborate with the SGLang open-source community and contribute code to their public GitHub repo.
  • Develop CaMa module for ROCm over Infinity Fabric/Ethernet.
  • Perform E2E performance benchmarking and publish formal benchmarking reports.
  • Integrate CaMa changes into the Cognos AI stack and publish integration documentation.
  • Scope UALink support for CaMa and publish an investigation/feasibility document.
  • Maintain all documentation, code, and status updates in Confluence, Jira, and GitHub/GitLab.

Required Skills and Qualifications:

GPU Software and Hardware

  • Hands-on experience with AMD GPU platforms, specifically MI210.
  • Proficiency with AMD ROCm software stack including kernel libraries and drivers.
  • Experience with AMD Infinity Bridge / Infinity Fabric architecture.
  • Familiarity with CXL (Compute Express Link) memory integration.
  • Experience with NVMe storage and GPUDirect Storage (GDS).

AI Frameworks and Software Stack

  • Experience with SGLang or similar LLM inference frameworks.
  • Familiarity with AI stack installation and end-to-end workload benchmarking.
  • Knowledge of GPU memory hierarchy (HBM, L1/L3 cache) and data transfer optimization.
  • Proficiency in GPU kernel programming and library management (e.g., GDS, CaMa).

Programming and Tools

  • Strong proficiency in C/C++ and Python for GPU/systems-level development.
  • Experience with open-source contribution workflows (GitHub, pull requests, code reviews).
  • Familiarity with Jira and Confluence for project management and documentation.
  • Experience with pSLC firmware validation and performance benchmarking methodologies.

Soft Skills

  • Ability to work independently and deliver against defined monthly milestones.
  • Strong written communication skills for publishing technical reports and documentation.
  • Collaborative mindset; ability to work with third-party teams (AMD, SGLang community).

Preferred Qualifications:

  • Prior experience with Samsung Cognos AI stack or similar enterprise AI platforms.
  • Familiarity with UALink protocol and its GPU interconnect applications.
  • Prior open-source contributions to ROCm, SGLang, or similar GPU frameworks.
  • Experience presenting benchmarking results to semiconductor partners (AMD, NVIDIA, etc.).
