Role: GPU Software Engineer/GPU Architect
Location: San Jose, CA – Onsite
Duration: 6+ month contract (with potential for conversion)
Pay Rate: $90 to $100/hr

Overview: We’re looking for a strong GPU Software Engineer/GPU Architect to join a high‑impact engineering team working on next‑generation AI, GPU, and semiconductor technologies. This role focuses on GPU kernel development, memory architecture, and integration with modern inference systems such as vLLM and SGLang. You’ll work onsite in San Jose, collaborating closely with a team of engineers building high‑performance GPU‑accelerated systems.

Responsibilities:

Develop and optimize CUDA/ROCm kernels for AI workloads
Work with HBM, memory hierarchy, thread scheduling, and P2P communication
Integrate GPU kernels with vLLM, SGLang, and other inference servers
Build high‑performance components in C++ and Python
Support AI frameworks such as PyTorch and TensorFlow
Optimize multi‑GPU scaling, KV‑cache management, and attention kernels
Profile and debug GPU workloads using tools such as Nsight and rocprof
Collaborate with cross‑functional GPU, AI, and semiconductor teams

Required Skills:

Strong experience with CUDA, ROCm/HIP, OpenCL, or MPI
Deep understanding of GPU architecture, HBM, memory models, and thread hierarchies
Hands‑on experience with AMD/NVIDIA GPU software stacks
Expert‑level proficiency in C++ and Python
Experience with PyTorch or TensorFlow
Experience with vLLM, SGLang, or similar inference systems

Preferred Skills:

Experience with RDMA, RoCE, InfiniBand, or Infinity Fabric
Distributed inference/training or HPC experience
Semiconductor or hardware‑adjacent experience
