Role: GPU Software Engineer/GPU Architect
Location: San Jose, CA – Onsite
Duration: 6+ Months Contract (potential conversion)
Pay Rate: $90 to $100
Overview: We’re looking for a strong GPU Software Engineer/GPU Architect to join a high‑impact engineering team working on next‑generation AI, GPU, and semiconductor technologies. This role focuses on GPU kernel development, memory architecture, and integration with modern inference systems such as vLLM and SGLang. You’ll work onsite in San Jose, collaborating closely with a team of engineers building high‑performance GPU‑accelerated systems.
Responsibilities:
- Develop and optimize CUDA/ROCm kernels for AI workloads
- Work with HBM, memory hierarchy, thread scheduling, and P2P communication
- Integrate GPU kernels with vLLM, SGLang, and other inference servers
- Build high‑performance components in C++ and Python
- Support AI frameworks such as PyTorch and TensorFlow
- Optimize multi‑GPU scaling, KV‑cache, and attention kernels
- Profile and debug GPU workloads using tools such as Nsight and rocprof
- Collaborate with cross‑functional GPU, AI, and semiconductor teams
Required Skills:
- Strong experience with CUDA, ROCm/HIP, OpenCL, or MPI
- Deep understanding of GPU architecture, HBM, memory models, and thread hierarchies
- Hands‑on experience with AMD/NVIDIA GPU software stacks
- Expert‑level C++ and Python
- Experience with PyTorch or TensorFlow
- Experience with vLLM, SGLang, or similar inference systems
Preferred Skills:
- RDMA, RoCE, InfiniBand, or Infinity Fabric
- Distributed inference/training or HPC experience
- Semiconductor or hardware‑adjacent experience


