Mako, Inc.
Summary
Mako is hiring our first Forward Deployed Engineer (FDE)!
As our first FDE, you are the technical lead for all customer relationships. You will act as a hands-on technical consultant and trusted partner, ensuring our customers successfully adopt, integrate, and optimize their most demanding workloads on the Mako platform. This is not a traditional support role; it requires deep, hands-on engineering capability to solve performance challenges at the inference-engine and GPU-kernel level.
This job is based in San Francisco. Remote work will be considered for exceptional candidates.
About Mako
Mako is a venture-backed tech startup building an intelligent GPU optimization and deployment platform with two key components.
- MakoOptimize automatically selects and swaps GPU kernels and tunes inference-engine hyperparameters (vLLM, SGLang, etc.) to optimize performance
- MakoGenerate uses LLMs to write GPU kernels in CUDA, HIP, and Triton
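For a concrete sense of the hyperparameter surface MakoOptimize searches, here is a minimal vLLM setup exposing a few of the engine-level knobs that trade latency against throughput. This is a generic vLLM sketch with illustrative values and model name, not Mako's own API:

```python
from vllm import LLM, SamplingParams

# A handful of the engine-level knobs that materially affect serving
# performance; the values below are illustrative starting points.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    gpu_memory_utilization=0.90,  # fraction of VRAM for weights + KV cache
    max_num_seqs=256,             # cap on concurrently batched sequences
    block_size=16,                # KV-cache page size, in tokens per block
    tensor_parallel_size=1,       # number of GPUs to shard the model across
)

outputs = llm.generate(
    ["Explain KV-cache paging in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

These knobs interact (e.g., raising max_num_seqs improves throughput only until KV-cache memory is exhausted), which is why the search is worth automating.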
Responsibilities
- Become the Performance Expert: Provide deep, hands-on technical guidance to customer engineering teams, solving complex optimization challenges related to model serving performance with our platform.
- Lead Technical Onboarding: Spearhead the end-to-end integration and deployment lifecycle for new customers, ensuring rapid Time-to-Value and maximum utilization of Mako's infrastructure.
- Debug & Optimize: Directly engage with customer codebases to diagnose and resolve inference bottlenecks, from high-level system configurations down to low-level hardware utilization.
- Drive Product Strategy: Act as the primary internal advocate for our users, translating real-world deployment pain points, performance gaps, and feature requests into actionable input for our Product and Engineering teams.
- Create Technical Assets: Develop high-impact documentation, performance benchmarks, best-practice guides, and technical deep-dives to enable our entire customer base.
Qualifications
The ideal candidate possesses a deep, demonstrable understanding of low-level AI infrastructure performance:
- GPU Kernel Optimization: Expert-level understanding of GPU kernel optimization techniques and how they affect large-model inference latency and throughput.
- Inference Engine Fluency: Hands-on experience with and a strong conceptual grasp of core inference-engine optimizations (e.g., KV-cache paging, dynamic batching, quantization); a short sketch of the last of these follows this list.
- Frameworks & Toolchains: Direct, production-level experience with GPU compute stacks (CUDA, ROCm) and high-performance LLM serving frameworks (vLLM, SGLang).
- Proven track record in a customer-facing technical role (e.g., Solutions Architect, Technical Consultant, Sales Engineer) at a deep-tech or AI-focused company.
- Fluency in modern development practices and programming languages relevant to AI (Python, C++).
- Highly organized, with a proactive, customer-first mindset and strong communication skills.
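To make the last of those inference-engine optimizations concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in PyTorch. It is a didactic example of the technique, not the scheme any particular engine implements:

```python
import torch

def quantize_int8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Symmetric per-tensor quantization: one scale maps the largest
    # absolute weight onto the int8 range [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
# int8 storage cuts weight memory (and memory bandwidth) by ~4x vs fp32,
# at the cost of a bounded rounding error:
print((w - dequantize(q, scale)).abs().max())
```

Production engines use finer-grained variants (per-channel or per-group scales, activation quantization), but the bandwidth-for-accuracy trade-off is the same.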
Bonus Points
- Prior experience as a founder or early employee at a scaling startup
- Existing network within AI/ML, HPC, or cloud infrastructure teams at Fortune 500 companies
- Familiarity with GPU kernel languages (CUDA, HIP, Triton) or the internals of inference engines (vLLM, SGLang)
Our Benefits
- Competitive salary and equity package
- Comprehensive health insurance coverage for you and your family
- Remote work option for exceptional candidates
- Generous vacation and paid time off policy
- Modern and comfortable work environment with state-of-the-art equipment and facilities
To Apply
Fill out this form