Mako, Inc.
Summary
Mako is hiring our first Forward Deployed Engineer (FDE)!
As our first FDE, you are the technical lead for all customer relationships. You will act as a hands-on technical consultant and trusted partner, ensuring our customers successfully adopt, integrate, and optimize their most demanding workloads on the Mako platform. This is not a traditional support role; it requires deep, hands-on engineering capability to solve performance challenges at the inference-engine and GPU-kernel level.
This job is based in San Francisco. Remote work will be considered for exceptional candidates.
About Mako
Mako is a venture-backed tech startup building an intelligent GPU optimization and deployment platform with two key components.
- MakoOptimize automatically selects and swaps GPU kernels and tunes inference-engine hyperparameters (vLLM, SGLang, etc.) to optimize performance
- MakoGenerate uses LLMs to write GPU kernels in CUDA, HIP, and Triton
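For a concrete sense of the hyperparameter surface MakoOptimize searches, here is a minimal vLLM setup exposing a few of the engine-level knobs that trade latency against throughput. This is a generic vLLM sketch with illustrative values and model name, not Mako's own API:

```python
from vllm import LLM, SamplingParams

# A handful of the engine-level knobs that materially affect serving
# performance; the values below are illustrative starting points.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    gpu_memory_utilization=0.90,  # fraction of VRAM for weights + KV cache
    max_num_seqs=256,             # cap on concurrently batched sequences
    block_size=16,                # KV-cache page size, in tokens per block
    tensor_parallel_size=1,       # number of GPUs to shard the model across
)

outputs = llm.generate(
    ["Explain KV-cache paging in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

These knobs interact (e.g., raising max_num_seqs improves throughput only until KV-cache memory is exhausted), which is why the search is worth automating.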
Responsibilities
- Become the Performance Expert: Provide deep, hands-on technical guidance to customer engineering teams, solving complex optimization challenges related to model serving performance with our platform.
- Lead Technical Onboarding: Spearhead the end-to-end integration and deployment lifecycle for new customers, ensuring rapid Time-to-Value and maximum utilization of Mako's infrastructure.
- Debug & Optimize: Directly engage with customer codebases to diagnose and resolve inference bottlenecks, from high-level system configurations down to low-level hardware utilization.
- Drive Product Strategy: Act as the primary internal advocate for our users, translating real-world deployment pain points, performance gaps, and feature requests into actionable input for our Product and Engineering teams.
- Create Technical Assets: Develop high-impact documentation, performance benchmarks, best-practice guides, and technical deep-dives to enable our entire customer base.
Qualifications
The ideal candidate possesses a deep, demonstrable understanding of low-level AI infrastructure performance:
- GPU Kernel Optimization: Expert-level understanding of GPU kernel optimization techniques and how they affect large-model inference latency and throughput.
- Inference Engine Fluency: Hands-on experience with and a strong conceptual grasp of core inference-engine optimizations (e.g., KV-cache paging, dynamic batching, quantization); a short sketch of the last of these follows this list.
- Frameworks & Toolchains: Direct, production-level experience with GPU compute stacks (CUDA, ROCm) and high-performance LLM serving frameworks (vLLM, SGLang).
- Proven track record in a customer-facing technical role (e.g., Solutions Architect, Technical Consultant, Sales Engineer) at a deep-tech or AI-focused company.
- Fluency in modern development practices and programming languages relevant to AI (Python, C++).
- Highly organized, with a proactive, customer-first mindset and strong communication skills.
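To make the last of those inference-engine optimizations concrete, here is a minimal sketch of symmetric per-tensor int8 weight quantization in PyTorch. It is a didactic example of the technique, not the scheme any particular engine implements:

```python
import torch

def quantize_int8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Symmetric per-tensor quantization: one scale maps the largest
    # absolute weight onto the int8 range [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
# int8 storage cuts weight memory (and memory bandwidth) by ~4x vs fp32,
# at the cost of a bounded rounding error:
print((w - dequantize(q, scale)).abs().max())
```

Production engines use finer-grained variants (per-channel or per-group scales, activation quantization), but the bandwidth-for-accuracy trade-off is the same.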
Bonus Points
- Prior experience as a founder or early employee at a scaling startup
- Existing network within AI/ML, HPC, or cloud infrastructure teams at Fortune 500 companies
- Familiarity with GPU kernel languages (CUDA, HIP, Triton) or the internals of inference engines (vLLM, SGLang)
Our Benefits
- Competitive salary and equity package
- Comprehensive health insurance coverage for you and your family
- Remote work option for exceptional candidates
- Generous vacation and paid time off policy
- Modern and comfortable work environment with state-of-the-art equipment and facilities
To Apply
Fill out this form