Staff Deep Learning Training Optimization Engineer – Cruise

Full time @Cruise Career in Vehicles & Autonomous Mobility
  • San Francisco, California, United States
  • Post Date : January 31, 2022
  • Apply Before : May 1, 2022

Job Detail

  • Job ID 41988
  • Career Level Mid-Senior
  • Gender All
  • Qualifications certificate
  • Language Requirement
  • Region North America
  • Other Classifications startup
  • Special Programs y-combinator
  • Remote No

Job Description

We’re Cruise, a self-driving service designed for the cities we love.
We’re building the world’s most advanced, self-driving vehicles to safely connect people to the places, things, and experiences they care about. We believe self-driving vehicles will help save lives, reshape cities, give back time in transit, and restore freedom of movement for many.
Cruisers have the opportunity to grow and develop while learning from leaders at the forefront of their fields. With a culture of internal mobility, there’s an opportunity to thrive in a variety of disciplines. This is a place for dreamers and doers to succeed.
If you are looking to play a part in making a positive impact in the world by advancing the revolutionary work of self-driving cars, join us.

About the role
The Autonomous Vehicle (AV) software stack relies heavily on machine learning techniques to perform a variety of tasks, each with different hardware and compute requirements. Throughout the life cycle of each machine learning model, skilled ML engineers (on both the training and inference sides) work closely together to prepare it for robust, scalable, and compute- and power-efficient inference on a resource-constrained hardware accelerator. Such a close working relationship is key to fast and successful deployment of intelligent systems on the car.
Cruise is looking for deep learning performance experts who can understand the big picture of training performance on GPUs, prioritizing and then solving problems across many dozens of state-of-the-art neural networks. In this position, you will be responsible for understanding, analyzing, profiling, and optimizing deep learning training workloads on state-of-the-art hardware and software platforms. You will be expected to evaluate and improve the performance of every stage of computation, from vertical scaling (data pipeline optimization, faster layer execution/scheduling, mixed precision, model/subgraph parallelism) to horizontal scaling (strong vs. weak scaling, communication collective tuning for latency/bandwidth) to convergence tuning for large batches (LARS, LAMB, etc.).
This is a tech-leadership role. You will be charged with defining and leading a strategic roadmap to scale the velocity and throughput of training large ML models that are deployed to our AVs. In collaboration with your team, you will identify the current gaps and explore the right training optimization strategies to invest in for the AI department. You will work closely with both ML engineers and ML platform engineers to scale these solutions to meet expected demand.
If you’re interested in optimizing machine learning training and inference on different hardware accelerators, and want to test your skills with real-world (and practical) applications in the autonomous vehicle domain, let’s chat!
Day-to-day responsibilities include: 

Serve as a technical leader driving the strategy for optimizing training workloads at Cruise by defining strategic investments in collaboration with partner teams in AI.
Understand the big picture of training performance at Cruise, and define the technical roadmap for optimizing performance across many dozens of state-of-the-art neural networks.
Build performance analysis tools (profilers, hotspot analysis, etc.) to diagnose the bottlenecks in the end-to-end training workflow at Cruise
Define technical strategies for scaling up utilization of training workloads on cloud resources
Lead execution of strategy in partnership with customer/partner teams within the AI department (e.g., perception, prediction, robotics, and infra teams), product managers, and TPMs
Bring and extend SOTA in training efficiency to scale up the velocity of AI at Cruise

You should apply for this role if you have the following qualifications:

Expertise in optimizing training workloads (in PyTorch or TensorFlow) for scaling out and scaling up training performance in the cloud / datacenter
Experience with defining technical strategy, vision and direction and bringing alignment across cross-functional teams.
Experience working as a tech lead (TL), with delivered impact in improving training efficiency
Experience in PyTorch performance tuning (such as enabling async data loading, avoiding unnecessary CPU-GPU sync, disabling redundant gradient calculations, enabling more effective op fusions, eliminating redundant ops) and familiarity with PyTorch general optimization APIs (such as buffer checkpointing)
Good understanding of deep learning framework building blocks, e.g., operator registry, CPU and GPU ops, tensor memory management system (e.g., caching allocator), and of performance analysis, diagnosis, and optimization for GPU workloads in DL framework runtimes.
General C++ experience
MS or higher degree in CS/CE/EE, or equivalent industry experience

Bonus points!

Experience with deep learning optimization libraries such as DeepSpeed, and with MLPerf training optimization
Familiarity with distributed training packages in frameworks (such as torch.distributed, Horovod), libraries (such as NVIDIA’s NCCL), and other scaling technologies (such as Reduction Server) for scaling up performance on multi-GPU systems.
Familiarity with exploiting model parallelism and data parallelism to improve performance in multi-node data centers
Experience with open-source deep learning stacks (TVM, XLA, etc.)
Familiarity/experience using autograd-capable compilers (such as JAX, JuliaGPU)
GPU programming (CUDA) and familiarity with deep learning stack (e.g., cuDNN, cuBLAS)
SIMD programming model (AVX2, NEON)

Why Cruise?

Our benefits are here to support the whole you:

Competitive salary and benefits 
401(k) Cruise matching program 
Medical / dental / vision, AD&D and Life
One Medical membership
Flexible vacation and company paid holidays
Healthy meals and snacks provided for non-remote employees
Paid parental leave
Fertility Benefits 
Dependent Care Flexible Spending Account, subsidized by Cruise
Flexible Spending Account 
Monthly wellness stipend
Pre-tax Commuter Benefit Plan for non-remote employees

We’re Integrated

Through our partnerships with General Motors and Honda, we are the only self-driving company with fully integrated manufacturing at scale.

We’re Funded

GM, Honda, Microsoft, SoftBank, and T. Rowe Price have invested billions in Cruise. Their backing for our technology demonstrates their confidence in our progress, team, and vision and makes us one of the leading autonomous vehicle organizations in the industry. Our deep resources greatly accelerate our operating speed.

We’re Independent

We have our own governance, board of directors, equity, and investors. Our independence allows us to not just work on the edge of technology, but also define it.

We’re Vested

You won’t just own your work here, you’ll have the potential to own equity in Cruise, too. We are competing in a market that is projected to grow exponentially, which gives our company valuation room to grow.

Cruise LLC is an equal opportunity employer. We strive to create a supportive and inclusive workplace where contributions are valued and celebrated, and our employees thrive by being themselves and are inspired to do the best work of their lives. 
We seek applicants of all backgrounds and identities, across race, color, ethnicity, national origin or ancestry, citizenship, religion, sex, sexual orientation, gender identity or expression, veteran status, marital status, pregnancy or parental status, or disability. Applicants will not be discriminated against based on these or other protected categories or social identities. Cruise will consider for employment qualified applicants with arrest and conviction records, in accordance with applicable laws.
Cruise is committed to the full inclusion of all applicants. If reasonable accommodation is needed to participate in the job application or interview process, please let our recruiting team know.
We proactively work to design hiring processes that promote equity and inclusion while mitigating bias. To help us track the effectiveness and inclusivity of our recruiting efforts, please consider answering the following demographic questions. Answering these questions is entirely voluntary. Your answers to these questions will not be shared with the hiring decision makers and will not impact the hiring decision in any way. Instead, Cruise will use this information not only to comply with any government reporting obligations but also to track our progress toward meeting our diversity, equity, inclusion, and belonging objectives.
Vaccine Mandate. 
At Cruise, we’re tasked with leading in the communities we serve and doing our part to help keep our communities and our teams safe. Our #StaySafe culture transcends and informs all we do, and because of this, as of October 31, 2021, Cruise will be mandating COVID-19 vaccinations for all US-based Cruisers who need or want to access any of our US Cruise facilities and engage in any business travel, including attending any in-person Company-sponsored event.
If you are unable to get a vaccine due to a medical condition, disability, or a strongly-held religious belief, Cruise will consider requests for an accommodation.
Note to Recruitment Agencies: Cruise does not accept unsolicited agency resumes. Furthermore, Cruise does not pay placement fees for candidates submitted by any agency other than its approved partners.