Microsoft

AI Workloads Team Lead

Microsoft Rhode Island, United States

Azure is building supercomputers at unprecedented scale to meet the massive computational demands of the world’s leading generative AI workloads. Microsoft’s Eagle cluster, a Graphics Processing Unit (GPU)-accelerated supercomputer, is a noteworthy example, achieving the coveted #3 and #2 ranks in the Top500 and MLPerf benchmarks, respectively. The Azure Artificial Intelligence / High Performance Computing (AI/HPC) team is looking for a dynamic leader to take charge of the AI Workloads Team. The team’s charter is to benchmark, profile, debug, and tune the generative AI applications running in the production infrastructure. Sophisticated tools and techniques are needed to maintain the reliability, runtime performance, and health of the hundreds of nodes in a supercomputer consisting of thousands of GPUs.

The team lead will work closely with customers and vendors to understand the characteristics of their workloads, profile them to find performance bottlenecks, and propose techniques to achieve smooth and optimal performance of AI jobs. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI and, more broadly, in HPC in the cloud.

As an AI Workloads Team Lead, you will lead a team of engineers and researchers with experience in high performance computing, machine learning, deep learning, middleware, and software engineering. The following values drive us:

  • Drive for Results: We’re here to build great products. We take on whatever work is right for the product and strive for the best possible results.
  • Modesty and Adaptability: The right answer is more important than being right. We search for solutions as a team, adapt quickly and value transparent and open feedback.

Your mission will be to help ensure the Azure platform delivers consistent performance, can scale on demand, and is engineered to withstand the unparalleled computing demand from customer workloads.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

  • Lead a team of highly qualified, diverse SMEs (subject matter experts)
  • Engage with Azure customers, internal teams and industry vendors
  • Develop and achieve goals to showcase Azure’s leadership in AI training and inference hardware for a range of real workloads and industry benchmarks
  • Develop a full-stack understanding of AI workloads, spanning CPU/GPU, NVLink, interconnects, programming frameworks (PyTorch, NVIDIA CUDA, AMD HIP), communication libraries (NCCL/RCCL), and deep learning model architectures.
  • Oversee development and execution of cross-functional and time-critical projects.

Other

  • Embody our Culture & Values

Qualifications

Required Qualifications:

  • Master's Degree in Computer Science, or related technical discipline AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • 6+ years of experience in software design and development
  • 3+ years of experience in developing and running AI/HPC applications on clusters
  • 5+ years of leadership experience in managing complex projects, or teams of individual contributors

Other Requirements

  • Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

Preferred Qualifications

  • PhD in Computer Science or related technical field AND 10+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR PhD in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
    • OR equivalent experience.
  • Previous experience with running and troubleshooting machine learning workloads on GPU clusters is a plus
  • Exposure to Cloud Computing, Virtualization and Container Technologies
  • Prior experience in Full-stack development

Software Engineering M5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until September 5, 2024.

#azurecorejobs

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
  • Seniority level: Not Applicable
  • Employment type: Full-time
  • Job function: Information Technology
  • Industries: Software Development
