
Global Azure did AI and ML in the Cloud ten years before it was cool

A look back at a groundbreaking research project powered by "Windows" Azure in 2014.

The Global Azure community of communities is a network of passionate community leaders, developers, architects, IT pros, and other enthusiasts who share knowledge of and experience with Microsoft Azure. For over a decade, the #GlobalAzure event has been hosting webinars, workshops, and hackathons to showcase and geek out about the latest innovations and best practices in cloud computing.

Did you know that the Global Azure community was also doing cutting-edge AI and ML research in the cloud way before it was cool?

Here is a look at an amazing global computation lab in which community members took part, using early forms of data analysis, machine learning, and high-performance computing. We were not first, and we were not the only ones working on similar setups, but we were very much among the pioneers of using cloud computing resources for scientific research, and we worked together with scientists to do something for good and for the glory of tech!

Clay Hagler, today an Account Technical Strategist at Microsoft, was part of the collaboration and helped Global Azure connect to scientists and to the Cloud:

I like to think this project was the distant ancestor to some of the developer centric, cloud powered, scientific discoveries we are seeing today.

There are so many areas of discovery that need developers willing to direct their creative energy toward the problems that are facing us as humans and the planet in general.

It was so rewarding to see this project come together for such a valuable goal back then and I am thrilled to see this community continuing to expand. I hope to see some of you in and around a science project someday.

Read further down how over 1,500 people around the globe came together and participated in this massive, multi-compute, gamified scientific compute lab for good using what was then called "Windows Azure"! First, let us see what ten years of optimizing this approach have brought:

Satya Nadella highlights AI with high-performance computing for accelerating scientific discovery.

AI and ML services in the Cloud are today put to good use to reach breakthroughs fast, such as this one noted by Satya Nadella:

"We’re bringing together next-generation AI with high-performance computing to accelerate scientific discovery, collaborating with organizations like Pacific Northwest National Laboratory to find new materials for energy storage solutions in weeks, not years."

LinkedIn Post by Satya Nadella about AI and high performance computing at PNNL

This Microsoft Whitepaper has more: "Discoveries in weeks, not years: How AI and high-performance computing are speeding up scientific discovery" as summarized by Copilot:

The article discusses how advanced AI and high-performance computing (HPC) are accelerating scientific discovery, particularly in the fields of chemistry and materials science. Researchers at Microsoft and Pacific Northwest National Laboratory (PNNL) collaborated to use AI and HPC tools to identify around 500,000 stable materials in just a few days. The collaboration resulted in the discovery of a new battery material that could be used in battery development in just 80 hours. The article highlights the potential of this technology to accelerate scientific discovery across several scientific fields.

  • AI and HPC for scientific discovery: Microsoft and PNNL are collaborating to use advanced AI and high-performance computing (HPC) to accelerate the discovery of new materials for energy solutions.
  • A new battery material in weeks: The Microsoft Quantum team used AI to screen 32 million potential materials and identify 18 promising candidates for battery development in just 80 hours. PNNL scientists synthesized and tested one of the new materials, which uses both lithium and sodium ions.
  • Azure Quantum Elements for chemistry: Microsoft's Azure Quantum Elements is a cloud-based service that offers AI models and tools for chemistry and materials science research. The service can perform fast and accurate simulations of molecular properties and dynamics and can be used for any kind of materials research.
  • A new era of acceleration: The collaboration between Microsoft and PNNL demonstrates how AI and HPC can speed up the scientific process and enable discoveries that would take years with traditional methods. The researchers believe that this approach can be applied to other scientific fields and address urgent challenges in sustainability, pharmaceuticals and more.

The Global Azure 2014 GlyQ-IQ lab

In 2014, participants of the Global Azure Community joined with PNNL in a massive compute effort. The goal was to calculate statistically significant glycan profile differences between healthy and diabetic states, thereby elucidating quantitative glycomics differences between healthy and diabetic blood samples. Effectively, what we accomplished then with the tools of "Windows Azure" (as it was known at the time) was quite like operations performed today using high-level ML and AI services in Azure.

That's right! Together, as a Global Community of tech enthusiasts, WE PIONEERED AI IN THE CLOUD IN 2014.

Clay Hagler again:

“We were taking on a common challenge of sweeping across millions of potential candidate solutions in a deserialized way to discover the most likely candidates to fit a model.

To solve the problem, we leveraged the combined capability and creativity of a US National Laboratory, a leading cloud technology provider, and the creativity and innovation of the Azure developer community. This unique combination of domain specific expertise, technology innovation and creativity are the hallmark of some of the most significant innovations. In this case, together we were able to accelerate the discovery process 550x over what the researcher could do on his own system!”

The Technical Azure Architecture of the Lab

Technically, in Azure terms, we staged the data to be processed in cloud storage, and then deployed many thousands of compute node instances running the research algorithm.

The compute service we employed was the original compute service in Azure, the "hosted service", now known as Cloud Services. This technology made it practical and efficient for us to automate the deployment of the algorithm and to scale out the computation to many nodes (hundreds and thousands). Today, modern AI and ML services perform similar tasks, where the underlying compute is entirely abstracted away from the user and the management of the clusters is fully automated. In 2014, we achieved similar results through manual processes.
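To make the scale-out pattern concrete, here is a minimal sketch in Python of staged work, a shared task queue, and many identical workers. It is an in-process simulation only: threads stand in for compute nodes, and `score_candidate`, `worker`, and `run_lab` are hypothetical names invented for illustration. It is not the code the lab ran, and it does not use any Azure SDK.

```python
import queue
import threading


def score_candidate(candidate: int) -> int:
    """Stand-in for the research algorithm (hypothetical placeholder)."""
    return candidate * candidate % 97


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull tasks until the queue is empty, storing one result per task."""
    while True:
        try:
            candidate = tasks.get_nowait()
        except queue.Empty:
            return  # no work left, so this "node" shuts down
        value = score_candidate(candidate)
        with lock:
            results[candidate] = value


def run_lab(candidates, node_count: int) -> dict:
    """Stage all candidates in a queue, then let node_count workers drain it."""
    tasks: queue.Queue = queue.Queue()
    for candidate in candidates:
        tasks.put(candidate)

    results: dict = {}
    lock = threading.Lock()
    nodes = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(node_count)
    ]
    for node in nodes:
        node.start()
    for node in nodes:
        node.join()
    return results


if __name__ == "__main__":
    output = run_lab(range(1000), node_count=8)
    print(len(output))
```

Because every worker runs the same loop against the same queue, adding more nodes needs no coordination beyond the queue itself. That property is what lets this kind of design grow from a handful of nodes to thousands.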

Here is the then cutting-edge architecture we deployed:

The Global Windows Azure Bootcamp GlyQIQ Lab Architecture in 2014

  • The left of the architecture is the staged data, the computational nodes, the queued-up work tasks, and the results storage.
  • On the right of the picture, you see our central command units that were able to log the progress of the swarm of computational nodes, as well as send commands to them to change their behaviour.
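The command-and-control idea in the second bullet can be sketched as a small, deterministic simulation: nodes report progress to a central log and react to broadcast commands between tasks. Everything here is an assumption made for illustration, including the `Node` class, the `pause` and `resume` commands, and the tick-based `run` loop. The source does not describe the actual command set or protocol.

```python
from collections import deque


class Node:
    """One simulated compute node (hypothetical, not the lab's Worker Role code)."""

    def __init__(self, name, tasks):
        self.name = name
        self.tasks = deque(tasks)
        self.paused = False
        self.done = 0

    def handle(self, command: str) -> None:
        """Change behaviour in response to a command from central control."""
        if command == "pause":
            self.paused = True
        elif command == "resume":
            self.paused = False

    def step(self, progress_log: list) -> None:
        """Process one task unless paused or idle, then report progress."""
        if self.paused or not self.tasks:
            return
        self.tasks.popleft()
        self.done += 1
        progress_log.append((self.name, self.done))


def run(nodes, schedule, ticks):
    """Broadcast any command scheduled for a tick, then let each node work once."""
    log = []
    for tick in range(ticks):
        command = schedule.get(tick)
        if command is not None:
            for node in nodes:
                node.handle(command)
        for node in nodes:
            node.step(log)
    return log


if __name__ == "__main__":
    swarm = [Node("a", range(3)), Node("b", range(3))]
    for entry in run(swarm, {1: "pause", 3: "resume"}, ticks=6):
        print(entry)
```

In this sketch the nodes only look at commands between tasks, so a command never interrupts work in progress. That is one simple way to keep a large swarm steerable without complicating the worker loop.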

Effectively, each participant in Global Azure used the same deployment package to run any number of Worker Role instances in their own Azure account. Some ran one or a few instances in their own free trial Azure subscription. Others used their corporate Azure accounts, which meant their company sponsored our research lab with some of its compute time, and deployed a couple of hundred instances for some hours.

Together, this formed a massive global ~17,000 compute node cluster that went to work on the scaled-out data set of research data!

Below is a map showing the locations of Global Azure events from that same year. Anyone anywhere who was part of Global Azure could share some “compute for good” in the lab.

Global map with a pin for each of hundreds of Global Azure event locations.

The main challenges of the lab were handling the unpredictable scaling of the cloud application, ensuring data transfer across different data centres, and monitoring the progress of the analysis. As far as we were able to tally, 1,583 attendees at 93 locations in 38 countries participated in the 2014 event. Overall, 482,225 glycan targets were searched, and 26 high-resolution LC-MS diabetes datasets were analysed.
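One way to picture how a target list of this size becomes queue-sized work items is simple batching. The sketch below splits the 482,225 targets mentioned above into index ranges. The batch size of 1,000 and the helper `batch_ranges` are assumptions chosen for illustration, since the source does not say how the lab actually partitioned its work.

```python
def batch_ranges(total_items: int, batch_size: int):
    """Yield (start, stop) index ranges that cover total_items in order."""
    for start in range(0, total_items, batch_size):
        yield start, min(start + batch_size, total_items)


TOTAL_TARGETS = 482_225  # figure quoted in the text
BATCH_SIZE = 1_000       # hypothetical, chosen only for this example

batches = list(batch_ranges(TOTAL_TARGETS, BATCH_SIZE))

print(len(batches))   # 483
print(batches[-1])    # (482000, 482225)
```

Each range could then be placed on a queue as one unit of work, so that a node picking up a message knows exactly which slice of the target list to process.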

Scientific outcome

The project produced at least one scientific article, "GlyQ-IQ: Glycomics Quintavariate-Informed Quantification with High Performance Computing and GlycoGrid 4D Visualization", which presents the findings of this project. I asked Copilot to make a summary of the science. Hopefully it got some of it almost right! ;)

The GlyQ-IQ project aimed to develop a novel method for analysing the glycoproteins in human blood plasma. Glycoproteins are proteins that have sugars attached to them, and they play a crucial role in many biological processes, such as immune response, cell signalling, and disease progression. However, analysing the glycoproteins is a complex and challenging task, as they have a high degree of variability and diversity. The GlyQ-IQ project from Pacific Northwest National Laboratory (PNNL) used Azure to create a scalable and robust pipeline for glycoprotein quantification and visualization, using a combination of mass spectrometry, machine learning, and 4D graphics. The project was able to process over 2.5 terabytes of data and generate over 100,000 glycoprotein profiles, which were then visualized using a custom-built tool called GlycoGrid, which allowed the researchers to explore the data in four dimensions: mass, retention time, charge, and abundance. The project also used Azure Machine Learning to train and deploy a predictive model that could identify the glycoproteins associated with type 1 diabetes, a chronic autoimmune disease that affects millions of people worldwide. The GlyQ-IQ project was a finalist in the 2014 Microsoft Research Azure for Research Award, and it demonstrated how the cloud can enable large-scale and innovative research in the field of glycomics.

  • GlyQ-IQ is a software tool for glycan analysis that uses a targeted approach to identify and quantify N-glycans in liquid chromatography-mass spectrometry (LC-MS) data sets.
  • GlyQ-IQ uses prior information about the glycan target's elemental composition, isotopic profile, and family relationships to improve the sensitivity and specificity of the analysis. It also leverages in-source fragmentation information to confirm the glycan assignments and remove false positives.
  • GlyQ-IQ was evaluated on a high-resolution LC-MS data set of N-glycans enzymatically released from human serum glycoproteins. It detected 156 glycan compositions and 640 glycan isomers, with over 99% of the assignments passing manual validation.
  • GlyQ-IQ provides a GlycoGrid 4D visualization software that plots the glycan compositions in a four-dimensional grid and indicates the detection and confirmation status of each composition. It also provides a GlyQ-IQ viewer that allows the user to inspect the raw data and the LC-MS features.
  • The advantage of using a cluster to compute the result over using a single machine is that it significantly reduced the runtime. The GlyQ-IQ software was deployed on a cluster of computational nodes in Windows Azure with a head node and 1,504 compute cores. This computational acceleration reduced the runtime by 99.92% (550× faster) when compared to executing the same job on a single core processor.
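The relationship between the speedup, the core count, and the runtime reduction in the last bullet can be checked with a few lines of Python. The two formulas are the standard definitions (efficiency as speedup divided by cores, and runtime reduction as one minus the inverse of the speedup). The function names are made up for this example, and only the figures 550 and 1,504 come from the summary.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup divided by cores."""
    return speedup / cores


def runtime_reduction_pct(speedup: float) -> float:
    """Percentage of runtime saved when a job runs `speedup` times faster."""
    return (1.0 - 1.0 / speedup) * 100.0


SPEEDUP = 550   # from the summary above
CORES = 1504    # from the summary above

print(round(parallel_efficiency(SPEEDUP, CORES), 3))  # 0.366
print(round(runtime_reduction_pct(SPEEDUP), 2))       # 99.82
```

By these definitions, a 550× speedup on 1,504 cores is roughly 37% of ideal linear scaling, and it corresponds to a runtime about 99.8% shorter. The percentage quoted in the summary is slightly different, so it may rest on a different baseline or rounding.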

Getting involved with Global Azure 2024

As an event organizer, as an attendee, or as a sponsor – you can also be part of Global Azure 2024

Global Azure (https://globalazure.net/) is an event driven by Azure tech enthusiasm, organized “by the community for the community” with the aim of sharing learning and knowledge about Azure. Global Azure fosters connections among like-minded individuals passionate and curious about these technologies.

  • Follow #GlobalAzure on social platforms: If you are interested in joining the Global Azure community, you can follow and engage with the hashtag #globalazure on social media and discover local events and meetups through the community map on the website.
  • Speak at or help at a local event: You can contribute by submitting sessions, offering feedback, or volunteering as an organizer or speaker. Global Azure exemplifies how cloud technology unites people, sparking learning and innovation.
  • Create your own community event: This community-driven initiative encourages tech community leaders to participate and contribute. You can get involved by adding a map pin for your location following the instructions on the Global Azure website, thereby setting up a local community Azure learning event. Organizing such events brings together developers, architects, and enthusiasts to exchange their knowledge and experiences with Azure and cloud computing.

    Go to our #HowTo #GlobalAzure page to learn how to join!
     
  • Sponsor Global Azure: We are also looking for additional sponsors to support the community by providing vouchers and licenses to its members. By becoming a sponsor, you can help support the growth and development of the Global Azure community and enable its members to continue learning and innovating with Azure and the cloud.

    Please use our contact Global Azure form if you are interested in learning more!

Summary

The reality is that without our beloved Azure Platform there would be no modern AI services.

Many of the incredible AI services that are so hyped today run on the Azure Platform. In the very early days of the cloud, pioneers began working out how to bring the field of scientific research together with massive public cloud compute platforms. Modern AI/ML services use Azure storage, Azure networking, Azure compute, and many more services, all set up with full automation today, so that scientists are empowered by these cloud resources without the headache of needing to understand how the underlying AI infrastructure platform of massively scalable compute and automation works.

Communities of tech enthusiasts, such as those that collaborate in Global Azure, represent the cutting edge of innovation: people who push forward and envision futuristic dreams for technological achievement.

Mark Brown, Principal PM Manager at Microsoft, was the first Program Manager at Microsoft to look after the growing community of Azure Most Valuable Professionals (MVPs):

“For over 10 years Global Azure has brought developers together to lift the knowledge and skills of the Azure community.

It has also shown to the world that Azure and its community can also come together to help find and solve real-world issues.”

It's remarkable to reflect on how the research and technology we developed ten years ago have been part of the evolution to the current AI/ML landscape. At the time, we didn't realize it, but Global Azure played a significant role in pioneering many of the concepts that are now fundamental in AI.

We want to direct a special thanks to everyone who has been involved in Global Azure over the years, and who keeps engaging with the community of Azure tech enthusiasts to empower every learner on the planet to do more!

Everyone, please join the biggest global cloud tech community event on the planet!

Cheers,

Magnus Mårtensson – one of the founders of Global Azure – and the whole team of Global Azure admins: Alex Mang, David Rodríguez, Hugo Barona, Jennifer Holland, Luce Carter, Martin Abbott, Olena Borzenko, Rik Hepworth, and Tiago Costa.