We are excited to announce the release of SynapseML v1.1, which brings a host of powerful new features introduced since the initial v1.0 release. SynapseML is an open-source library that aims to streamline the development of massively scalable machine learning pipelines. It unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that is usable across Python, R, Scala, and Java. SynapseML is usable from any Apache Spark platform, with first-class enterprise support on Microsoft Fabric.
Highlights
| Build and operationalize distributed ML with SynapseML in Fabric | Apply Pandas and Spark LLM transformations with one line of code | Automatically derive AI insights for unstructured data in OneLake |
|:--:|:--:|:--:|
Spark 3.5 Support – In this version we transitioned to Spark 3.5 as our main Spark platform.
OpenAI Ecosystem – Comprehensive improvements including global parameter defaults, GPT-4 enablement, custom endpoints/headers, GPU-accelerated embeddings with KNN, and fine-grained control over model parameters (top_p, seed, responseFormat, temperature).
ML Innovation – HuggingFaceCausalLM transformer for distributed language model evaluation, custom embedder support, and a synthetic difference-in-differences causal inference module.
Platform features – Spark Native OneLake support; MSI for Azure Storage; OpenAITranslate transformer.
AI Functions in Data Wrangler on Fabric – AI Functions built into Data Wrangler in Fabric allow you to apply LLM-powered operations to your dataframe without writing a single line of code.
Updated to use new AnalyzeText API in docs (#2126)
Updated find_secret on Fabric documentation (#2132)
Pointed Cognitive APIs documentation to Azure AI (#2119)
Clarified default dataTransferMode is streaming, not bulk (#2377)
Added audiobook paper to README
Raised error with documentation link for find_secret (#2180)
Contributor Spotlight
We are excited to highlight the contributions of the following SynapseML contributors:
| | | |
|:--:|:--:|:--:|
| Rana Singh | Farrukh Masud | Tom Finley |
| Rana is the Senior Engineering Manager for SynapseML and was instrumental in improving the prompt engineering that powers AI Functions. Working alongside Tom, he helped build the feature from the ground up, ensuring high-quality and reliable AI-powered transformations. His attention to detail in refining prompts has made AI Functions more accurate and his leadership has been essential to the initiative's success. | Farrukh is a Principal Engineer on the Code-First AI team and a prolific contributor to this release. He was key in lighting up AI-powered transforms in OneLake, allowing users to apply AI transformations directly through shortcuts, dramatically expanding the reach of cognitive services across Fabric. Farrukh's contributions to Fabric integrations continue to expand the possibilities for AI-powered data workflows. | Tom is a Principal Engineer on the Code-First AI team and was pivotal in the API design for AI Functions. His thoughtful decision-making shaped AI Functions that are both powerful and intuitive. Working closely with Rana, he helped architect the feature from its earliest stages, making key choices that have shaped how users interact with AI Functions. Tom's design sensibility and technical expertise have been foundational to the feature's success. |
| Jessica Wang | Wendong Li | Samhitha Mamindla |
| Jessica is a Software Engineer on the SynapseML team and architected the AI Foundry integration. She has been a consistent and reliable contributor across multiple releases, building robust features and working directly with customers to understand their needs. Her integrations with AI Foundry and Hugging Face have been invaluable in helping SynapseML bridge the gap between closed and open source communities. We're excited for her continued impact on the team. | Wendong is a Software Engineer who recently joined the SynapseML team and has already emerged as a rising star. He single-handedly built multimodal AI Functions, enabling seamless transformations across text, images, and other data types. These are significant contributions for someone so new to the team, demonstrating both technical prowess and the ability to deliver complex features independently. We're eager to see what Wendong builds next. | Samhitha is a Software Engineer II who recently joined the SynapseML team and has quickly become pivotal in evaluating the quality and reliability of AI Functions. Samhitha works through logging infrastructure to ensure we can track, measure, and improve the performance of these features. Her meticulous approach to monitoring has been essential for continuously improving the user experience. |
| Virginia Roman | Elias Yousefi | Shyam Sai |
| Virginia is a Senior Product Manager on the SynapseML team and leads AI Functions in Fabric. She presents and shares AI Functions with customers, gathering critical feedback that shapes the roadmap. Virginia also champions the integration of AI capabilities into Data Wrangler, making AI-powered data transformations accessible to a broader audience of data professionals. Virginia's product leadership is invaluable, and we're lucky to have her as a collaborator. | Elias is a Senior Engineer on the Code-First AI team and is building core service infrastructure for AI Functions. He is driving the next generation of integrations with AI Functions, though many of these exciting developments are still under wraps. We look forward to sharing more of his contributions as they roll out publicly. | Shyam was a key contributor during his time with SynapseML and is now Co-Founder at Frizzle. Shyam brought AI Functions to PySpark, giving Spark users access to the same capabilities already available in Pandas and expanding the feature's reach to a broader developer community. Shyam has since moved on to his next venture at Frizzle, and the SynapseML team will miss his contributions and presence. |
Acknowledgements ❤️
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML:
| | | |
|------|------|------|
| Mark Hamilton @mhamilton723 | Markus Weimer | Avrilia Floratou |
| Eric Dettinger @sandshadow | Virginia Roman | Eren Orbey @orbey |
| Rana Singh @ranadeepsingh | Farrukh Masud @FarrukhMasud | Tom Finley @TomFinley |
| Brendan Walsh @BrendanWalsh | Jessica Wang @JessicaXYWang | Wendong Li @levscaut |
| Elias Yousefi @elyousef | Shyam Sai @sss04 | Shirley Huang |
| Chongyu Wang | Ruoyu Jing | Bo Zhou |
| Paul Wang @pwang347 | Kyle Cutler @kycutler | Dan Vilnoiu |
| Scott Votaw @svotaw | Mark Niehaus @niehaus59 | Aydan Aksoylar @aydan-at-microsoft |
| Sheryl Zhao @sherylZhaoCode | Markus Cozowicz @eisber | Sailesh Baidya @saileshbaidya |
| Keerthi Yanda @KeerthiYandaOS | Kyle Rush @k-rush | Aadharsh Kannan @AKannanMSFT |
| Serena Ruan @serena-ruan | Cruise Li @mslhrotk @lhrotk | Jason Wang @memoryz |
| Haizhou (Dylan) Wang @dylanw-oss | Sarah Shy @sarahshy | Kashyap Patel @ms-kashyap |
| Puneet Pruthi @ppruthi | Ilya Matiach @imatiach-msft | Amir Jafari @amhjf |
| Nellie Gustafsson | Bogdan Crivat | Justyna Lucznik @juluczni |
| Richard Wydrowski @richwyd | Tania Arya @taniaarya | Adithya Mukund @adithyamukund |
| Roman Batoukov @RomanBat | Alexandra Savelieva @alsavelv | Jessica Wolk @msplants |
| Luis França @luisffranca | Paul Koch @paulbkoch | Rich Caruana |
| Martha Laguna @martthalch @marthalc | Jeff Zheng | Sicong Yang |
| Peixian Gong | Ruixin Xu | Chris Hoder |
| Derek Legenzoff | Misha Desai | Beverly Kodhek |
| Louise Han @jr-MS | Raj Rikhy | Brice Chung |
| Marcos Campos | Mike Estee | Kim Manis |
| Mitrabhanu Mohanty | Anand Raman | Sudarshan Raghunathan @drdarshan |
| William T. Freeman | Gregory B. Newby | John Moyer |
| Vidip Acharya | Ashit Gosalia | Miguel Fierro @miguelgfierro |
| Ismaël Mejía @iemejia | Kartavya Neema @kartavyaneema | Daniel Ciborowski @dciborow |
| Mark Tabladillo @marktab | Guilherme Beltramini @gcbeltramini | Akshaya Annavajhala (AK) |
| James Verbus @jverbus | Mopé Akande @msakande | Ikko Eltociear Ashimine @eltociear |
| Alexander Spiridonov @vonodiripsa | Hiroshi Yoshioka @hyoshioka0128 | Frank Solomon @fbsolo-ms1 |
| Leonard Herold @LeonardHd | David Smith @dsmith111 | Denniz Svens @DennizSvens |
| João Moura @operte | Sean Marihugh | ONNX Team |
| Azure Global | Vowpal Wabbit Team | LightGBM Team |
| MSFT Garage Team | MSR Outreach Team | Speech SDK Team |
| MLflow Team | Azure Docs Team | |