
Project Deep Dive

Leon Ye

Principal Software Engineer · Microsoft · SharePoint Pages

10+ years building real-time collaborative & AI-powered authoring platforms

Brief intro. 10+ years at Microsoft, all on SharePoint Pages. Fudan University CS, ACM ICPC World Finals 27th place. Will walk through my career arc then deep dive into two major projects.

My Path

SharePoint Pages — A Decade of Evolution

1
Web Parts & Accessibility
Delivered high-traffic SharePoint web parts end-to-end. Co-authored 9 React a11y TSLint rules, built ScreenReaderAlert for org-wide reuse.
2
Authoring & Performance
Shipped Undo/Redo (JSON diffing), Version History. Designed viewport loader improving p75 LCP/TTI. Built org-wide A/B experimentation infra.
3
Canvas Architecture
Took ownership of the Page Canvas. Re-architected editing surface and section model that became the foundation for coauthoring.
4
Deep Dive
Real-time Coauthoring
Incubated from hackathon to production. Led 6–8 engineers, re-architected Pages data model with Fluid Framework Distributed Data Structures. 330K DAU / 2.6M MAU.
5
Deep Dive
AI-Powered Authoring
Architected AI Properties platform, launched Smart Sections (60K MAU), leading Vibe Authoring with content-oriented architecture.
Walk through the boxes briefly. Spend maybe 30 seconds on boxes 1-3, then transition into the deep dives for 4 and 5.
04

Real-time Collaborative Authoring for SharePoint Pages

From Zero Collaboration
to Real-time at Scale

Incubated from a hackathon into a production real-time coauthoring system built on Fluid Framework’s total-order broadcast protocol. Designed a dual-storage architecture (list item columns + Alternative File Partition) and migrated Pages to SharedTree — a Distributed Data Structure (DDS) designed for tree-like hierarchical data.
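The core convergence guarantee mentioned above — total-order broadcast — can be illustrated with a minimal TypeScript sketch. This is not Fluid Framework's actual implementation (its service, op format, and client classes are far richer); it only shows the essential idea: a central sequencer stamps every operation with a global sequence number and broadcasts in that order, so all replicas apply the same ops in the same order and converge.

```typescript
// Minimal sketch of total-order broadcast (illustrative, not Fluid's code):
// a central sequencer assigns each op a global sequence number and
// broadcasts to all clients, who apply ops strictly in that order.

type Op = { clientId: string; text: string };
type SequencedOp = Op & { seq: number };

class Sequencer {
  private seq = 0;
  private clients: Client[] = [];

  register(client: Client): void {
    this.clients.push(client);
  }

  // Stamp the op and broadcast to everyone, including the sender,
  // which is how the sender's optimistic local edit gets confirmed.
  submit(op: Op): void {
    const sequenced: SequencedOp = { ...op, seq: ++this.seq };
    for (const c of this.clients) c.receive(sequenced);
  }
}

class Client {
  readonly log: string[] = []; // converged replica state
  constructor(readonly id: string, private sequencer: Sequencer) {
    sequencer.register(this);
  }
  edit(text: string): void {
    this.sequencer.submit({ clientId: this.id, text });
  }
  receive(op: SequencedOp): void {
    // Ops arrive already totally ordered; apply in sequence order.
    this.log.push(`${op.seq}:${op.clientId}:${op.text}`);
  }
}

// Two clients editing concurrently still end with identical logs.
const seqSvc = new Sequencer();
const a = new Client("a", seqSvc);
const b = new Client("b", seqSvc);
a.edit("hello");
b.edit("world");
console.log(a.log.join("|") === b.log.join("|")); // true
```

Because ordering is decided centrally, no client-side conflict resolution is needed for ordering itself — which is the property that makes DDSs like SharedTree tractable to build on top.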

Dec 2020 – Sep 2024 · Senior Software Engineer · Team of 6–8 · 330K DAU / 2.6M MAU

Fluid Framework · SharedTree (Distributed Data Structure) · Total Order Broadcast · React · TypeScript
Set the scene. Pages was single-author only with data loss on concurrent edits. Key bet: Fluid Framework's total-order broadcast protocol. Unique challenge: Pages stores data in .aspx list items, unlike Loop/Whiteboard .fluid files — needed AFP storage model.

Real-time Collaborative Authoring for SharePoint Pages

Led a team of 6–8 engineers, collaborating with 3 cross-functional teams to deliver real-time coauthoring for SharePoint Pages — reaching 330K DAU / 2.6M MAU.

Designed 1:1 mapping between Pages data & Fluid DDS
Built standalone architecture for SharePoint Pages compatibility
Unified cross-service observability dashboard
[Architecture diagram] Pages App (Rendering Layer · Presence · Pages Data Structure · Fluid Integration · Ephemeral Notification · Observability) ↔ Fluid Framework Protocol (SharedTree DDS) ↔ Fluid APIs (ops & snapshot storage) ↔ Push Service (sequencing & broadcast)
Pages App — our team
Fluid Framework — protocol & SDK
Fluid APIs — ops & storage
Push Service — sequencing & broadcast
Impact statement first: led 6-8 engineers, 3 partner teams, 330K DAU. Then three concrete things I did: data model mapping, standalone architecture, unified observability. Diagram shows the system architecture. Bottom legend explains what each team owns. Talk through the diagram briefly, then transition to decisions slide.

Real-time Collaborative Authoring for SharePoint Pages

Decisions & Impact

From hackathon POC to leadership buy-in

Built POCs that made the user impact of real-time coauthoring visible and surfaced the technical challenges transparently. Leadership had hesitated to commit to a large investment — the POC turned an abstract pitch into a concrete, testable bet, making the cost, risk, and payoff tangible and securing the buy-in.

Adapting when the default doesn’t fit

The standard Fluid protocol assumed native .fluid files — but SharePoint Pages stores data in .aspx list items, a fundamentally different model. Led the design of a derived architecture that diverged from the default Fluid path, requiring cross-team alignment with Fluid APIs and Push teams to agree on a new storage and sync contract. Drove it to completion as tech lead.

Designing for safe contribution at scale

Designed the architecture with schema constraints and validation so non-senior engineers could safely contribute features without risking data integrity. Invested in reusable, perf-optimized components (e.g. presence indicators) that encapsulated complexity behind clean APIs.
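One way to picture "safe contribution at scale" is a single validated write path. The real system enforces this through SharedTree schema constraints; the sketch below is a hypothetical standalone version (the section shapes and rule names are invented) showing how a discriminated-union model plus a mandatory validator keeps invalid data out of the shared document no matter who writes the feature code.

```typescript
// Hypothetical sketch (not the actual Pages schema): a discriminated-union
// section model plus a runtime validator. Feature code must go through
// insertSection, so invalid data can never reach the shared document.

type Section =
  | { kind: "text"; markdown: string }
  | { kind: "image"; url: string; altText: string }; // a11y: alt text required

function validateSection(s: Section): void {
  switch (s.kind) {
    case "text":
      if (s.markdown.length === 0) throw new Error("text section is empty");
      return;
    case "image":
      if (!s.url.startsWith("https://")) throw new Error("image url must be https");
      if (s.altText.trim() === "") throw new Error("image needs alt text");
      return;
  }
}

const page: Section[] = [];

// The only write path feature teams use; validation is not optional.
function insertSection(s: Section, index: number = page.length): void {
  validateSection(s);
  page.splice(index, 0, s);
}

insertSection({ kind: "text", markdown: "# Hello" });
try {
  insertSection({ kind: "image", url: "https://x", altText: "" });
} catch {
  // rejected before it could corrupt shared state
}
console.log(page.length); // 1
```

The design choice here is that the constraint lives in the write path, not in review comments — a junior engineer's mistake fails fast and locally instead of surfacing as data loss in production.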

Slowing down when it matters

Deliberately slowed rollout velocity to avoid data loss. Spent 3 months building a dedicated data-loss dashboard, fixing data-loss issues, and aligning Push and Fluid APIs metrics into unified cross-team observability spanning 3 teams. Ensured we could detect and resolve problems end-to-end before scaling to broader audiences.

Clear swimlanes and delegation

Cross-team collaboration requires clear problem statements, proposed alternatives, and efficient communication. Delegation isn’t just about reducing load — it’s about enabling the team to scale faster, giving others growth opportunities, and mentoring junior engineers into independent owners of their areas.

Lead with the POC story: hackathon prototype made the feature real and surfaced costs transparently — that's what got leadership to sign off. Standalone architecture: Pages uses .aspx files, not .fluid — needed a custom solution across 3 teams. Safe contribution: schema constraints let junior engineers ship without risking data integrity. Reliability trade-off: chose quality over speed — invested 3 months in data-loss dashboard and cross-team observability before scaling rollout.
05

AI-Powered Authoring

From “We Don’t Know
What to Build” to 60K MAU

The team had already tried command-based AI editing — but LLM capabilities weren’t there yet, our agent toolset was poor, and nobody knew how an LLM could reliably modify pages on behalf of users. We got there in three phases: schematize the platform, scope a shippable bet, then scale to full AI authoring.

Sep 2024 – Present · Principal Software Engineer

LLM Orchestration · Schema Design · Prompt/Context Engineering · 0→1 Product Launch · Agent Architecture
Start here: extreme ambiguity. Team had already tried command-based approach — LLM calls tools to modify page elements. It didn't work well. LLM couldn't reliably use our tools, and the tool surface was incomplete. The question was: how can an LLM efficiently make changes to a SharePoint page? That's the problem I solved in three phases.

AI Authoring

THE PROBLEM
Command-based AI editing: the LLM calls tools one-by-one. The LLM couldn't use our tools reliably, and no AI-first schema existed.

PHASE 1 — First Principles
AI Properties
Schematized every component into structured JSON output the LLM could understand. >50% token reduction · significantly improved model focus.

PHASE 2 — Scope Down
Smart Sections
Scoped down from full-page to section generation — well-scoped risks, real value. ~60K MAU · GA.

PHASE 3 — Scope Up
Vibe Authoring
Full-page AI authoring: the LLM writes directly to our JSON page format. Rolling out.

HOW VIBE AUTHORING WORKS — content-first orchestration
Page Harness — represents full page state:
{ "sections": [ { "type": "hero", "props": { ... } } ] }  // AI edits here directly
LLM Agent — reads the page state and generates JSON diffs directly to the page format (state → edits).

WHY THIS WORKS
01 Explicit state — the LLM sees the whole page, not fragments
02 Schema-validated output — AI Properties ensures correctness
03 Proven pattern from coding agents (Claude, GitHub Copilot)
04 Versionable — every edit is a diffable JSON snapshot
AI Properties — schema foundation
Smart Sections — 0→1 shipped product
Vibe Authoring — full-page AI agent, rolling out
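The content-first loop described on this slide can be sketched in TypeScript. The real harness, page format, and diff vocabulary are internal; the edit operations below (`insert`, `setProp`, `remove`) are invented for illustration. The point it demonstrates is the one on the slide: the agent emits small JSON edits against explicit page state, and every edit yields a new, diffable snapshot.

```typescript
// Illustrative sketch of the content-first loop (the real diff format and
// harness are internal): the LLM reads the full page state and emits small
// JSON edits, which the harness validates and applies immutably.

type Props = Record<string, unknown>;
interface PageState {
  sections: { type: string; props: Props }[];
}

// A hypothetical minimal edit vocabulary an agent could emit.
type Edit =
  | { op: "insert"; index: number; section: { type: string; props: Props } }
  | { op: "setProp"; index: number; key: string; value: unknown }
  | { op: "remove"; index: number };

function applyEdit(page: PageState, edit: Edit): PageState {
  const sections = [...page.sections];
  switch (edit.op) {
    case "insert":
      sections.splice(edit.index, 0, edit.section);
      break;
    case "setProp":
      if (!sections[edit.index]) throw new Error("bad section index");
      sections[edit.index] = {
        ...sections[edit.index],
        props: { ...sections[edit.index].props, [edit.key]: edit.value },
      };
      break;
    case "remove":
      sections.splice(edit.index, 1);
      break;
  }
  return { sections }; // each edit yields a diffable, versionable snapshot
}

let pageState: PageState = { sections: [{ type: "hero", props: { title: "Draft" } }] };
pageState = applyEdit(pageState, { op: "setProp", index: 0, key: "title", value: "Launch" });
pageState = applyEdit(pageState, {
  op: "insert",
  index: 1,
  section: { type: "text", props: { markdown: "Welcome" } },
});
```

Returning a fresh snapshot per edit (rather than mutating in place) is what makes every agent step diffable and reversible — the same property that coding agents exploit with source files.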
Walk through the progression: (1) Problem: team tried command-based editing where the LLM calls tools. It failed — LLM couldn't use tools reliably and the tool surface was incomplete. (2) AI Properties: I schematized every page component into clean JSON so the LLM could understand what a SharePoint page is and what it can do. >50% token reduction, significantly improved model focus. (3) Smart Sections: scoped down from full page to section generation. Well-calculated risks, shipped real value, 60K MAU. (4) Vibe Authoring: now fully powered to do full-page AI authoring. Content-first means the LLM writes directly to our JSON page format — like code generation. The harness represents the full page state, the LLM reads it and generates diffs. No fragile tool chain.

AI Authoring

Decisions & Impact

Designed the AI-first abstraction layer

SharePoint Pages had no LLM-readable contract — components were opaque blobs. Proposed and led the design of an AI-first schema that distilled complex page data into structured JSON the model could reason about. Reduced token usage by >50% and significantly improved model focus — becoming the foundation for every AI feature that followed.
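The distillation idea above — from opaque component blob to LLM-readable contract — can be sketched as a projection. All field names below are invented for illustration; the actual AI Properties schema is internal. What it shows is where the token reduction comes from: non-semantic payload (render caches, telemetry) is dropped, and only the fields the model must reason about survive.

```typescript
// Hypothetical sketch of the "AI-first schema" idea: project a verbose
// internal component blob onto a compact, structured view for the model.
// Field names are invented for illustration.

interface InternalWebPart {
  id: string;
  manifestVersion: string;
  renderCache: string;      // non-semantic: large serialized HTML
  telemetryContext: object; // non-semantic
  dataVersion: string;
  properties: { title?: string; layout?: string; items?: string[] };
}

// The LLM-facing view: only what the model needs to reason about.
interface AiProperty {
  type: string;
  title: string;
  layout: string;
  items: string[];
}

function toAiProperty(part: InternalWebPart, type: string): AiProperty {
  return {
    type,
    title: part.properties.title ?? "",
    layout: part.properties.layout ?? "default",
    items: part.properties.items ?? [],
  };
}

const raw: InternalWebPart = {
  id: "wp-1",
  manifestVersion: "2",
  renderCache: "<div>…serialized markup…</div>",
  telemetryContext: {},
  dataVersion: "1.0",
  properties: { title: "Team Links", items: ["Docs", "Roadmap"] },
};

const distilled = toAiProperty(raw, "quickLinks");
// Serialized size drops once cache/telemetry fields are gone — the
// mechanism behind the token reduction claimed on the slide.
const saved = JSON.stringify(raw).length - JSON.stringify(distilled).length;
console.log(saved > 0); // true
```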

Scoped down to ship through ambiguity

Full-page AI editing was undefined — unclear scope, unproven value. Defined a scoped-down plan targeting section generation: bounded risk, measurable customer value, and fast iteration cycles. Shipped Smart Sections to GA (~60K MAU), validated assumptions with real users, and built the infrastructure that scaled to full-page later.

Turned nondeterministic output into measurable quality

LLM output is inherently unpredictable. Built a multi-channel evaluation framework — assertion-based checks, LLM-as-judge scoring, and visual critic analysis — to evaluate content quality comprehensively. Gave the team a repeatable way to measure what was previously unmeasurable.
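The assertion-based channel of such a framework can be sketched as a table of deterministic checks over generated page content. The check names and page shape below are illustrative, not the team's actual harness; in the real multi-channel setup, LLM-as-judge scores and visual critic analysis would be aggregated alongside these results.

```typescript
// Sketch of the assertion-based channel of a multi-channel eval harness.
// Check names and the generated-page shape are illustrative; LLM-as-judge
// and visual critic channels would plug in alongside.

interface GeneratedSection {
  type: string;
  props: Record<string, unknown>;
}

type Check = { name: string; pass: (s: GeneratedSection[]) => boolean };

const checks: Check[] = [
  { name: "non-empty", pass: (s) => s.length > 0 },
  { name: "has-hero", pass: (s) => s.some((x) => x.type === "hero") },
  {
    name: "images-have-alt",
    pass: (s) =>
      s.filter((x) => x.type === "image")
       .every((x) => typeof x.props.altText === "string" && x.props.altText !== ""),
  },
];

// Run all deterministic checks; a real harness would merge this with
// judge scores into a per-sample report tracked over time.
function evaluate(output: GeneratedSection[]): { score: number; failures: string[] } {
  const failures = checks.filter((c) => !c.pass(output)).map((c) => c.name);
  return { score: (checks.length - failures.length) / checks.length, failures };
}

const sample: GeneratedSection[] = [
  { type: "hero", props: { title: "Launch" } },
  { type: "image", props: { url: "https://x", altText: "team photo" } },
];
console.log(evaluate(sample).score === 1); // true
```

Deterministic checks give a cheap, repeatable floor for regression detection; the probabilistic channels then cover the qualities that assertions cannot express.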

Identified foundational needs, executed fast

Recognized AI Properties as the critical enabler before any feature could ship. Built and rapidly iterated a POC and design doc that proved the value proposition — leading to adoption by multiple partner feature teams across the organization.

Navigating constant trade-offs

Every decision involved balancing latency, content quality, and system complexity. Developed the judgment to navigate these trade-offs quickly — knowing when to accept higher latency for better output, when to simplify the architecture, and when good enough is the right call.

Engineering excellence with AI in the loop

Embraced AI tooling for velocity while maintaining critical thinking and owning final judgment. Helped the team understand hallucination boundaries, build intuition for when to trust and when to verify, and keep engineering rigor as the bar — not speed alone.

Six cards showing staff-level judgment in AI. AI-first schema: proposed the abstraction that made everything else possible. Scope down: navigated product ambiguity by shipping a bounded bet. Eval: turned nondeterministic LLM output into something measurable. Fast execution: identified foundational needs early, proved value quickly. Trade-offs: constant navigation between latency, quality, complexity. Engineering excellence: maintained rigor while embracing AI, helped team learn hallucination boundaries.

Thank You

Questions &
Discussion

Leon Ye · Principal Software Engineer · Microsoft

Open for questions. Backup details: Fluid internals, data loss incidents, Sev2 response process, perf experimentation infra, AI Properties schema design, Vibe Authoring architecture, Playwright migration, a11y work.