Shopify Sidekick vs Your Agency: The Honest 2026 Scorecard

There's a quiet experiment running in Shopify Plus operator Slacks right now.
It goes like this: take your $8K–$20K monthly agency retainer. Pause it for 90 days. Run every routine task through Sidekick — workflow builds, theme edits, analytics queries, app evaluation, photo cleanup, even custom app scaffolds. Track what Sidekick eats, what it chokes on, what the math actually says. Then decide whether the retainer comes back, comes back smaller, or comes back as something fundamentally different.
I've watched versions of this experiment play out on Plus stores since Sidekick's Winter '26 Edition landed in January. The honest answer isn't the one the Sidekick-versus-agency discourse wants you to hear. Sidekick is genuinely transformative on about 70% of the work a typical Plus merchant sends to an agency. On the other 30%, it silently breaks in ways that have already cost merchants real money; I'll walk through one of those cases below. The verdict isn't "fire your agency" or "Sidekick is a gimmick." It's a framework for which 70 and which 30.

This article is the scorecard: what Sidekick actually does well, where it breaks, the $40,000 failure pattern I've now heard three separate times, the decision framework Plus operators are landing on, and what this means for the agency side of the equation.
What Sidekick Actually Does in 2026
Before the scorecard, worth establishing what this thing is in April 2026. Sidekick is no longer a chat widget. After the Winter '26 Edition it's a system-wide AI layer that spans the Shopify admin, mobile app, and Shop app, with a deliberately wide capability surface.
The features that matter operationally:
Sidekick Pulse: proactive recommendations drawn from market trends + your own store data. Less "ask a question, get an answer" and more "here's something worth looking at you wouldn't have thought to ask."
Workflow Automations: describe a workflow in plain English, Sidekick builds it in Shopify Flow. "When a first-time customer spends over $200 and the order contains a loyalty-eligible item, tag the order VIP-onboard, trigger a delayed 5-day follow-up email, and notify the CX team in Slack." That prompt produces a working Flow without you touching the Flow canvas.
Custom Analytics Reports: generate ShopifyQL queries + data visualizations from a natural-language request. "Show me AOV by customer cohort for repeat customers in Q1, segmented by acquisition channel" produces a real dashboard.
Theme Edits Generation: describe a design change, Sidekick updates the theme. Color tweaks, copy updates, section reorganizations, padding fixes.
Custom App Generation: Sidekick can now scaffold entire apps specific to your business. This is new in Winter '26 and it's better than people realize.
Block Generation for All Themes: create custom blocks in any Shopify Theme Store theme, not just your primary one.
Enhanced Memory + Skills: Sidekick remembers your preferences and prior conversations, and lets you save reusable prompts ("skills") that you can share across the team.
Multi-Step Task Completion: breaks a complex request into a plan, executes the plan, checks in between steps.
The capability list is long. On first read it feels overwhelming. In practice, 80% of the daily operator value comes from the first four bullets.
Where Sidekick Wins Against Agencies — Fast, Clearly, Repeatedly
The wins are real and they're not subtle.
Turnaround time collapses. A theme copy change that goes through an agency queue takes 48–72 hours best case — request written, triaged, assigned, dev work, QA, review, deploy. With Sidekick it's two minutes in the admin. A simple Flow automation that used to eat a half-day of agency time is now a 15-minute prompt-and-refine session. For Plus merchants who ship constantly, this is the single biggest shift.
Routine analytics stops being a ticket. "How much revenue did we do from the Horizon theme launch last quarter, segmented by landing page?" That query used to go into the agency queue and come back with a report two days later. Now you type it into Sidekick in the admin and have the answer in 40 seconds. For a team that runs 20–50 analytical asks a month, Sidekick is functionally a free in-house analyst.
Photo and content work is 10× faster. Studio-quality background changes, object removal, canvas expansion, banner variants for A/B tests. Plus merchants used to pay agencies retainers that included creative production; Sidekick handles most of this in the browser now.
App discovery and evaluation becomes structured. Sidekick can compare apps on functionality, reviews, integration footprint, and even install one and verify the setup. Plus merchants who used to rely on agency recommendations for app selection now get a second opinion instantly.
Simple Flow automation becomes trivial. Abandoned cart recovery, tag-based segmentation, first-order thank-you emails, review requests timed to delivery — all of these are one-prompt builds now. They were 4–8 hours of agency work each.
Fine-grained version of the pattern: Sidekick wins on work where the problem is well-specified, the solution space is standard, and there's no cross-system state to reason about. The 70% of the agency queue that fits that description is gone.
Where Sidekick Silently Breaks — Six Patterns Plus Operators Keep Running Into

This is where the honesty gets uncomfortable for people selling AI-replaces-everything narratives. Six patterns come up in every Plus merchant post-mortem I've seen.
1. Cross-system orchestration. Sidekick sees Shopify. It doesn't see your ERP, your 3PL's WMS, your finance system, your CX platform, or your data warehouse. The moment a workflow needs to coordinate state across systems — an order edit that needs to sync back to NetSuite, a refund that needs to adjust commissions in the finance tool, a return that needs to trigger a specific workflow in Gorgias — Sidekick's helpful answer is "here's the Shopify part." The other 60% of the work still needs human orchestration.
2. Judgment calls. "Should we run this promo this week?" Sidekick will give you a confident answer. It will not give you the tradeoffs. It won't tell you the promo cannibalizes next week's revenue, or that the discount structure conflicts with a contract with your top wholesale customer, or that the margin math assumes your vendor hasn't raised unit costs. These are the decisions that should be human, and Sidekick treating them as "just tell me and I'll execute" is how expensive mistakes happen.
3. Custom UX logic. Plus merchants have these — B2B variant visibility rules, tiered pricing that depends on login state, custom attribute-driven checkout flows, region-specific payment method gating. Ask Sidekick to modify these and it reverts to the "standard Shopify pattern" answer. The bespoke logic gets quietly discarded. This is an entire class of bug that looks like it worked but broke your edge cases.
4. Data pipeline work. ShopifyQL is powerful inside the Shopify data model. It can't join to your Google Analytics data, your Northbeam attribution, your Klaviyo engagement scores, your subscription churn tables in Stripe. Sidekick writes great ShopifyQL. It cannot answer a question that requires external data — and when asked, it'll answer confidently using only the Shopify-native slice. The answer will be wrong for reasons the operator may not immediately spot.
5. Post-purchase workflows with edge cases. Order edits, exchanges, address changes, partial refunds with complex loyalty-discount logic, multi-line swaps with different tax implications. Sidekick handles the easy versions — the ones with a standard Shopify-native path. The edge cases that drive the majority of support tickets are precisely the ones where it punts or, worse, produces a clean-looking answer that doesn't handle the edge.
6. Strategic positioning. "Should we reposition our Pro plan to emphasize B2B?" Sidekick will give you a crisp, confident answer. It won't know your competitive landscape, your team's capacity to execute, your sales pipeline, your founder's intuition about the market. These are not AI problems. Treating them as AI problems produces confidently wrong strategy documents.
Pattern across all six: Sidekick is brilliant at work inside Shopify's data model where the solution space is standard. It breaks the moment the work needs context Shopify doesn't hold.
The $40,000 Miss — One Specific Failure Pattern

The numbers vary but the story is now common enough that I've heard three versions of it. Here's one.
A Plus apparel brand on a seven-figure run-rate, ~15% of revenue from B2B wholesale. Two months into their Sidekick-replaces-agency experiment, they ask Sidekick to build an abandoned-cart recovery Flow. Sidekick does it in four minutes. The Flow triggers on abandoned checkout, waits six hours, sends a templated email with the cart contents, includes a 10% discount code "to bring them back."
The Flow worked perfectly for retail customers. It worked incorrectly for wholesale customers. Wholesale customers log into the same Shopify storefront but see B2B-specific pricing (roughly 40% below retail) applied through a metafield-driven price list. The abandoned-cart email showed the B2B price and added a 10% discount on top. If the code applies to the already-discounted price, that is 0.60 × 0.90 = 0.54 of retail, or about 46% off; either way, wholesale customers received an email showing a price close to half of retail.
Within 72 hours, several wholesale accounts had forwarded the email to their sales contact demanding the quoted price be honored on the next PO. The brand could either (a) honor a discount they hadn't intended to extend and that wrecked the margin on the subsequent orders or (b) explain to their biggest wholesale partners that the email from the brand's own domain was a bug. They chose (a). The calculated impact across the affected accounts over the quarter: roughly $40K in unplanned discounts.
Sidekick didn't know about the B2B pricing logic. The Flow had no way to check login state or customer tag or B2B membership. The operator who asked Sidekick for the Flow had no reason to think of those edge cases because "build an abandoned-cart email" is a task that, for a retail-only store, is genuinely a five-minute build. But on any multi-segment store — B2B + DTC, subscription + one-time, VIP + first-time — Sidekick's helpful confidence is a liability.
The agency this brand had worked with previously would have caught this. Not because the agency is smarter — because the agency knew the store. That context is the thing Sidekick can't substitute for, and the thing merchants are surprised every time to find they actually need.
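The check that was missing from the generated Flow can be sketched in a few lines. This is a minimal illustration, not Shopify Flow syntax or a Shopify API: the `AbandonedCheckout` fields, the tag names, and the action strings are all hypothetical stand-ins for whatever a given store uses to mark wholesale accounts.

```python
from dataclasses import dataclass, field

# Hypothetical tag names; a real store would use whatever marks its wholesale accounts.
WHOLESALE_TAGS = {"wholesale", "b2b"}


@dataclass
class AbandonedCheckout:
    customer_tags: set[str] = field(default_factory=set)
    has_b2b_company: bool = False          # assumed flag: customer belongs to a B2B company
    uses_custom_price_list: bool = False   # assumed flag: metafield-driven pricing applies


def recovery_action(checkout: AbandonedCheckout) -> str:
    """Pick the recovery step, refusing to stack a discount on wholesale pricing."""
    tags = {tag.lower() for tag in checkout.customer_tags}
    is_wholesale = (
        checkout.has_b2b_company
        or checkout.uses_custom_price_list
        or bool(tags & WHOLESALE_TAGS)
    )
    if is_wholesale:
        # Negotiated pricing: hand off to a person instead of emailing a code.
        return "notify_sales_rep"
    return "send_email_with_discount"
```

The point of the sketch is the order of operations: segment first, incentive second. A retail-only store never hits the first branch, which is why the prompt looks harmless there.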
The Decision Framework — When to Use Each

After watching these experiments, the rule Plus operators keep landing on is:
Sidekick for work where the problem is well-specified and context-free. Agency (or in-house senior) for work where context is load-bearing.
More concretely:
| Use case | Sidekick wins | Agency / senior wins |
|---|---|---|
| Theme copy + layout tweaks | ✓ | |
| Simple Flow automations (abandoned cart, tagging, emails) | ✓ | |
| Analytics queries inside Shopify data | ✓ | |
| App discovery and comparison | ✓ | |
| Photo editing and variant creation | ✓ | |
| Standard integrations (Shopify → Klaviyo, Shopify → Postscript) | ✓ | |
| Custom integrations requiring your ERP / 3PL / finance stack | | ✓ |
| Multi-segment business logic (B2B + DTC + subs) | | ✓ |
| Strategic decisions (pricing, positioning, launch timing) | | ✓ |
| Post-purchase workflows with non-standard rules | | ✓ |
| Data work requiring external joins | | ✓ |
| Regulated markets (EU tax logic, HS codes, data residency) | | ✓ |
| Audit and review of AI-generated outputs | | ✓ |
The 70/30 split is the emerging consensus: route the routine 70% to Sidekick, keep an agency (at a smaller retainer, or as on-call) for the 30% where judgment and cross-system context matter.
The worst outcome I've seen is merchants who try to force 100% — Sidekick for everything — save the retainer for one quarter, and then blow up the savings on one expensive mistake (the $40K Flow bug is mild compared to some of the horror stories). The second-worst outcome is merchants who ignore Sidekick entirely and keep paying agency rates for work an intern-level AI could do faster and cheaper.
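One way to make the routing rule concrete is a small triage function that an operator could run mentally, or literally, on each ticket. The sketch below is an illustration under stated assumptions: the `Task` fields and the two route labels are hypothetical names chosen to mirror the framework above, not part of any Shopify tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_external_systems: bool = False   # ERP, 3PL, finance, CX platform, warehouse
    touches_segment_logic: bool = False    # B2B pricing, subscriptions, VIP rules
    is_strategic: bool = False             # pricing, positioning, launch timing
    has_nonstandard_rules: bool = False    # bespoke post-purchase or checkout logic


def route(task: Task) -> str:
    """Send context-free work to Sidekick and context-heavy work to a human."""
    context_is_load_bearing = any(
        (
            task.needs_external_systems,
            task.touches_segment_logic,
            task.is_strategic,
            task.has_nonstandard_rules,
        )
    )
    return "agency_or_senior" if context_is_load_bearing else "sidekick"


if __name__ == "__main__":
    queue = [
        Task("Update hero banner copy"),
        Task("Sync order edits back to the ERP", needs_external_systems=True),
        Task("Abandoned-cart Flow on a B2B + DTC store", touches_segment_logic=True),
    ]
    for task in queue:
        print(f"{task.name}: {route(task)}")
```

Note that the third task is the $40K case: "abandoned-cart Flow" sounds routine, and only the segment flag moves it to the human column.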
What This Means for Shopify Agencies

The retainer model most Shopify agencies run on was built in 2018. It assumed a Plus merchant had recurring, routine development needs that justified $10K–$25K/month. Sidekick has collapsed the routine half of that work to zero marginal cost.
The agencies that are thriving in 2026 have already pivoted. They're doing three things:
Moving up the work stack. Less "execute the task" and more "design the system." Agency principals are taking on a role closer to fractional head of engineering than outsourced developer. That's more hours on architecture reviews, integration strategy, and AI governance (what prompts to use, what outputs to trust, how to review Sidekick's work before it ships).
Premium positioning on what Sidekick can't touch. Custom ERP integrations, B2B workflow design, data warehouse connections, regulatory compliance, complex post-purchase logic, performance optimization at scale. These are the remaining moats and they command higher rates now that the routine work is gone.
Productizing Sidekick governance. Some agencies now offer a service specifically around Sidekick — training the prompts, establishing guardrails, reviewing AI-generated outputs before they ship. It's a different kind of expertise but the merchants who've been burned once are willing to pay for it.
The agencies that are dying are the ones still pitching "we'll build this for you in 3 weeks for $8K" — because Sidekick does it in a day for free, and the merchant has already figured that out.
The Post-Purchase Gap — Where Sidekick Isn't Coming
One pattern worth highlighting for Plus operators specifically, because it's the area I watch most closely.
Post-purchase workflows — order edits, exchanges, address changes after fulfillment has started, partial refunds with loyalty-discount logic, B2B post-order adjustments, subscription swaps — are the category where the "standard Shopify pattern" Sidekick defaults to just doesn't cover the real merchant cases. The merchant doesn't want a standard pattern. The merchant wants "if the customer has the VIP tag and the order is under 48 hours old and the variant swap doesn't exceed 15% of order value, auto-approve the edit; otherwise route to CX with order context attached." That's a workflow Sidekick will attempt and produce something that looks right but silently drops half the conditions.
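Written out explicitly, the rule in that example has three conditions that must all hold, which is exactly what makes a silently dropped condition dangerous. The following is a minimal sketch, assuming a hypothetical `EditRequest` shape and reading "doesn't exceed 15% of order value" as the price difference caused by the swap; neither the field names nor that interpretation come from Shopify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_ORDER_AGE = timedelta(hours=48)
MAX_SWAP_SHARE = 0.15  # swap may change at most 15% of the order value


@dataclass
class EditRequest:
    customer_tags: set[str]
    order_created_at: datetime
    order_total: float
    swap_value_delta: float  # assumed: price difference introduced by the variant swap


def decide(request: EditRequest, now: datetime) -> str:
    """Auto-approve only when every condition holds; otherwise route to CX."""
    is_vip = "VIP" in request.customer_tags
    is_recent = now - request.order_created_at < MAX_ORDER_AGE
    within_cap = (
        request.order_total > 0
        and abs(request.swap_value_delta) <= MAX_SWAP_SHARE * request.order_total
    )
    if is_vip and is_recent and within_cap:
        return "auto_approve"
    return "route_to_cx"


if __name__ == "__main__":
    now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
    request = EditRequest({"VIP"}, now - timedelta(hours=5), 200.0, 20.0)
    print(decide(request, now))
```

A review of any generated workflow can use the same structure as a checklist: one test case per condition, each failing on its own, so that a missing branch shows up before a customer finds it.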
Post-purchase is also cross-system by nature — the edit has to reconcile with the 3PL (don't ship the old variant), the finance system (adjust the invoice), the CX platform (log what the customer requested), and the loyalty tool (keep the points entitlement intact). Sidekick handles the Shopify-side update. The rest still requires integration work.
This is the category Revize is on the Shopify App Store for — post-purchase edits with the rules and integrations the merchant actually needs, not the standard pattern the AI would default to. Because until Sidekick can see your ERP, your finance stack, and your loyalty rules simultaneously, post-purchase is a place where human-designed systems still win.
Our deeper walkthrough of the post-purchase stack under agentic commerce covers where this is headed as Shopify's MCP servers mature and agents start calling post-purchase APIs directly.
Frequently Asked Questions
Is Shopify Sidekick worth the upgrade for a Plus merchant?
Yes, and it's already included on Plus, so there's no upgrade to pay for. Sidekick is a default feature across all Shopify plans as of Winter '26 Edition. The question is whether to use it, not whether to buy it. Plus merchants see the biggest ROI because the capabilities most useful to them (Shopify Flow workflow generation, custom analytics in ShopifyQL, bulk theme edits) are also the most expensive to replicate through an agency.
Can Shopify Sidekick actually replace a Shopify agency?
Partially. For the routine 70% of agency work — theme tweaks, simple Flows, analytics queries, app evaluation, photo editing — Sidekick is faster, cheaper, and often better than agency output. For the strategic 30% — custom integrations, multi-segment business logic, post-purchase workflows, data pipeline work, strategic positioning — agencies still win because they bring context Sidekick doesn't have.
What's the typical cost savings from adding Sidekick to a Plus store?
Among Plus merchants who've rebalanced their agency retainer after Sidekick, the typical pattern is cutting $5K–$15K/month from the retainer while keeping $3K–$8K/month for strategic work and on-call integration help. Total savings usually run 40–70% on the agency line item, with the caveat that one expensive Sidekick mistake can wipe out a full quarter's savings.
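The arithmetic behind that caveat is easy to run for your own numbers. The sketch below uses illustrative figures picked from inside the ranges quoted above (a $20K retainer reduced to $8K, and the $40K mistake from the earlier example); the function names are hypothetical.

```python
def savings_share(retainer_before: float, retainer_after: float) -> float:
    """Fraction of the original agency line item that is saved each month."""
    if retainer_before <= 0:
        raise ValueError("retainer_before must be positive")
    return (retainer_before - retainer_after) / retainer_before


def months_of_savings_lost(
    mistake_cost: float, retainer_before: float, retainer_after: float
) -> float:
    """How many months of retainer savings a single mistake consumes."""
    monthly_savings = retainer_before - retainer_after
    if monthly_savings <= 0:
        raise ValueError("retainer_after must be lower than retainer_before")
    return mistake_cost / monthly_savings


if __name__ == "__main__":
    before, after = 20_000, 8_000   # illustrative monthly retainers
    mistake = 40_000                # the unplanned-discount example
    print(f"Savings share: {savings_share(before, after):.0%}")
    print(f"Months of savings lost: {months_of_savings_lost(mistake, before, after):.1f}")
```

With these inputs the retainer cut saves 60% of the line item, and one $40K mistake consumes a little over three months of that saving, which is where the "full quarter" framing comes from.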
Does Sidekick work with Shopify Flow?
Yes — directly. You describe a workflow in plain English and Sidekick builds it in Shopify Flow. You can review the generated Flow before activating. It also has access to the new Flow editor improvements from Winter '26 (preview workflow results before activation, cancel running workflows), so Sidekick-generated Flows can be tested safely.
What are the biggest mistakes people make with Shopify Sidekick?
Three patterns. First, treating it as context-aware when it isn't — Sidekick doesn't know about your B2B pricing logic, your loyalty rules, or your integrations with non-Shopify systems. Second, trusting its confident tone on strategic decisions where tradeoffs matter. Third, not having a review process for AI-generated workflows and code before they ship — the $40K abandoned-cart bug is the canonical example.
How does Sidekick Pulse differ from regular Sidekick?
Sidekick Pulse is the proactive layer — it surfaces recommendations you didn't ask for, drawn from market trend data and your store's own performance signals. Regular Sidekick answers questions you pose. Pulse raises flags on trends (seasonal shifts, category-specific growth patterns, competitor activity that might affect your category) before you'd think to ask.
Can Sidekick build custom apps for my Shopify store?
Yes — Custom App Generation launched in Winter '26. Sidekick can scaffold full Shopify apps for specific use cases. For simple apps (internal tools, admin dashboards, one-off integrations) the output is usable with minor cleanup. For complex apps requiring sophisticated data models or multi-system integrations, Sidekick's output is a starting point that still needs developer review — not a finished product.
Does Shopify Sidekick handle multi-language or multi-currency scenarios?
Partially. Sidekick's theme edits and workflow generation work across Shopify Markets configurations, but localization quality varies by language. Japanese and Korean translations produced by Sidekick still need human review for honorifics and formality levels. European languages are more reliable. Currency and tax logic is surfaced correctly in analytics queries but Sidekick defers to your existing Markets configuration rather than proposing changes.
Is there a way to audit what Sidekick has done on my store?
Yes. Shopify logs Sidekick's actions in the admin activity log — every Flow created, theme edit, app install, and analytics query is recorded with the Sidekick-authored attribution. Plus stores can require approval for Sidekick-initiated changes, which is the recommended governance pattern for any store running Sidekick without an agency review layer.
How is Sidekick different from the old "Shopify Magic" features?
Shopify Magic (launched 2023) was mostly content generation — product descriptions, email subject lines, FAQ answers. Sidekick is a full operational agent that can execute multi-step tasks, build workflows, modify themes, generate apps, and coordinate across the admin. Magic answered prompts; Sidekick does jobs.
Can I share Sidekick skills with my team?
Yes. Skills (reusable prompts) can be saved by individual team members and shared across the team or with the broader Shopify community. This matters for Plus stores because "how do we want to phrase customer-service escalation emails" or "what's our standard abandoned-cart flow pattern" becomes a team asset rather than a knowledge silo.
What should I NOT use Sidekick for?
Four categories. First, anything requiring data or context outside Shopify (ERP, finance, CX platform integrations). Second, strategic decisions where tradeoffs and judgment matter. Third, workflows that touch B2B-specific logic, custom variant rules, or non-standard pricing. Fourth, post-purchase edge cases where a wrong answer causes customer-facing errors — these need a human-reviewed system, not a generated one.
Will agencies still exist in 3 years?
Yes, but different. The ones that survive are pivoting to architecture review, custom integrations, strategic consulting, and AI governance (training prompts, reviewing Sidekick outputs before they ship). The ones that don't survive are the "we'll build this for you in 3 weeks" shops — because Sidekick does that in a day for free, and every Plus merchant has figured that out.
Related Articles
Shopify AI Toolkit Guide 2026: Agents, MCP, and UCP Explained — the layer below Sidekick that's reshaping how agents interact with stores.
Shopify Checkout Extensibility 2026: Deadline, Migration, and What's Broken — another case where Plus merchants outsourced to agencies, and what's still breaking 6 months after the deadline.
Advanced Shopify Flow Workflows for 2025 — pre-Sidekick Flow patterns that are still the baseline for what "good" looks like.
How to Sell on ChatGPT with Shopify Agentic Storefronts — another Winter '26 launch reshaping what the merchant/agency boundary looks like.
There's a quiet experiment running in Shopify Plus operator Slacks right now.
It goes like this: take your $8K–$20K monthly agency retainer. Pause it for 90 days. Run every routine task through Sidekick — workflow builds, theme edits, analytics queries, app evaluation, photo cleanup, even custom app scaffolds. Track what Sidekick eats, what it chokes on, what the math actually says. Then decide whether the retainer comes back, comes back smaller, or comes back as something fundamentally different.
I've watched versions of this experiment play out on Plus stores since Sidekick's Winter '26 Edition landed in January. The honest answer isn't the one the shopify sidekick vs agency discourse wants you to hear. Sidekick is genuinely transformative on about 70% of the work a typical Plus merchant sends to an agency. On the other 30%, it silently breaks in ways that have already cost merchants real money — one of whom I'll walk through below. The verdict isn't "fire your agency" or "Sidekick is a gimmick." It's a framework for which 70 and which 30.

This article is the scorecard: what Sidekick actually does well, where it breaks, the $40,000 failure pattern I've now heard three separate times, the decision framework Plus operators are landing on, and what this means for the agency side of the equation.
What Sidekick Actually Does in 2026
Before the scorecard, worth establishing what this thing is in April 2026. Sidekick is no longer a chat widget. After the Winter '26 Edition it's a system-wide AI layer that spans the Shopify admin, mobile app, and Shop app, with a deliberately wide capability surface.
The features that matter operationally:
Sidekick Pulse — proactive recommendations drawn from market trends + your own store data. Less "ask a question, get an answer" and more "here's something worth looking at you wouldn't have thought to ask."
Workflow Automations — describe a workflow in plain English, Sidekick builds it in Shopify Flow. "When a first-time customer spends over $200 and the order contains a loyalty-eligible item, tag the order
VIP-onboard, trigger a delayed 5-day follow-up email, and notify the CX team in Slack." That prompt produces a working Flow without you touching the Flow canvas.Custom Analytics Reports — generate ShopifyQL queries + data visualizations from a natural-language request. "Show me AOV by customer cohort for repeat customers in Q1, segmented by acquisition channel" produces a real dashboard.
Theme Edits Generation — describe a design change, Sidekick updates the theme. Color tweaks, copy updates, section reorganizations, padding fixes.
Custom App Generation — Sidekick can now scaffold entire apps specific to your business. This is new in Winter '26 and it's better than people realize.
Block Generation for All Themes — create custom blocks in any Shopify Theme Store theme, not just your primary one.
Enhanced Memory + Skills — Sidekick remembers your preferences, prior conversations, and lets you save reusable prompts ("skills") that you can share across the team.
Multi-Step Task Completion — breaks a complex request into a plan, executes the plan, checks in between steps.
The capability list is long. On first read it feels overwhelming. In practice, 80% of the daily operator value comes from the first four bullets.
Where Sidekick Wins Against Agencies — Fast, Clearly, Repeatedly
The wins are real and they're not subtle.
Turnaround time collapses. A theme copy change that goes through an agency queue takes 48–72 hours best case — request written, triaged, assigned, dev work, QA, review, deploy. With Sidekick it's two minutes in the admin. A simple Flow automation that used to eat a half-day of agency time is now a 15-minute prompt-and-refine session. For Plus merchants who ship constantly, this is the single biggest shift.
Routine analytics stops being a ticket. "How much revenue did we do from the Horizon theme launch last quarter, segmented by landing page?" That query used to go into the agency queue and come back with a report two days later. Now you type it into Sidekick in the admin and have the answer in 40 seconds. For a team that runs 20–50 analytical asks a month, Sidekick is functionally a free in-house analyst.
Photo and content work is 10× faster. Studio-quality background changes, object removal, canvas expansion, banner variants for A/B tests. Plus merchants used to pay agencies retainers that included creative production; Sidekick handles most of this in the browser now.
App discovery and evaluation becomes structured. Sidekick can compare apps on functionality, reviews, integration footprint, and even install one and verify the setup. Plus merchants who used to rely on agency recommendations for app selection now get a second opinion instantly.
Simple Flow automation becomes trivial. Abandoned cart recovery, tag-based segmentation, first-order thank-you emails, review requests timed to delivery — all of these are one-prompt builds now. They were 4–8 hours of agency work each.
Fine-grained version of the pattern: Sidekick wins on work where the problem is well-specified, the solution space is standard, and there's no cross-system state to reason about. The 70% of the agency queue that fits that description is gone.
Where Sidekick Silently Breaks — Six Patterns Plus Operators Keep Running Into

This is where the honesty gets uncomfortable for people selling AI-replaces-everything narratives. Six patterns come up in every Plus merchant post-mortem I've seen.
1. Cross-system orchestration. Sidekick sees Shopify. It doesn't see your ERP, your 3PL's WMS, your finance system, your CX platform, or your data warehouse. The moment a workflow needs to coordinate state across systems — an order edit that needs to sync back to NetSuite, a refund that needs to adjust commissions in the finance tool, a return that needs to trigger a specific workflow in Gorgias — Sidekick's helpful answer is "here's the Shopify part." The other 60% of the work still needs human orchestration.
2. Judgment calls. "Should we run this promo this week?" Sidekick will give you a confident answer. It will not give you the tradeoffs. It won't tell you the promo cannibalizes next week's revenue, or that the discount structure conflicts with a contract with your top wholesale customer, or that the margin math assumes your vendor hasn't raised unit costs. These are the decisions that should be human, and Sidekick treating them as "just tell me and I'll execute" is how expensive mistakes happen.
3. Custom UX logic. Plus merchants have these — B2B variant visibility rules, tiered pricing that depends on login state, custom attribute-driven checkout flows, region-specific payment method gating. Ask Sidekick to modify these and it reverts to the "standard Shopify pattern" answer. The bespoke logic gets quietly discarded. This is an entire class of bug that looks like it worked but broke your edge cases.
4. Data pipeline work. ShopifyQL is powerful inside the Shopify data model. It can't join to your Google Analytics data, your Northbeam attribution, your Klaviyo engagement scores, your subscription churn tables in Stripe. Sidekick writes great ShopifyQL. It cannot answer a question that requires external data — and when asked, it'll answer confidently using only the Shopify-native slice. The answer will be wrong for reasons the operator may not immediately spot.
5. Post-purchase workflows with edge cases. Order edits, exchanges, address changes, partial refunds with complex loyalty-discount logic, multi-line swaps with different tax implications. Sidekick handles the easy versions — the ones with a standard Shopify-native path. The edge cases that drive the majority of support tickets are precisely the ones where it punts or, worse, produces a clean-looking answer that doesn't handle the edge.
6. Strategic positioning. "Should we reposition our Pro plan to emphasize B2B?" Sidekick will give you a crisp, confident answer. It won't know your competitive landscape, your team's capacity to execute, your sales pipeline, your founder's intuition about the market. These are not AI problems. Treating them as AI problems produces confidently wrong strategy documents.
Pattern across all six: Sidekick is brilliant at work inside Shopify's data model where the solution space is standard. It breaks the moment the work needs context Shopify doesn't hold.
The $40,000 Miss — One Specific Failure Pattern

The numbers vary but the story is now common enough that I've heard three versions of it. Here's one.
A Plus apparel brand on a seven-figure run-rate, ~15% of revenue from B2B wholesale. Two months into their Sidekick-replaces-agency experiment, they ask Sidekick to build an abandoned-cart recovery Flow. Sidekick does it in four minutes. The Flow triggers on abandoned checkout, waits six hours, sends a templated email with the cart contents, includes a 10% discount code "to bring them back."
The Flow works perfectly for retail customers. It works incorrectly for wholesale customers. Wholesale customers log into the same Shopify storefront but see B2B-specific pricing (roughly 40% below retail) applied through a metafield-driven price list. The abandoned-cart email shows the B2B price and adds a 10% discount on top. Wholesale customers received the email showing a price that was effectively 50% below retail.
Within 72 hours, several wholesale accounts had forwarded the email to their sales contact demanding the quoted price be honored on the next PO. The brand could either (a) honor a discount they hadn't intended to extend and that wrecked the margin on the subsequent orders or (b) explain to their biggest wholesale partners that the email from the brand's own domain was a bug. They chose (a). The calculated impact across the affected accounts over the quarter: roughly $40K in unplanned discounts.
Sidekick didn't know about the B2B pricing logic. The Flow had no way to check login state or customer tag or B2B membership. The operator who asked Sidekick for the Flow had no reason to think of those edge cases because "build an abandoned-cart email" is a task that, for a retail-only store, is genuinely a five-minute build. But on any multi-segment store — B2B + DTC, subscription + one-time, VIP + first-time — Sidekick's helpful confidence is a liability.
The agency this brand had worked with previously would have caught this. Not because the agency is smarter — because the agency knew the store. That context is the thing Sidekick can't substitute for, and the thing merchants are surprised every time to find they actually need.
The Decision Framework — When to Use Each

After watching these experiments, the rule Plus operators keep landing on is:
Sidekick for work where the problem is well-specified and context-free. Agency (or in-house senior) for work where context is load-bearing.
More concretely:
| Use case | Sidekick wins | Agency / senior wins |
|---|---|---|
| Theme copy + layout tweaks | ✓ | |
| Simple Flow automations (abandoned cart, tagging, emails) | ✓ | |
| Analytics queries inside Shopify data | ✓ | |
| App discovery and comparison | ✓ | |
| Photo editing and variant creation | ✓ | |
| Standard integrations (Shopify → Klaviyo, Shopify → Postscript) | ✓ | |
| Custom integrations requiring your ERP / 3PL / finance stack | | ✓ |
| Multi-segment business logic (B2B + DTC + subs) | | ✓ |
| Strategic decisions (pricing, positioning, launch timing) | | ✓ |
| Post-purchase workflows with non-standard rules | | ✓ |
| Data work requiring external joins | | ✓ |
| Regulated markets (EU tax logic, HS codes, data residency) | | ✓ |
| Audit and review of AI-generated outputs | | ✓ |
The 70/30 split is the emerging consensus: route the routine 70% to Sidekick, keep an agency (at a smaller retainer, or as on-call) for the 30% where judgment and cross-system context matter.
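One way to make the routing rule concrete is a short intake checklist that runs before a task goes to Sidekick. This is only a sketch of the framework above. The field names are hypothetical, and the flags are an assumed reading of what "context is load-bearing" means in practice.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_external_systems: bool = False       # ERP, 3PL, finance, CX platform
    touches_multi_segment_logic: bool = False  # B2B + DTC, subscriptions, VIP tiers
    is_strategic: bool = False                 # pricing, positioning, launch timing
    is_regulated: bool = False                 # tax logic, HS codes, data residency


def route(task: Task) -> str:
    """Return 'sidekick' for context-free work, 'agency' when any context flag is set."""
    context_is_load_bearing = any(
        [
            task.needs_external_systems,
            task.touches_multi_segment_logic,
            task.is_strategic,
            task.is_regulated,
        ]
    )
    return "agency" if context_is_load_bearing else "sidekick"


theme_tweak = Task("Update homepage hero copy")
cart_flow_on_b2b_store = Task("Abandoned-cart Flow", touches_multi_segment_logic=True)

print(route(theme_tweak))
print(route(cart_flow_on_b2b_store))
```

Note that the same task name can route differently depending on the store. An abandoned-cart Flow is routine on a retail-only store and context-heavy on one that also sells wholesale.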
The worst outcome I've seen is merchants who try to force 100% — Sidekick for everything — save the retainer for one quarter, and then blow up the savings on one expensive mistake (the $40K Flow bug is mild compared to some of the horror stories). The second-worst outcome is merchants who ignore Sidekick entirely and keep paying agency rates for work an intern-level AI could do faster and cheaper.
What This Means for Shopify Agencies

The retainer model most Shopify agencies run on was built in 2018. It assumed a Plus merchant had recurring, routine development needs that justified $10K–$25K/month. Sidekick has collapsed the routine share of that work to near-zero marginal cost.
The agencies that are thriving in 2026 have already pivoted. They're doing three things:
Moving up the work stack. Less "execute the task" and more "design the system." Agency principals are taking on a role closer to fractional head of engineering than outsourced developer. That's more hours on architecture reviews, integration strategy, and AI governance (what prompts to use, what outputs to trust, how to review Sidekick's work before it ships).
Premium positioning on what Sidekick can't touch. Custom ERP integrations, B2B workflow design, data warehouse connections, regulatory compliance, complex post-purchase logic, performance optimization at scale. These are the remaining moats and they command higher rates now that the routine work is gone.
Productizing Sidekick governance. Some agencies now offer a service specifically around Sidekick — training the prompts, establishing guardrails, reviewing AI-generated outputs before they ship. It's a different kind of expertise but the merchants who've been burned once are willing to pay for it.
The agencies that are dying are the ones still pitching "we'll build this for you in 3 weeks for $8K" — because Sidekick does it in a day for free, and the merchant has already figured that out.
The Post-Purchase Gap — Where Sidekick Isn't Coming
One pattern worth highlighting for Plus operators specifically, because it's the area I watch most closely.
Post-purchase workflows (order edits, exchanges, address changes after fulfillment has started, partial refunds with loyalty-discount logic, B2B post-order adjustments, subscription swaps) are the category where the "standard Shopify pattern" Sidekick defaults to just doesn't cover the real merchant cases. The merchant doesn't want a standard pattern. The merchant wants "if the customer has the VIP tag and the order is under 48 hours old and the variant swap doesn't exceed 15% of order value, auto-approve the edit; otherwise route to CX with order context attached." Sidekick will attempt that workflow and produce something that looks right but can silently drop half the conditions.
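Written out explicitly, the rule quoted above has three conditions that must all hold, and each one is easy to lose in a generated workflow. The sketch below is illustrative Python, not Shopify Flow configuration. The field names are hypothetical, and it assumes "doesn't exceed 15% of order value" refers to the price difference the swap introduces.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_ORDER_AGE = timedelta(hours=48)
MAX_SWAP_RATIO = 0.15  # swap may change the price by at most 15% of order value


@dataclass
class EditRequest:
    customer_tags: set[str]
    order_created_at: datetime
    order_value: float
    swap_price_delta: float  # assumption: price difference caused by the variant swap


def decide(request: EditRequest, now: datetime) -> str:
    """Auto-approve only when all three conditions hold; otherwise hand off to CX."""
    is_vip = "VIP" in request.customer_tags
    is_recent = now - request.order_created_at < MAX_ORDER_AGE
    within_limit = (
        request.order_value > 0
        and abs(request.swap_price_delta) <= MAX_SWAP_RATIO * request.order_value
    )
    if is_vip and is_recent and within_limit:
        return "auto_approve"
    return "route_to_cx"


now = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
small_swap = EditRequest({"VIP"}, now - timedelta(hours=10), 200.0, 20.0)
large_swap = EditRequest({"VIP"}, now - timedelta(hours=10), 200.0, 40.0)

print(decide(small_swap, now))
print(decide(large_swap, now))
```

A review pass over any generated version of this workflow can simply count conditions. If the tag check, the 48-hour window, and the 15% cap are not all present, the workflow is not the one that was asked for.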
Post-purchase is also cross-system by nature — the edit has to reconcile with the 3PL (don't ship the old variant), the finance system (adjust the invoice), the CX platform (log what the customer requested), and the loyalty tool (keep the points entitlement intact). Sidekick handles the Shopify-side update. The rest still requires integration work.
This is the category Revize, available on the Shopify App Store, is built for: post-purchase edits with the rules and integrations the merchant actually needs, not the standard pattern the AI would default to. Until Sidekick can see your ERP, your finance stack, and your loyalty rules simultaneously, post-purchase is a place where human-designed systems still win.
Our deeper walkthrough of the post-purchase stack under agentic commerce covers where this is headed as Shopify's MCP servers mature and agents start calling post-purchase APIs directly.
Frequently Asked Questions
Is Shopify Sidekick worth the upgrade for a Plus merchant?
Yes, and it's already included on Plus, so there's no upgrade to pay for. Sidekick is a default feature across all Shopify plans as of Winter '26 Edition. The question is whether to use it, not whether to buy it. Plus merchants see the biggest ROI because the capabilities most useful to them (Shopify Flow workflow generation, custom analytics in ShopifyQL, bulk theme edits) are also the most expensive to replicate through an agency.
Can Shopify Sidekick actually replace a Shopify agency?
Partially. For the routine 70% of agency work — theme tweaks, simple Flows, analytics queries, app evaluation, photo editing — Sidekick is faster, cheaper, and often better than agency output. For the strategic 30% — custom integrations, multi-segment business logic, post-purchase workflows, data pipeline work, strategic positioning — agencies still win because they bring context Sidekick doesn't have.
What's the typical cost savings from adding Sidekick to a Plus store?
Among Plus merchants who've rebalanced their agency retainer after Sidekick, the typical pattern is cutting $5K–$15K/month from the retainer while keeping $3K–$8K/month for strategic work and on-call integration help. Total savings usually run 40–70% on the agency line item, with the caveat that one expensive Sidekick mistake can wipe out a quarter of savings.
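The arithmetic behind that caveat is simple enough to run against your own numbers. The sketch below pairs values from the ranges quoted above. The specific pairings are hypothetical, and the $40,000 figure is the abandoned-cart loss described earlier in this article.

```python
def savings_share(monthly_cut: float, monthly_kept: float) -> float:
    """Fraction of the original retainer that is no longer being paid."""
    return monthly_cut / (monthly_cut + monthly_kept)


def months_of_savings_lost(mistake_cost: float, monthly_cut: float) -> float:
    """How many months of retainer savings a single mistake cancels out."""
    return mistake_cost / monthly_cut


MISTAKE_COST = 40_000  # the abandoned-cart loss from the story

# Hypothetical pairings of (monthly cut, monthly retainer kept) from the quoted ranges.
for cut, kept in [(5_000, 8_000), (10_000, 5_000), (15_000, 3_000)]:
    share = savings_share(cut, kept)
    lost = months_of_savings_lost(MISTAKE_COST, cut)
    print(
        f"cut ${cut:,}/mo, keep ${kept:,}/mo: {share:.0%} saved, "
        f"one ${MISTAKE_COST:,} mistake erases {lost:.1f} months of savings"
    )
```

At the high end of the savings range a $40K mistake costs a little under a quarter of savings. At the low end it costs most of a year, which is the stronger argument for keeping a review layer in place.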
Does Sidekick work with Shopify Flow?
Yes — directly. You describe a workflow in plain English and Sidekick builds it in Shopify Flow. You can review the generated Flow before activating. It also has access to the new Flow editor improvements from Winter '26 (preview workflow results before activation, cancel running workflows), so Sidekick-generated Flows can be tested safely.
What are the biggest mistakes people make with Shopify Sidekick?
Three patterns. First, treating it as context-aware when it isn't — Sidekick doesn't know about your B2B pricing logic, your loyalty rules, or your integrations with non-Shopify systems. Second, trusting its confident tone on strategic decisions where tradeoffs matter. Third, not having a review process for AI-generated workflows and code before they ship — the $40K abandoned-cart bug is the canonical example.
How does Sidekick Pulse differ from regular Sidekick?
Sidekick Pulse is the proactive layer — it surfaces recommendations you didn't ask for, drawn from market trend data and your store's own performance signals. Regular Sidekick answers questions you pose. Pulse raises flags on trends (seasonal shifts, category-specific growth patterns, competitor activity that might affect your category) before you'd think to ask.
Can Sidekick build custom apps for my Shopify store?
Yes — Custom App Generation launched in Winter '26. Sidekick can scaffold full Shopify apps for specific use cases. For simple apps (internal tools, admin dashboards, one-off integrations) the output is usable with minor cleanup. For complex apps requiring sophisticated data models or multi-system integrations, Sidekick's output is a starting point that still needs developer review — not a finished product.
Does Shopify Sidekick handle multi-language or multi-currency scenarios?
Partially. Sidekick's theme edits and workflow generation work across Shopify Markets configurations, but localization quality varies by language. Japanese and Korean translations produced by Sidekick still need human review for honorifics and formality levels. European languages are more reliable. Currency and tax logic is surfaced correctly in analytics queries but Sidekick defers to your existing Markets configuration rather than proposing changes.
Is there a way to audit what Sidekick has done on my store?
Yes. Shopify logs Sidekick's actions in the admin activity log — every Flow created, theme edit, app install, and analytics query is recorded with the Sidekick-authored attribution. Plus stores can require approval for Sidekick-initiated changes, which is the recommended governance pattern for any store running Sidekick without an agency review layer.
How is Sidekick different from the old "Shopify Magic" features?
Shopify Magic (launched 2023) was mostly content generation — product descriptions, email subject lines, FAQ answers. Sidekick is a full operational agent that can execute multi-step tasks, build workflows, modify themes, generate apps, and coordinate across the admin. Magic answered prompts; Sidekick does jobs.
Can I share Sidekick skills with my team?
Yes. Skills (reusable prompts) can be saved by individual team members and shared across the team or with the broader Shopify community. This matters for Plus stores because "how do we want to phrase customer-service escalation emails" or "what's our standard abandoned-cart flow pattern" becomes a team asset rather than a knowledge silo.
What should I NOT use Sidekick for?
Four categories. First, anything requiring data or context outside Shopify (ERP, finance, CX platform integrations). Second, strategic decisions where tradeoffs and judgment matter. Third, workflows that touch B2B-specific logic, custom variant rules, or non-standard pricing. Fourth, post-purchase edge cases where a wrong answer causes customer-facing errors — these need a human-reviewed system, not a generated one.
Will agencies still exist in 3 years?
Yes, but different. The ones that survive are pivoting to architecture review, custom integrations, strategic consulting, and AI governance (training prompts, reviewing Sidekick outputs before they ship). The ones that don't survive are the "we'll build this for you in 3 weeks" shops — because Sidekick does that in a day for free, and every Plus merchant has figured that out.
Related Articles
Shopify AI Toolkit Guide 2026: Agents, MCP, and UCP Explained — the layer below Sidekick that's reshaping how agents interact with stores.
Shopify Checkout Extensibility 2026: Deadline, Migration, and What's Broken — another case where Plus merchants outsourced to agencies, and what's still breaking 6 months after the deadline.
Advanced Shopify Flow Workflows for 2025 — pre-Sidekick Flow patterns that are still the baseline for what "good" looks like.
How to Sell on ChatGPT with Shopify Agentic Storefronts — another Winter '26 launch reshaping what the merchant/agency boundary looks like.
Revize your Shopify store, and lead with
customer experience
© Copyright 2024, All Rights Reserved