The CDP Is the AI
The article examines the gap between cutting-edge AI in messaging systems at large platforms and what brands can actually purchase and implement. It highlights that most brands struggle with data quality and organization, while frontier systems use sophisticated reinforcement learning and adaptive models to optimize message delivery. The key distinction is between decision support tools (like send-time optimization) and true decision-making AI. The real gains come from sending fewer, more targeted messages, which requires a solid data foundation. Ultimately, the Customer Data Platform (CDP) itself is the AI.
Contents
The frontier
What brands can actually buy: support versus decisioning
Most brands aren't using any of it
Why the gap exists and is widening
Consequences
Cross-industry parallels
Open questions
What this means for the lifecycle role, and where it ends up
I was at a Bloomreach breakfast recently, a room full of fintech marketers from brands large and small, all there to talk about "AI". We didn't, really. Or rather, we talked about it the way you talk about a holiday you can't afford yet. The recurring theme, from everyone in the room regardless of company size, was getting their data into a usable shape in the first place. Nobody was stuck on which model to use or which clever feature to switch on. They were stuck on the fact that their data is a mess, spread across half a dozen systems that don't talk to each other, and until that's fixed none of the clever stuff means anything.
That's the whole piece, really, and I could stop there. But the gap that room kept bumping into is bigger and more structural than "we need to sort our data out", and it's widening.
At one end of the market there's a small number of consumer platforms running production machine learning for messaging that is several capability generations ahead of anything a brand can buy. At the other end there's the long tail of mid-market and SMB brands whose access to ML in their lifecycle stack runs from "a few predictive features in our ESP" to "nothing we've actually turned on". In between, a newer category of product is trying to sell brands something much closer to the frontier. The catch, and it's the catch the breakfast table kept circling, is that all of it depends on having your data in order, and most brands don't.
The thing the trade press calls "channel maturity" is more accurately described as sorting by AI sophistication. Some senders can afford to make a channel work. Most can't. And increasingly, the deciding factor isn't budget or headcount. It's whether your data is in a state that lets you do anything at all.
The frontier
[Figure: Pinterest's notification system]
A handful of consumer technology companies publish peer-reviewed work on production notification and messaging systems. These are not white papers or vendor case studies. These are KDD, RecSys, WSDM and CIKM submissions, written by PhD-staffed teams who have to put their methodology in front of academic reviewers and answer for it. The papers aren't exhaustive descriptions of what the companies do internally (they never are), but they're a useful floor on capability. The internal systems are at least as good as what gets published, usually better, and the act of publishing signals that the team has the organisational backing to do this work seriously.
A quick tour of the visible frontier:
Pinterest set a weekly notification budget per user, optimising against long-term site engagement rather than click-through, on the finding that the incremental value of a notification is highest for casual users; the heavy openers have high click-through because they engage with everything, not because the notification moved them.1
Duolingo used a bandit algorithm to pick which reminder template to send each user, and reported a 0.5% lift in daily active users and a 2% lift in new-user retention over a strong baseline.2
Twitter used model-based reinforcement learning to decide whether to send a push at all, modelling the effect over a multi-day horizon. The published trade-off is the interesting part: the settings that cut volume hardest pushed open rate up by as much as 14%, but those same settings reduced daily active users; only the most conservative setting, an open-rate gain of about 8%, improved daily actives at all, and then by 0.2%. Maximising the headline number and serving the real objective pointed in opposite directions.3
LinkedIn framed notification decisioning as offline reinforcement learning, a Double Deep Q-Network with a conservatism penalty, trained on logged data and deployed: sessions up a quarter of a percent, click-through up a couple of points, notification volume down, all at once.4 By 2026 the same lineage had reached email: BanditLP pairs neural Thompson Sampling with a linear program, scaled to billions of variables, that chooses what each member is sent under business constraints.5
Zillow governs email and push volume with a boosted-tree classifier deciding send-or-don't per user, tuned to keep 98% of the clicks while shedding the surplus sends and the unsubscribes they cause. No reinforcement learning required, which is its own lesson: the cheapest method on this list still wins by sending less.6
Meta treated Instagram's notification slots as an auction: the 550-plus internal teams that want to message you bid against each other (with the platform able to subsidise bids) so no single user is flooded by competing product surfaces. In test it sent slightly fewer notifications, lifted click-through and left reach untouched, across 77 million users per arm.7
And the frontier keeps moving. PushGen, deployed at Kuaishou, the Chinese short-video platform, and presented at WSDM in February 2026, generates push copy with an LLM under style controls, then ranks the candidates with a learned reward model that predicts click-through and picks the winner, across hundreds of millions of users a day.8 Pinterest's TransAct points the same way: a transformer reading a user's realtime activity, now feeding ranking across Homefeed, Search and Notifications, with push open rate and email click-through up a point or two each.9
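For anyone who hasn't met a bandit before, the Duolingo item above can be made concrete in a few lines. What follows is a minimal sketch of Beta-Bernoulli Thompson Sampling choosing between reminder templates. It is not Duolingo's published algorithm (the text above only says "a bandit algorithm"), and the template names and response rates are invented for illustration.

```python
import random

def thompson_pick(stats, rng):
    """Sample a plausible response rate per template and pick the highest draw."""
    draws = {
        name: rng.betavariate(successes + 1, failures + 1)  # Beta(1, 1) prior
        for name, (successes, failures) in stats.items()
    }
    return max(draws, key=draws.get)

def simulate(true_rates, rounds, seed=0):
    """Run the bandit against hypothetical, fixed per-template response rates."""
    rng = random.Random(seed)
    stats = {name: [0, 0] for name in true_rates}  # [successes, failures]
    for _ in range(rounds):
        name = thompson_pick(stats, rng)
        engaged = rng.random() < true_rates[name]
        stats[name][0 if engaged else 1] += 1
    return stats

# Assumed response rates, made up for this example only.
TRUE_RATES = {"streak_reminder": 0.10, "friendly_nudge": 0.20, "guilt_trip": 0.05}

stats = simulate(TRUE_RATES, rounds=5000)
sends = {name: s + f for name, (s, f) in stats.items()}
print(sends)
```

The bandit starts out knowing nothing, tries every template, and shifts its sends towards whichever one keeps earning a response. Note what it doesn't do: it never decides to send nothing, and it treats each user's response rate as fixed, which is exactly the limitation the more ambitious systems on this list set out to remove.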
These systems share four characteristics:
Built on first-party event streams collected by the platform owner at full session-level granularity.
Operated by in-house engineering teams that include researchers, ML engineers and platform engineers in numbers most B2C brands could not staff if they tried.
Tuned against the platform's own long-term value functions (sessions, weekly actives, multi-day retention, sitewide engagement) rather than the open or the click.
Premised on the user's response being something that changes as a function of the messages sent, rather than a fixed signal to be exploited.
That last characteristic is the one almost everything sold to brands gets wrong. A static model asks which users are most likely to open or buy and aims at them. An adaptive one asks a different question: how does sending this, now, change what this user does, against the version of them you left alone? The response isn't a fixed trait you discover and exploit. It's an outcome the message itself moves, up for some people and down for others, which is why the same model has to be willing to send nothing at all. Optimise for who looks likely to engage and you keep messaging the people who were going to engage regardless, while quietly annoying the ones a message pushes the wrong way.
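The difference between the two questions is easier to see with numbers. This is a toy sketch, not anyone's production method: four made-up users, each with an assumed probability of acting if messaged and if left alone. In real life you never observe both for the same person; the "left alone" figure has to be estimated from a randomised holdout, which is a data problem before it's a modelling one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    p_if_sent: float  # assumed probability of acting if messaged
    p_if_left: float  # assumed probability of acting if left alone

    @property
    def uplift(self) -> float:
        """How much the message itself moves this user."""
        return self.p_if_sent - self.p_if_left

# Hypothetical users, invented for illustration.
USERS = [
    User("heavy_opener", 0.90, 0.90),  # acts regardless of the message
    User("casual", 0.30, 0.10),        # the message genuinely helps
    User("lapsing", 0.15, 0.05),       # low propensity, but movable
    User("annoyed", 0.20, 0.35),       # the message pushes them away
]

def propensity_policy(users, k):
    """Static model: message the k users most likely to act."""
    return sorted(users, key=lambda u: u.p_if_sent, reverse=True)[:k]

def uplift_policy(users):
    """Adaptive framing: message only users the message moves upward."""
    return [u for u in users if u.uplift > 0]

def incremental_actions(targets):
    """Expected extra actions caused by messaging these users."""
    return sum(u.uplift for u in targets)

by_propensity = propensity_policy(USERS, k=2)
by_uplift = uplift_policy(USERS)
print([u.name for u in by_propensity], round(incremental_actions(by_propensity), 2))
print([u.name for u in by_uplift], round(incremental_actions(by_uplift), 2))
```

With the same two sends, the propensity policy spends one on the heavy opener, who was going to act anyway, while the uplift policy picks the two users the message actually moves and leaves both the heavy opener and the annoyed user alone.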
[Figure: Duolingo's notification architecture]
The lift on the real objective, sessions or daily actives, is almost always under a single percent; the bigger-looking gains sit on proxy metrics like open rate. The frontier's advantage isn't enormous lift. It's that a platform with hundreds of millions of users can bank a reliable 0.3% on sessions, which is a vast sum in absolute terms, while a mid-market brand chasing the same 0.3% on a fraction of the base can't justify the engineering to capture it. The gap is one of scale economics, not magnitude.
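The scale point is just arithmetic, so here it is as arithmetic. Only the 0.3% comes from the text above; the user counts and the sessions-per-user figure are assumptions picked to make the comparison readable, not data from any of the platforms named.

```python
def incremental_sessions(users, sessions_per_user_per_year, relative_lift):
    """Extra sessions per year from a relative lift on the session count."""
    return users * sessions_per_user_per_year * relative_lift

LIFT = 0.003  # the 0.3% from the text

# Hypothetical inputs: a large platform and a mid-market brand.
platform = incremental_sessions(300_000_000, 200, LIFT)
brand = incremental_sessions(500_000, 200, LIFT)

print(f"platform: {platform:,.0f} extra sessions a year")
print(f"brand:    {brand:,.0f} extra sessions a year")
print(f"ratio:    {platform / brand:,.0f}x")
```

Under these assumed inputs the same 0.3% is worth 180 million extra sessions a year to the platform and 300 thousand to the brand. The cost of building and running the system doesn't shrink by anything like the same factor.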
The gains nearly all come from sending less, or more precisely from sending differently. Volume falls while engagement holds or rises. The comparison is relative, mind: these platforms are trimming a per-user volume that already sits well above what most brands send, so "cutting volume" starts from a regime the long tail was never in. And even within that regime, "send less" is too blunt a reading, because these systems aren't cutting uniformly; they're reallocating, more to some users, none to others. Twitter's best setting actually raised the per-user send ceiling even as total sends dropped, because the policy got more selective about who was worth the headroom. The skill is knowing whom not to message, which is harder to learn than whom to message and needs exactly the per-user signal the long tail lacks. Uplift studies make the same point: rank the list by estimated uplift and the cumulative incremental effect peaks well before you've reached the whole list, usually far earlier than intuition suggests, and past that peak widening the send reduces it, because you start reaching people the message turns off rather than wins over. Send to the whole list and you are well into zero-or-negative territory.
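The shape of that uplift curve is worth seeing once. A minimal sketch, with invented numbers: split the list into deciles, give each an assumed incremental effect (extra actions per thousand sends, say), rank from best to worst and add them up as you widen the send.

```python
from itertools import accumulate

# Hypothetical incremental effect per decile of the list, invented for
# illustration. Negative values are deciles the message turns off.
DECILE_UPLIFT = [40, 25, 10, 5, 0, 0, -10, -15, -25, -35]

def cumulative_curve(uplift_by_decile):
    """Cumulative incremental effect as the send widens, best deciles first."""
    ranked = sorted(uplift_by_decile, reverse=True)
    return list(accumulate(ranked))

curve = cumulative_curve(DECILE_UPLIFT)
peak_value = max(curve)
peak_deciles = curve.index(peak_value) + 1  # deciles needed to hit the peak
whole_list = curve[-1]                      # effect of sending to everyone

print(curve)
print(peak_deciles, peak_value, whole_list)
```

In this made-up example the curve tops out after four deciles, sits flat across the indifferent middle, then gives the gains back as the send reaches people it annoys. Sending to everyone ends up slightly worse than sending to no one. The catch, as ever, is the ranking: you can only sort by uplift if you have the per-user signal to estimate it.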
What brands can actually buy: support versus decisioning
The distinction most often collapsed here is between machine learning that supports a marketer's decision and machine learning that makes it.
Most of the ML that lands in a brand's lifecycle stack is decision support. A smaller, newer category is decision making. Conflating the two is how you end up badly overestimating what the average brand is actually running.
Decision support is what the big customer engagement platforms ship:
Send-time and best-channel optimisation that pick the hour and the route for a message you already decided to send.
Predictive scores for churn, conversion, purchase propensity and lifetime value that you can build a segment around.
Predictive RFM and other "AI" segments that group users by recency, frequency, spend or likely next action.
Product recommendations off purchase history.
Subject-line suggestions and generative copy, the specialism that Phrasee (now Jacquard) built its business on well before the LLM wave, generating brand-voice variants and ranking them on past performance.
Frequency capping and send-volume governing.
Most of these do something. Whether they do much is harder to say. The case that send-time optimisation actually beats sending at a sensible fixed hour, for instance, rests almost entirely on vendor case studies, while the independent academic work that exists tends to favour fixed, scheduled times and to warn about notification overload.10 But the common thread, lift or no lift, is that the marketer still designs the journey. The ML scores, suggests, ranks and optimises at specific points inside a structure the human built. It assists the decision. It doesn't make it.
This is the bracket almost every vendor sits in:
Salesforce Marketing Cloud's Einstein, with send-time, content selection and predictive scoring, now wrapped into the Agentforce branding.
Adobe CX Enterprise (rebranded from Experience Cloud at Summit 2026), with Sensei, the AI Assistant, predictive audiences, automated send-time, content selection, generative variants, and the new tier of AI Coworkers and purpose-built agents.
Klaviyo's predictive analytics for CLV and churn, plus send-time, subject-line optimisation and generative copy.
Iterable's predictive goals, send-time optimisation, Brand Affinity and channel optimisation.
HubSpot's Breeze across the workflow.
Emarsys, now folded into SAP Engagement Cloud, with its AI Scores for conversion, churn and spend, continuously-updated predictive segments, Predict product recommendations, and a growing pile of generative helpers like natural-language catalogue search and an AI report builder.
Braze's Sage AI for send-time, channel, predictive churn and content suggestions.
The likes of CleverTap, MoEngage and Insider play here too, though they're more niche, strong in mobile-first and in particular regions rather than across the board.
If you read three of these vendors' marketing pages back to back you can't tell them apart. The vocabulary is identical and the screenshots are interchangeable. That isn't a coincidence. There's a finite set of things a multi-tenant ML feature can do well, and the vendors have all converged on roughly the same set: send-time, predictive segments, generative copy, journey-level orchestration with some scoring at the branch points
[truncated for AI cost control]