We usually think of AI in chat as text in, text out. That model works, but it is limiting. Some tasks go much better when an agent can generate an interface (a chart, a 3D scene, a dashboard, a form, a map, a custom visualization), control it, and show the result to the user instead of squeezing everything into text.
What are MCP Apps?
MCP Apps let agents not only call tools but also produce UI that is rendered for the user. Instead of returning only text, an agent can trigger an app-like experience through a tool call. This way the agent can:
- Create UI experiences dynamically
- Control what the user sees
- Update the UI based on tool outputs
- Render interactive components that are much richer than plain text
This turns the agent from “something that responds” into “something that can orchestrate and present apps”.
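To make the idea concrete, here is a simplified model of a tool result that carries a UI reference alongside a text fallback. The `ToolResult` and `UiResourceRef` types, the `ui://charts/revenue` URI, and the `present` helper are hypothetical simplifications, not the MCP Apps specification; they only illustrate how a single tool output can drive either rich UI or plain text, depending on what the host can render.

```typescript
// Hypothetical, simplified model of a tool result. Field names are
// illustrative assumptions, not the MCP Apps specification.
interface TextContent {
  type: "text";
  text: string;
}

interface UiResourceRef {
  uri: string; // hypothetical UI resource identifier
}

interface ToolResult {
  content: TextContent[]; // plain-text fallback for text-only clients
  ui?: UiResourceRef; // optional UI the host may render
  structuredContent?: Record<string, unknown>; // data the UI would display
}

// The host chooses the presentation: render the UI when it can,
// otherwise fall back to the text content.
function present(result: ToolResult, canRenderUi: boolean): string {
  if (result.ui && canRenderUi) {
    return `render:${result.ui.uri}`;
  }
  return result.content.map((c) => c.text).join("\n");
}

const result: ToolResult = {
  content: [{ type: "text", text: "Q3 revenue: 1.2M" }],
  ui: { uri: "ui://charts/revenue" },
  structuredContent: { quarter: "Q3", revenue: 1200000 },
};

console.log(present(result, true)); // render:ui://charts/revenue
console.log(present(result, false)); // Q3 revenue: 1.2M
```

Keeping a text fallback next to the UI reference means the same tool stays usable in clients that cannot render anything richer than a message.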
Why bring this to WhatsApp?
WhatsApp is a natural place to talk to an agent. But its UI is limited to the message components Meta's platform provides. With MCP Apps, however, the agent is no longer bound to those native chat components. The conversation can still begin in WhatsApp, but when the task requires something richer, the agent can generate a UI and show the user a dedicated webview page opened from inside the conversation.
WhatsApp now becomes the entry point, not the limit.
How the demo works
This demo connects a WhatsApp webhook to a CopilotKit agent with MCP Apps support. At a high level:
- A user sends a message on WhatsApp
- The WhatsApp webhook receives the message
- The backend forwards the request to a CopilotKit agent
- The agent streams events while it reasons and calls tools
- If the agent calls an MCP App, the backend intercepts that MCP App event
- The backend saves a snapshot of the MCP App
- The backend generates a viewer URL tied to that snapshot
- The user receives an interactive call-to-action (CTA) URL message on WhatsApp with a button that opens the generated app
- The user opens the viewer page and sees the UI the agent created
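The backend half of that flow can be sketched in TypeScript. This is a minimal, hypothetical sketch: the `AgentEvent` shape, the `"MCP_APP"` event type, the in-memory `Map` snapshot store, the injected `newId` function, and the `/viewer/<id>` URL layout are all assumptions for illustration, not CopilotKit's API. The webhook and outgoing message shapes follow the WhatsApp Cloud API's text-message webhook and its `cta_url` interactive message, trimmed to the fields used here; the code builds the payload but does not send it.

```typescript
// Minimal shape of an incoming WhatsApp Cloud API webhook (text messages only).
interface WhatsAppWebhook {
  entry: {
    changes: {
      value: {
        messages?: { from: string; type: string; text?: { body: string } }[];
      };
    }[];
  }[];
}

// Hypothetical agent stream events (assumed shape, not CopilotKit's).
type AgentEvent =
  | { type: "TEXT"; delta: string }
  | { type: "MCP_APP"; threadId: string; messageId: string; payload: unknown };

// What the backend stores so the viewer can rebuild the app later.
interface Snapshot {
  threadId: string;
  messageId: string;
  payload: unknown;
}

// Extract the sender and body of the first text message, if there is one.
function extractText(hook: WhatsAppWebhook): { from: string; text: string } | undefined {
  const msg = hook.entry[0]?.changes[0]?.value.messages?.[0];
  if (!msg || msg.type !== "text" || !msg.text) return undefined;
  return { from: msg.from, text: msg.text.body };
}

// Build a WhatsApp interactive call-to-action URL message payload.
function ctaMessage(to: string, url: string) {
  return {
    messaging_product: "whatsapp",
    recipient_type: "individual",
    to,
    type: "interactive",
    interactive: {
      type: "cta_url",
      body: { text: "Your app is ready." },
      action: {
        name: "cta_url",
        parameters: { display_text: "Open app", url },
      },
    },
  };
}

// Consume agent events: accumulate text, and intercept the first MCP App
// event by saving a snapshot and replying with a link to the viewer.
async function handleAgentStream(
  to: string,
  events: AsyncIterable<AgentEvent>,
  store: Map<string, Snapshot>,
  newId: () => string,
  viewerBase: string,
): Promise<object> {
  let text = "";
  for await (const event of events) {
    if (event.type === "TEXT") {
      text += event.delta;
    } else {
      const id = newId();
      store.set(id, { threadId: event.threadId, messageId: event.messageId, payload: event.payload });
      return ctaMessage(to, `${viewerBase}/${encodeURIComponent(id)}`);
    }
  }
  // No MCP App was called: reply with a plain text message instead.
  return { messaging_product: "whatsapp", to, type: "text", text: { body: text } };
}
```

Returning as soon as the first MCP App event arrives keeps the WhatsApp reply to a single CTA message; a fuller backend might also forward the text the agent streamed before it.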
The viewer is a React component that renders the MCP App snapshot. When the page loads, it fetches the stored activity event and uses the useCopilotKit and useAgent hooks to set up the agent context for the specific thread and message that generated the app. The UI itself is rendered by the MCPAppsActivityRenderer component.
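Before those hooks can do anything, the viewer has to work out which snapshot to load. The sketch below covers only that step. It assumes a hypothetical `/viewer/<id>` URL layout and a hypothetical `/api/snapshots/<id>` endpoint (both invented for illustration), and it takes the fetch function as a parameter so it runs without a network. It stops at returning the `threadId` and `messageId`; wiring them into `useCopilotKit`, `useAgent`, and `MCPAppsActivityRenderer` depends on CopilotKit's actual API and is not shown.

```typescript
// Hypothetical snapshot shape, matching what the backend is assumed to store.
interface Snapshot {
  threadId: string;
  messageId: string;
  payload: unknown;
}

// Pull the snapshot id out of a viewer URL such as
// https://example.com/viewer/<id> (assumed layout), ignoring query/fragment.
function snapshotIdFromUrl(href: string): string | undefined {
  const match = /^(?:[a-z]+:\/\/[^/]+)?\/viewer\/([^/?#]+)\/?(?:[?#].*)?$/i.exec(href);
  const id = match?.[1];
  return id === undefined ? undefined : decodeURIComponent(id);
}

// Runtime check that fetched JSON really looks like a Snapshot.
function isSnapshot(value: unknown): value is Snapshot {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.threadId === "string" && typeof v.messageId === "string" && "payload" in v;
}

// Resolve the id, fetch the stored activity event, and validate it.
async function loadSnapshot(
  href: string,
  fetchJson: (path: string) => Promise<unknown>,
): Promise<Snapshot> {
  const id = snapshotIdFromUrl(href);
  if (!id) throw new Error(`Not a viewer URL: ${href}`);
  const data = await fetchJson(`/api/snapshots/${encodeURIComponent(id)}`);
  if (!isSnapshot(data)) throw new Error(`Malformed or missing snapshot: ${id}`);
  return data;
}
```

Validating the payload with a type guard before rendering means a stale or broken link fails with a clear error rather than an empty page.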
This pattern is useful well beyond WhatsApp
Any messaging platform can become the front door to a full-stack agentic application. The user starts in chat, but the experience can expand into generated UI whenever the task calls for it.