By Lana Cook
Imagine you have a new employee working at the front desk of your cozy hotel in the Alps. A guest, freshly landed, sends a message: “I heard you have rooms with breathtaking views. What do they look like, and how much would it cost to upgrade?”
To seize this upsell opportunity, your new employee must:
- Check which rooms with views are available
- Find images of those rooms, ideally showcasing the scenery from the window
- Select the one that would be the best fit for the guest
- Write an enticing description of the room’s features and the view
- Inform the guest of the upgrade cost
Even if all the information is in the PMS, consolidating everything takes time and can be especially difficult for new staff. And if images or room details aren’t readily accessible, it can slow down even the most experienced team member.
Now, imagine there’s a tool that understands the inquiry, scans hundreds of files in seconds, pulls the relevant images, generates a description, checks availability and pricing, and packages everything up for you.
All you need to do is review, make any final tweaks, and send it off to the guest. Better yet, picture this entire exchange happening between the guest and a chatbot, with no human intervention.
That’s the power of multimodal AI.
What is multimodal AI?
Multimodal AI represents the next generation of AI systems that can process and integrate different data types and formats simultaneously. For example, while a traditional AI-powered chatbot might provide room availability from structured data and a list of features from a library of text descriptions, multimodal AI brings together all the information stored within a hotel’s systems and combines it automatically to generate outputs in multiple formats that are richer and more contextually relevant.
As Nikhil Shah, Cloudbeds’ Head of Data Science, explains, multimodal AI “transforms absolutely any type of content into numerical data points, which reside within what’s known as an embedding space. There, we have vector-based representations of the content, whether it’s room images, maintenance, voice notes, or training materials. This enables hoteliers to create a comprehensive library of knowledge specific to their hotel and locate anything from a single search bar.”
Listen to Nikhil and Eric Ellis, Senior Director of UX Design at Cloudbeds, discuss multimodal AI during Passport 2024.
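To make the idea of an embedding space more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Cloudbeds or any specific product implements search: the three-dimensional vectors are hand-written stand-ins for what an embedding model would produce, and the names `LIBRARY`, `cosine_similarity`, and `search` are hypothetical.

```python
import math

# Hypothetical, hand-written embeddings. A real embedding model would produce
# these vectors (typically with hundreds of dimensions) from images, audio,
# and text, placing similar content close together.
LIBRARY = {
    "photo: alpine-view suite": [0.9, 0.1, 0.0],
    "voice note: leaky tap in room 12": [0.0, 0.9, 0.2],
    "training doc: check-in procedure": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, library, top_k=1):
    """Rank library items by similarity to the query, whatever their original format."""
    ranked = sorted(
        library,
        key=lambda name: cosine_similarity(query_vector, library[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed embedding of the guest query "rooms with a mountain view".
query = [0.8, 0.2, 0.1]
best_match = search(query, LIBRARY)
print(best_match)
```

Because every item lives in the same vector space, one search function can compare a text query against a photo, a voice note, and a document, which is what makes the "single search bar" possible.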
A simple example of multimodal AI in action is asking platforms like ChatGPT to create an image from a textual prompt or build a chart from a data set. However, multimodality also covers integrating diverse data points within a single exchange.
For instance, suppose I ask ChatGPT: “I’ll be traveling to Paris next weekend. I enjoy museums, love good food, and prefer walking tours. Can you create a two-day plan for me, also considering the weather forecast?” The exchange may appear purely textual, but it integrates multiple data points: my interests, the destination, the timeframe, and the weather.
Applications across industries
The global multimodal AI market is expected to reach $8.4 billion by 2030, growing at a CAGR of 32.3%. Here are a few examples of how other industries are using the technology.
Healthcare
If anomalies are detected in an X-ray, multimodal AI integrates them with results from other medical exams, along with the patient’s genetic markers and current conditions, to assess risk and suggest a course of action for a complete diagnosis.
eCommerce
Multimodal AI integrates data from different sources—in-store cameras, website interactions, purchase patterns, and social media—to gain deeper insights into customer behavior, personalize their shopping experience and recommend products.
Agriculture
Farmers already use satellite imagery, soil sensor readings, and weather data to optimize practices and inputs. Multimodal AI links this data into a unified system, enabling them to correlate satellite imagery of crop health with soil conditions and weather forecasts to make precise decisions about irrigation, fertilization, or pest control.
Transportation
GPS for precise location data and route optimization is standard in fleet management. Multimodal AI enhances this by integrating live video feeds to identify obstacles, radar to measure the speed and distance of nearby objects, and lidar (Light Detection and Ranging) to create accurate 3D maps of the surroundings, making operations safer and more efficient.
Finance
Multimodal AI strengthens fraud detection by combining transaction details, spending patterns, geolocation, and surveillance data. For instance, if a credit card transaction occurs in a foreign country while the user’s phone geolocation shows them in their home city, the AI flags the activity as suspicious, cross-checking it against historical spending patterns and ATM footage (if available) to identify potential fraud.
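The cross-check described above can be sketched as a few simple rules in Python. This is a deliberately simplified illustration, not a real fraud model: the `Transaction` class, the `fraud_signals` function, the country codes, and the spending threshold are all hypothetical, and a production system would learn such patterns from data instead of hard-coding them.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    country: str  # country where the card was used


def fraud_signals(tx, phone_country, usual_countries, typical_max_amount):
    """Collect warning signs by comparing the transaction with other data sources."""
    signals = []
    # Modality 1 vs. modality 2: card location against phone geolocation.
    if tx.country != phone_country:
        signals.append("location_mismatch")
    # Cross-check against historical spending patterns.
    if tx.country not in usual_countries:
        signals.append("unusual_country")
    if tx.amount > typical_max_amount:
        signals.append("unusual_amount")
    return signals


# Hypothetical case: card used abroad while the phone is in the home country.
tx = Transaction(amount=950.0, country="BR")
signals = fraud_signals(
    tx,
    phone_country="DE",
    usual_countries={"DE", "AT"},
    typical_max_amount=300.0,
)
flagged = "location_mismatch" in signals
print(flagged, signals)
```

The location mismatch alone triggers the flag, while the historical checks add supporting evidence for whoever reviews the case.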
Why is multimodal AI important for hotels?
As Cloudbeds’ CEO Adam Harris said on the Hotel Tech Insider podcast, a successful hotel operation relies on five pillars. The first three have always existed: finding and retaining customers, improving the guest journey, and improving operations. The fourth pillar is data. Thanks to digitalization, there’s no lack of it in hotels.
However, this abundance risks being siloed across too many platforms. “Data is everywhere. The average number of systems that are powering a hotel is 19 right now,” Harris noted. This challenge creates the need for the fifth pillar: intelligence—the ability to control all the data points generated by the other four and derive insight from them. “That’s where harnessing the correct forms of AI creates superhuman front desk staff, better guest experiences, better ways of reaching guests, more revenues from guests, and better customer journeys overall,” said Harris.
Multimodal AI is one of these essential technologies. The ability to communicate using different formats has always been a hallmark of human interaction. Multimodality transforms AI from smart software into an expert assistant, bringing it closer to how humans think and operate.
Multimodal AI systems are more resilient to noise and missing data. If one modality is unreliable, unavailable, or incomplete, the system can depend on others to ensure consistent and accurate outputs.
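One simple way to picture this resilience is "late fusion": each modality produces its own score, and the system averages whichever scores are available. The Python sketch below assumes that approach purely for illustration. The function name `fuse_scores` and the example scores are hypothetical, and real systems often use more sophisticated fusion methods.

```python
import math


def fuse_scores(scores):
    """Average the scores of the modalities that are present (not None)."""
    available = [s for s in scores.values() if s is not None]
    if not available:
        raise ValueError("at least one modality must provide a score")
    return sum(available) / len(available)


# Hypothetical relevance scores for one room, each from a different modality.
all_modalities = fuse_scores({"image": 0.8, "text": 0.6, "audio": 0.4})

# The audio source is unavailable, so the system falls back on image and text.
audio_missing = fuse_scores({"image": 0.8, "text": 0.6, "audio": None})

print(all_modalities, audio_missing)
```

The result shifts slightly when a modality drops out, but the system still returns a usable answer instead of failing outright.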
5 ways hotels can use multimodal AI
Multimodal AI has wide-ranging potential in hospitality. Here are five examples of its possible uses.
1. Answering guest queries
If a guest asks via chat what equipment the gym offers, the AI can instantly locate stored images or videos of the gym, generate a descriptive text based on them, and send it to the guest. This approach combines visual and textual data to deliver accurate, engaging responses quickly, enhancing the guest experience.
2. Upselling amenities and services
If a traveler enquires about family-friendly room options, the AI can combine visual and contextual data (e.g., room availability and the number of travelers) to present an image of a standard room while suggesting an upgrade to a family room, highlighting its additional space and features. If the traveler accepts, the AI can confirm the booking and send a payment link automatically.
3. Improving event planning
Multimodal AI can review images or videos of layouts from past events alongside guest feedback to identify what worked best. This helps teams replicate successful setups more efficiently, whether for corporate events, weddings, or conferences.
This functionality also supports room configurations. For example, if a couple is celebrating an anniversary, AI can detect this note in their guest profile and alert housekeeping. The team can then configure the room based on past setups – including a chilled bottle of champagne and rose petals.
4. Maintenance
Multimodal AI can analyze diverse data sources (maintenance logs, images, and videos) to detect and address problems in hotel rooms, such as clogged drains or weak Wi-Fi.
Say a guest reports a leaky tap to the front desk. Maintenance can request a picture first and use it to determine a plan of action before heading up to the room. After the issue is fixed, the front office can thank the guest for their patience and offer a voucher to compensate for the inconvenience.
5. Ad optimization
Before launching a marketing campaign, multimodal AI can analyze data from previous ones to identify which visuals and messages performed best. This information is used to create highly engaging ads and landing pages.
Cloudbeds: Using multimodal AI to transform hotel operations
Cloudbeds Intelligence, coming in 2025, is an AI layer built into the Cloudbeds platform that leverages causal and multimodal AI to boost revenue, optimize time and costs, and improve the guest experience.
Cloudbeds Intelligence leverages multimodal AI to transform staff training and enablement, helping teams standardize processes and respond to guest queries fast to improve satisfaction and capitalize on revenue-generating opportunities.