How can I design a Data Model?
Before we answer that question, it’s worth looking back at the traditional web analytics model we see most often used in analytics. This model is most commonly used by analytics teams investigating user behavior on the web.
In this example, we can see the data summarized into three tables: page views, sessions and users. Page views are aggregated into sessions- with one row of data per session- and we might also have a table for users if we had reliable user ids over time.
This format, while great because it's affordable, may also present a few limitations: the sessions themselves may not be on point with your business model or needs.
- What can we expect from a session when people are using mobile devices?
- How are sessions measured in a reliable way for businesses with multiple?
It's difficult to measure sessions in a reliable way for businesses with multiple target users. For example, let's say that one customer's user journey starts at your site and ends two months later on a different website you have no control over.
- When it comes to unit of interaction, page views are not always the ideal measurement. Instead, let’s look at other measurements like searches or video engagement.
- SQL skills are a critical part of analyzing and storing data. Without SQL skills, you can't do anything more than what is available to you on the surface.
- Different departments in your organization may not be able to extract value from the data this way.
It's important to keep these limitations in mind, but it's not all bad news. Whatever you can do with the rewriter is still beneficial—there's something here that we can work with if we don't have much choice.
When using this traditional model, it's important to consider who the table is capturing. Page views could be classified as an engagement level that a user has with the platform. A session of interactions over time could be seen as a cycle. At the highest level, a user can be classified as an entity - someone involved in an interaction within your business.
The framework can be used as-is or it can be expanded to accommodate other use cases and platforms. We break down this into three parts: the interactions level, the cycle level and the entity level. When we look at interactions, we see one row per interaction. For example, page views would be a typical interaction.
Dimensions related to interaction:
- Referrer: timestamps
Dimensions related to the cycle include:
- Session ID, session index
- Funnel, Funnel step
Dimension related to the entities:
- Page titles, url
- User ID, location
Interaction metrics:
- Average time spent on pages, scroll depth
But page views are just one way to measure the success of your content. Other measures might include:
- Clicks
- Screen views
- Ad Impressions
- Email opens
- Bids
- Transactions
- Form changes
- Purchases
- Searches
- Logins
Cycles are a fundamental component of the game.
In this level, we have one row for every cycle--or in the case of sessions, these could be broken down as:
- Dimensions related to the cycle: Session ID, Session index
- Dimension related to the entities: User ID, Location
- Cycle metrics: Page views, conversions
Cycles mostly consist of a number of interactions that take place over a specific period of time. Some examples of cycles are:
- Course completions
- Strategy to Increase Lifetime Customers
- Fun content to help you discover new things
- Game level competitions
There are many different types of cyclical information that you can capture for your business, but ultimately it depends on what makes sense for you.
The Entity Level
The entity level includes one row per entity. If we take users as our example here, the information captured could be broken down into:
- Dimensions related to the entity: User ID, Location
- Entity Metrics: Sessions, page views
- Other examples of entities could be: Products, Articles, Videos, Campaigns, Apps, Data pipelines
With this framework we have a great foundation for designing a data model from scratch. But there are other technical aspects to consider when designing your data models:
This can help you figure out what entities you should be focusing on, including graphs and visualizations.
Who is consuming this data and what are they interested in?
This will indicate what entities you need to think about
What is the use case for this data?
This will dictate how aggregated the tables need to be (whether at a granular level or high level)
How will this data be consumed?
This will inform how the tables need to be structured
What type of business are you running? How many customers do you have? How do people interact with your product? These are all questions that will help us design the best data model for your company.
Introduction to Data Models in Practice: Examples in Ecommerce and Subscription Businesses
It's worth looking at a few examples of data models for specific consumer groups to get an idea of how your own data model could work.
This is important for any business, but especially so for Ecommerce companies. So first, let's look at an example (i.e Target) of how to model data for:
- Marketing Team
- Inventory Team
If we’ll consider an example for a Subscription (i.e Netflix), modelling data for the:
- Product Team
- Content Team
ECOMMERCE: MARKETING WANT TO UNDERSTAND HOW TO OPTIMIZE THEIR MARKETING BUDGET
Let’s consider a Marketing team within an ecommerce company. They might want to understand how their efforts are driving revenue. Specifically, they may ask:
• What marketing channels lead to the sessions with the most dollars of product purchased?
• What marketing channels or campaigns are best at winning repeat customers?
This example is all about sessions. We want to know which sessions are bringing the most revenue. The page views will tell us which pages are most engaging and drove the most traffic. Therefore we need one row for each page view, with those aggregated up to the sessions level.
We track some customer data from purchases, like the total purchase value of each transaction and which products were purchased. But you might be wondering how we identify loyal customers. This would already be a really useful report; you might want to look at one row for each campaign, with the campaign's total sessions, revenue generated, purchases made, and so on.
It may be helpful to aggregate at the user level in some cases. It doesn't matter if a user made one purchase or ten, we still want to understand the importance of each individual customer. For example, if one campaign drives more lifetime purchases per user than another, it shouldn't be ignored.
But data's not just for marketers. For example, the inventory team has a lot of questions too.
- What are the best products? Which ones are trending?
- Which product positioning and pricing strategies should they employ to sell their existing inventory?
Though these questions are quite different from the questions the marketing team posed, we need to make sure the information we have is still relevant.
We don't want to just know when a customer purchases a product, but also when they click for a product detail view, add it to their cart, or remove it from the cart. We want to know every single interaction they have with the site to give us a full picture of how they use our site and what we need to improve. We might also want to aggregate all product interactions and purchases into a table, with one row for each product, that shows us the frequency of play and interaction before the purchase.
These data models may look very different with little overlap. However, when data is sent into the sessions table, it can be helpful to include a ‘purchases’ filter to isolate purchase actions for marketing.
Over time, the marketing team might start to develop an interest in the detailed functionality of your product, which will be easy to include in their reports.