Single Cohort Analysis -- using entity segments

Cohort analysis involves identifying a group of users and tracking them over time.

The two key steps are:

  1. Identify the cohort - Usually the group of users are identified by a specific time period – for example, students who graduated in 2010. Or customers who made a purchase in on specific day or month.

  2. Trending - Once you identified the cohort group, you want to a track a metric of the cohort over time. For example, what was the average salary of students graduating in 2010 over the next five years.

Let’s look at a common cohort analysis involving retention %

In Arcadia we can do single cohort analysis using Segment entities.

Step 1 – identify the cohort

Navigate to dataset details page and within Segments sub-page, create a new segment.

There are a few things we need to fill out:

  1. Name for the cohort, which will be referenced when creating visual
  2. Filters that define the cohort - in the below example we are choosing customers who had orders within the month of Jan 2009 and at least one sales transaction greater than 100
  3. Lastly, we need a unique identifier for each entity in the cohort. This has to be a column in the dataset. In the below example, we have chosen “Customer Name”.

Step 2 – trend the cohort

Create a new visual and configure the time period and metric. In the below example we are going to trend by month the distinct count of customers.

To enable cohort, we need to navigate to the “Segment” panel and choose the cohort we just created.

Step 3

To get the retention metric we need to use a analytic function (window functions) as explain here:

In our example we used the following:

count(distinct([Customer Name])) / first_value ( count(distinct([Customer Name]))) 
over (order by from_timestamp([Order Date], 'yyyy-MM')) as '% Active'

“Under the hood” SQL

To understand how this all works, you can reference the generated SQL. You can look at the Activity Log page

In our example the final SQL looks like below. Notice the join that’s generated back to the main table.

SELECT
from_timestamp((TA_0.order_date), ‘yyyy-MM’) as year and month of Order Date, count(distinct((TA_0.customer_name))) / first_value ( count(distinct((TA_0.customer_name)))) over (order by from_timestamp((TA_0.order_date), ‘yyyy-MM’)) as % Active, count(distinct ((TA_0.customer_name))) as countdistinct(Customer Name)
FROM robb.Superstore_LV TA_0 join
(SELECT TA_0.customer_name as arc_u10_0 FROM robb.Superstore_LV TA_0 WHERE (((TA_0.order_date) between ‘2009-01-01’ and ‘2009-01-31’ AND (TA_0.sales) > 100)) GROUP BY 1) arc_u10
ON TA_0.customer_name=arc_u10_0

WHERE (TA_0.order_date between ‘2009-01-01’ and ‘2009-12-31’) GROUP BY 1 LIMIT 5000

1 Like

With cohorts you can also specify “not in” – for example, find all users that are not in the entity segment.

You can do this through the Segment selection. You can change “is” to “not” which would give you the negated set.