Contiamo Labs

This guide introduces the advanced features available through Contiamo Labs.

1. Introduction

Contiamo Labs let you harness the full power of the Python programming language and its libraries to process and analyse your data. With Labs, you can:

1.1. Notebooks

Labs analyses are contained in notebooks. Notebooks hold and organize your code for a data analysis task, from cleaning and analysing your data to visualizing or uploading it.

A notebook is a collection of cells. Each cell contains Python code that can be executed, and will generate an output immediately below the cell. Cells can also contain formatted Markdown text, which is useful to describe the various steps in your data analysis.

labs_notebook

1.2. Getting started

To get started, navigate to the Labs section of your project. If you cannot find Labs in the navigation bar at the top, it means that Labs have not been enabled for the project. Please contact us to do so: support@contiamo.com.

Click Add new notebook to create your first notebook.

2. Basic notebook operation

2.1. Interface

The notebook menu sits on top of the notebook itself. Besides standard menus such as File and Edit, the Cell menu lets you run all or part of the cells in the notebook. The Contiamo menu contains special items that will be described later.

labs_notebook

The notebook toolbar (below the menu) has useful shortcuts to add, cut, copy, and move cells around the notebook. The dropdown tool lets you switch between Python code and Markdown.

2.2. Contents, comments, and versions

In a Markdown cell, you can insert a heading by starting a line with one or more # characters, like this:

# Title 1

The headings that you create form a table of contents in the Contents section, which is accessible by clicking on the arrow in the top right corner.

notebook_contents

In the Comments section, you can discuss the notebook with colleagues and notify them by email by tagging them with @[email address].

Finally, Versions lets you go back to (and restore) earlier versions of the notebook.

2.3. Keyboard shortcuts

The notebook workflow also offers convenient features such as tab completion, and shortcut keys such as:

Notebooks are saved automatically every two minutes, but Ctrl + s will save manually. A full list of shortcuts is available under the Help menu.

3. Get data into Labs

There are three main ways to query data into Labs.

3.1. Query data from the Explore section

The first option is to build a query in the Explore section and then load its results into Labs.

3.1.1. Create a published query

A shared or published query is a query that can be accessed from outside the Contiamo app.

To begin, build a query (or open an existing one) in the Explore section of your project. You can then publish the query with the export tools in the upper right corner of the results area:

create_public_query

This will show the following dialog:

create_public_query_dialog

Once the query is created, it will be given a public URL as well as a labs identifier (which typically looks like this: 'query:olap:12345678:12345:Th8NYf4yTrwk...').

Caution: a public query URL lets anyone with the link access the query's data. The long, random token ensures that it cannot be guessed, but you need to make sure that you only share it with people who are authorized to see the data.

3.1.2. Get the query identifier

If you did not get the identifier from the creation confirmation dialog, there are two ways of retrieving a query's identifier: from the Manage section of the project (Manage > Shared resources > Queries), or directly from the notebook:

3.1.3. Query data into Labs

The command to download data from the Explore section into a notebook is:

df = %contiamo query query:olap:12345678:12345:Th8NYf4yTrwk...

This will return the data from the query defined by the labs identifier.

3.1.4. Check for errors

Unless you are conducting a manual analysis, it is recommended to check for download errors before proceeding.

%config Contiamo.raise_errors = True

When raise_errors is False, a failed command prints an error message but does not interrupt notebook execution. That is convenient when running a notebook manually, cell by cell, but not when running the entire notebook unattended. With raise_errors set to True, you can catch the error yourself:

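try:
    df = %contiamo query $labs_id
except ContiamoException as error:
    # Handle the failure here (log it, send a notification, ...),
    # then re-raise so that the notebook run stops
    print('Query download failed:', error)
    raise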

The detailed list of errors raised by the %contiamo command - all derived from ContiamoException - is described in the code of our API library. (The %contiamo "magic" is a shortcut to the library.)
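When a notebook runs unattended, one common pattern is to retry a failed download a few times before giving up. The sketch below only illustrates that pattern and rests on two assumptions: ContiamoException is redefined locally as a stand-in so that the snippet runs anywhere (in Labs you would use the class from the API library), and download stands for any function of yours that wraps the %contiamo query call.

```python
import time

class ContiamoException(Exception):
    """Local stand-in for the exception class of the API library."""

def with_retries(download, attempts=3, delay_seconds=0.0):
    """Call download() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except ContiamoException as error:
            print(f"Attempt {attempt} failed: {error}")
            if attempt == attempts:
                raise  # give up so that the notebook run stops
            time.sleep(delay_seconds)

# Demonstration with a fake download that fails once, then succeeds.
calls = {"count": 0}

def flaky_download():
    calls["count"] += 1
    if calls["count"] == 1:
        raise ContiamoException("temporary failure")
    return [{"date": "2020-01-01", "revenue": 10}]

rows = with_retries(flaky_download)
print(rows)
```

Re-raising after the last attempt keeps the behaviour described above: the error still interrupts the run instead of being silently ignored.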

3.2. Send a SQL query

This is explained in the API library section.

3.3. Pull data from an external source

Anything that can be done in Python can be done in Labs. For instance, you can pull data from external sources via their API. Let's illustrate that by an example.

Suppose you want to access the data in JSON format at the following URL: http://statistik.basketball-bundesliga.de/xchg-dtag/json/epg.json

First, import the relevant packages (the requests package would be a good alternative) and load the data:

import json
import urllib.request

url = 'http://statistik.basketball-bundesliga.de/xchg-dtag/json/epg.json'
# Download the raw bytes, then decode and parse them as JSON
raw = urllib.request.urlopen(url).read()
data = json.loads(raw.decode('utf-8'))

Then use pandas’ json_normalize function to convert it into a dataframe:

import pandas as pd
# pd.json_normalize requires pandas 1.0 or later;
# older versions expose it as pd.io.json.json_normalize
df = pd.json_normalize(data['games']['game'])

Done!
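To make clear what json_normalize does to nested records, the following sketch flattens a small, made-up payload by hand using only the standard library. The sample data and the flatten helper are illustrative assumptions; the real feed may be structured differently.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Made-up sample shaped like data['games']['game'] in the text above.
data = {
    "games": {
        "game": [
            {"id": 1, "home": {"name": "Team A", "score": 80}},
            {"id": 2, "home": {"name": "Team B", "score": 75}},
        ]
    }
}

rows = [flatten(game) for game in data["games"]["game"]]
print(rows[0])
```

Each flattened key (for example home.name) corresponds to one column of the resulting dataframe.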

4. Analyse and visualize data

4.1. Basic analysis and visualization

The data from a query is returned in a pandas dataframe. A dataframe is essentially a table, with each row corresponding to a time and/or date, and each column corresponding to either a metric or a dimension.

The pandas library comes with a full suite of data manipulation and analysis tools, as well as basic plotting functionality. This pandas tutorial provides an introduction to dataframes.

The first look into your data should begin with df.head() or df.tail(), which will output the first (or last) few rows. You can also print the entire dataframe, as the notebook will safely limit the output even for a very large dataframe.

More information can be gathered with df.describe(), which will provide information such as count, mean, standard deviation, etc. for numeric columns. Finally, df.dtypes will simply list the columns in the dataframe, and their data types.
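As a quick illustration of these calls, here is a minimal sketch on a made-up dataframe. The column names and values are invented for the example; a dataframe returned by a query will have its own metrics and dimensions.

```python
import pandas as pd

# Made-up data: one row per date, one dimension, one metric.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    "country": ["DE", "FR", "DE"],
    "revenue": [120.0, 80.0, 100.0],
})

print(df.head(2))     # first two rows
print(df.dtypes)      # column names and data types
print(df.describe())  # summary statistics

# A typical next step: aggregate a metric by a dimension.
revenue_by_country = df.groupby("country")["revenue"].sum()
print(revenue_by_country)
```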

Going further: this demo notebook showcases a typical pandas analysis.

4.2. Advanced visualizations

For basic plotting, you might want to try df.plot(), or refer to the pandas tutorial for more examples.

For more advanced visualizations, we recommend the Seaborn visualization library (we may be able to include other libraries on request). The Seaborn library makes it easy to draw the following charts:

Here is how to import the library in a notebook:

import seaborn as sns
%matplotlib inline

The second line is required for charts to be displayed in the notebook. You can set the size of the charts as follows:

sns.set_context("poster")  # display large charts

For example, here is how to plot a heatmap with one line of code. First, load the data:

flights_long = sns.load_dataset("flights")
flights = flights_long.pivot(index="month", columns="year", values="passengers")

Then plot the heatmap:

sns.heatmap(flights, annot=True, fmt="d", linewidths=.5);

seaborn_heatmap
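The pivot step is what turns the long table (one row per month and year) into the month-by-year grid that the heatmap needs. Here is a small sketch with made-up numbers, using keyword arguments, which work across pandas versions:

```python
import pandas as pd

# Made-up long-format data: one row per (month, year) pair.
long_table = pd.DataFrame({
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "year": [1950, 1950, 1951, 1951],
    "passengers": [115, 126, 145, 150],
})

# One row per month, one column per year, passenger counts as cells.
wide_table = long_table.pivot(index="month", columns="year", values="passengers")
print(wide_table)
```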

Going further: this demo notebook shows how to download data from Contiamo, analyse it, and plot a heatmap with the result.

4.3. Advanced analysis

Our Python installation comes with more advanced analysis libraries, such as scikit-learn for machine learning.

Going further: this demo notebook demonstrates a common machine-learning technique (clustering) applied to a demo dataset containing transaction information for a mobile app retailer.

5. Upload data into Contiamo

With Labs, you can upload the results of your analyses back into Contiamo in order to make full use of Contiamo's powerful charting and collaboration tools.

In order to upload data into Contiamo, you need to create a data contract. Please see this documentation on how to do so.

5.1. Get the contract identifier

Getting contract identifiers in your notebook follows the same steps as getting a query identifier:

5.2. Discover the data structure

We assume you have the data you want to upload in a dataframe called df.

In order to upload data to the contract, you need to discover the data structure, i.e. tell Contiamo what columns and data types to expect. Discovery is done with this line of code:

%contiamo discover -d df $contract_identifier

A typical error at this stage occurs when the data contract already contains data. When that happens, you need to clear the data first:

%contiamo purge $contract_identifier

If you encounter another error and cannot find a simple explanation, please contact us: support@contiamo.com.

You can check (and edit) the details of the data structure that was discovered in the Manage section of the Contiamo app, specifically in the data contract view under Setup > Columns.

5.3. Upload the data

Once the data structure is set up, you are ready to upload the data:

%contiamo upload -d df $contract_identifier

Check the output for possible errors. If you are automating a task, here is the way to do it programmatically:

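try:
    %contiamo upload -d df $contract_identifier
except ContiamoException as error:
    # Handle the failure here (log it, send a notification, ...),
    # then re-raise so that the notebook run stops
    print('Upload failed:', error)
    raise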

Again, the errors raised by the %contiamo command are detailed in the code of the API library.

That's it: the data has been uploaded to Contiamo, and is ready for use. You can create charts, dashboards, and share your results with others.

6. Schedule automated tasks

In the demo notebook we created a dataframe with additional information on customer behavior. To use these results in Contiamo, we would create a dedicated data contract, upload the dataframe as described above, and finally schedule an automated execution of the notebook to update the data contract on a regular basis.

To schedule an automated execution of your notebook, click Execution at the top of the page (it is only visible when you are not editing the notebook):

scheduled_execution

The following dialog will open:

scheduled_execution_dialog

You can either use standard schedules for execution (daily at midnight for example) or define a custom schedule. Custom schedules use CRON expressions (you can validate CRON expressions here).
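If you want to sanity-check a custom schedule inside a notebook, a rough check could look like the sketch below. It assumes the common five-field cron form (minute, hour, day of month, month, day of week) and only understands plain numbers, *, ranges, lists, and steps; the exact dialect accepted by the scheduling dialog is not specified here, so treat this as an illustration rather than a replacement for a proper validator.

```python
# Allowed value range for each of the five standard cron fields.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),
]

def _valid_part(part, low, high):
    """Check one list item such as '*', '5', '1-5' or '*/15'."""
    base, _, step = part.partition("/")
    if "/" in part and not (step.isdecimal() and int(step) > 0):
        return False
    if base == "*":
        return True
    bounds = base.split("-")
    if len(bounds) > 2 or not all(b.isdecimal() for b in bounds):
        return False
    numbers = [int(b) for b in bounds]
    return all(low <= n <= high for n in numbers) and numbers == sorted(numbers)

def is_valid_cron(expression):
    """Return True if expression looks like a five-field cron schedule."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        return False
    return all(
        _valid_part(item, low, high)
        for field, (_, low, high) in zip(parts, FIELDS)
        for item in field.split(",")
    )

print(is_valid_cron("0 0 * * *"))          # daily at midnight
print(is_valid_cron("*/15 8-18 * * 1-5"))  # every 15 minutes, working hours
print(is_valid_cron("61 0 * * *"))         # minute out of range
```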

7. The Contiamo API library

Contiamo Labs comes with an API library that allows you to perform any operation that you would do via the Contiamo app (and more). For instance, you can:

The Contiamo API library can also be downloaded and used in any environment. For an overview of the available functions, see the API library on GitHub. We cover two examples here to introduce some features of the library.

7.1. SQL queries

You can query data in the Contiamo app by creating a SQL query in the Explore section. You can also send a SQL query directly from Labs. The command would be:

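import contiamo

# Replace 'api_key' and 'project_id' with your own values (see below)
contiamo_client = contiamo.resources.Client('api_key')
project = contiamo_client.Project('project_id')
# datasource_id: identifier of the data source to query
df = project.query_sql(datasource_id, 'SELECT * FROM table_name LIMIT 10;')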

7.1.1. Setting up the API client

In order to set up the API client and instantiate the project resource, you need to create or get an API key for the project, and get the project identifier.

To create or get an API key for your project, go to Manage > API keys.

api_keys

Click on Add new api key or copy an existing one to paste it into your notebook (in place of api_key).
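Rather than pasting the key directly into a notebook that colleagues can read, you may prefer to load it from the environment. The sketch below assumes that your environment lets you define environment variables; the variable names CONTIAMO_API_KEY and CONTIAMO_PROJECT_ID and the load_credentials helper are hypothetical and not part of the API library.

```python
import os

def load_credentials(environ=os.environ):
    """Read the API key and project id from environment variables.

    The variable names are hypothetical; use whatever your setup provides.
    """
    try:
        return environ["CONTIAMO_API_KEY"], environ["CONTIAMO_PROJECT_ID"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None

# Example with an explicit mapping instead of the real environment.
api_key, project_id = load_credentials(
    {"CONTIAMO_API_KEY": "my-api-key", "CONTIAMO_PROJECT_ID": "48123456"}
)
print(project_id)

# In a notebook, the values would then be passed to the client:
#   contiamo_client = contiamo.resources.Client(api_key)
#   project = contiamo_client.Project(project_id)
```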

7.1.2. Retrieving table names

A table name is usually contract_x where x is the name of the data contract in the datasource. You can also find the table name by creating a SQL query in the Explore section and navigating the data sources in the left-hand pane.
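As a small convenience, the naming convention can be wrapped in a helper that builds a preview query. This is only a sketch: it relies on the "usually contract_x" convention described above, which may not hold for every data source, and table_name_for and preview_query are hypothetical helpers.

```python
import re

def table_name_for(contract_name):
    """Build the usual table name 'contract_<name>' for a data contract."""
    # Accept only simple names so the result is safe to place in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", contract_name):
        raise ValueError(f"Unexpected contract name: {contract_name!r}")
    return f"contract_{contract_name}"

def preview_query(contract_name, limit=10):
    """Return a SQL statement selecting the first rows of the table."""
    return f"SELECT * FROM {table_name_for(contract_name)} LIMIT {int(limit)};"

print(preview_query("sales"))
```

The resulting string can be passed as the second argument of project.query_sql.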

7.2. Example use case

Say you want to create several identical dashboards for different countries. To do this, you need to copy an initial dashboard and change the filter in each widget. Instead of doing it manually, you can write a notebook that does that for you.

7.2.1. Get identifiers

In order to get the dashboard identifier, open the dashboard you want to copy in the Contiamo app. The URL should look like this: https://app.contiamo.com/48123456/dashboards/1234. Here, 48123456 is the project identifier and 1234 is the dashboard identifier.

You can find the database identifier in the same way, by opening the corresponding data source in the Manage section and looking at the URL: https://app.contiamo.com/48123456/manage/apps/666123456. Here, the database identifier is 666123456.
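If you collect identifiers from many URLs, you can extract them with the standard library instead of reading them off by hand. The sketch below assumes the URL shapes shown above; path_segments and dashboard_ids are hypothetical helpers.

```python
from urllib.parse import urlparse

def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [segment for segment in urlparse(url).path.split("/") if segment]

def dashboard_ids(url):
    """Extract (project_id, dashboard_id) from a dashboard URL."""
    segments = path_segments(url)
    if len(segments) != 3 or segments[1] != "dashboards":
        raise ValueError(f"Not a dashboard URL: {url}")
    return segments[0], segments[2]

project_id, dashboard_id = dashboard_ids(
    "https://app.contiamo.com/48123456/dashboards/1234"
)
print(project_id, dashboard_id)

# For a data source URL, the identifier is the last path segment.
print(path_segments("https://app.contiamo.com/48123456/manage/apps/666123456")[-1])
```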

7.2.2. Set up a client with authenticated access

We explained above how to set up an API client and instantiate a project.

import contiamo

contiamo_client = contiamo.resources.Client('api_key') 
project = contiamo_client.Project('project_id') 

Now that you have access to your project, let’s access the dashboard you want to copy.

7.2.3. Duplicate the dashboard

List

If you don’t know your dashboard identifier yet, you can access the list of dashboards in your project with the list function:

project.Dashboard.list()

Retrieve

Once you have your identifier, you can retrieve the dashboard model:

dashboard = project.Dashboard.retrieve(dashboard_id)

Create

Then, you need to create a new dashboard where you will copy the widgets from your initial dashboard. This can be done with the create function. You may have noticed that dashboards are represented by models derived from dictionaries. You can therefore create a copy of the model of your initial dashboard and give it a new name like this:

keys_to_copy = ['edit_style', 'has_style', 'layout_type']
dashboard_copy = {key: dashboard[key] for key in keys_to_copy}
dashboard_copy['name'] = 'New Dashboard'

Then use the create function to create a new dashboard based on the new model:

new_dashboard = project.Dashboard.create(dashboard_copy)

You should now see a new dashboard in your Dashboard section called 'New Dashboard'.

7.2.4. Copy the widgets

The same functions (list, retrieve, and create) are available for widgets. You can thus access all the widgets from your initial dashboard with:

widgets = dashboard.Widget.list(instantiate=True)

Suppose you want to change the filter 'Europe' to 'Asia' in the 'Continent' dimension of your widgets. One way is to loop over the list of widgets, copy each widget, change its filters, and finally create the new widget in your new dashboard with the create function:

import copy

for widget in widgets:
    widget_copy = copy.deepcopy(widget)
    # remove existing widget id (a new one will be returned)
    widget_copy['query'].pop('id', None)

    for filt in widget_copy['query']['filters']:
        # If the filter is the filter to be changed, change it
        if filt['dimension']['key'] == 'Continent' and filt['options']['values'] == 'Europe':
            filt['options']['values'] = 'Asia'

    # Create the widget in the dashboard
    new_widget = new_dashboard.Widget.create(widget_copy)

Your new dashboard now contains the same widgets as your initial dashboard, but filtered on 'Asia' instead of 'Europe'.
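Before creating widgets for real, it can help to try the filter replacement on a plain dictionary. The sketch below assumes that widget models have the nested shape used in the loop above; the sample widget is made up, and with_filter_replaced is a hypothetical helper rather than part of the API library.

```python
import copy

def with_filter_replaced(widget, dimension, old_value, new_value):
    """Return a deep copy of widget with one filter value swapped."""
    widget_copy = copy.deepcopy(widget)
    # Drop the existing query id, as in the loop above.
    widget_copy["query"].pop("id", None)
    for filt in widget_copy["query"]["filters"]:
        if (filt["dimension"]["key"] == dimension
                and filt["options"]["values"] == old_value):
            filt["options"]["values"] = new_value
    return widget_copy

# Made-up widget shaped like the models used in the loop above.
widget = {
    "name": "Revenue",
    "query": {
        "id": 42,
        "filters": [
            {"dimension": {"key": "Continent"}, "options": {"values": "Europe"}},
            {"dimension": {"key": "Channel"}, "options": {"values": "Web"}},
        ],
    },
}

widget_copy = with_filter_replaced(widget, "Continent", "Europe", "Asia")
print(widget_copy["query"]["filters"][0]["options"]["values"])
print(widget["query"]["filters"][0]["options"]["values"])
```

Because the function works on a deep copy, the original widget is left untouched, which makes it safe to run the same replacement for several countries in a row.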