Contiamo Labs

This guide introduces the advanced features available through Contiamo Labs.

1. Introduction

Contiamo Labs let you harness the full power of the Python programming language and its libraries to process and analyse your data. With Labs, you can:

1.1. Notebooks

Labs analyses are contained in notebooks. Notebooks hold and organize your code for a data analysis task, from cleaning and analysing your data to visualizing or uploading it.

A notebook is a collection of cells. A cell can contain Python code: when you execute it, its output appears immediately below the cell. Cells can also contain formatted Markdown text, which is useful for describing the steps of your data analysis.

[Screenshot: Labs notebook]

1.2. Getting started

To get started, navigate to the Labs section of your project. If you cannot find Labs in the navigation bar at the top, Labs have not been enabled for the project. To enable them, contact us at support@contiamo.com.

Click Add new notebook to create your first notebook.

2. Basic notebook operation

2.1. Interface

The notebook menu sits on top of the notebook itself. Besides standard menus such as File and Edit, it includes a Cell menu, which lets you run all of the cells in the notebook or only some of them, and a Contiamo menu, which contains special items that will be described later.

[Screenshot: Labs notebook]

The notebook toolbar (below the menu) has useful shortcuts to add, cut, copy, and move cells around the notebook. The dropdown lets you switch a cell between Python code and Markdown.

2.2. Keyboard shortcuts

The notebook also offers convenient features such as tab completion and keyboard shortcuts, for example:

Notebooks are saved automatically every two minutes. Press Ctrl + S to save manually. A full list of shortcuts is available under the Help menu.

3. Query data into Labs

The basic command to download data into a notebook is:

df = %contiamo query 'labs_id'

This returns the data of the query identified by 'labs_id', the query's Labs identifier. To get an identifier, you first need to create a shared query.

3.1. Create a shared query

A shared query is a query that can be accessed from outside the Contiamo app.

To begin, build a query (or open an existing one) in the Explore section of your project. You can create a Labs identifier with the export tools in the upper right corner of the results area:

[Screenshot: creating a public query]

This will create a public URL as well as a labs identifier. A typical identifier will look like this: 'query:12345678:12345:Th8NYf4yTrwk...'

Caution: a public query URL lets anyone with the link access the query's data. The long, random token ensures that it cannot be guessed, but you need to make sure that you only share it with people who are authorized to see the data.

3.2. Get the query identifier

There are two ways of getting a query's identifier: from the Manage section of the project (Manage > Shared resources > Queries), or directly from the notebook:

3.3. Check for errors

Unless you are exploring your data by hand, we recommend checking for download errors before proceeding. If an error occurs, the query result contains the key 'http_error'. Code to check for errors could look like this:

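query_identifier = 'query:...'  # replace with your query's Labs identifier
df = %contiamo query $query_identifier
if 'http_error' in df:
    # an error has occurred: stop here rather than analyse incomplete data
    raise RuntimeError('Query download failed')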
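In an automated notebook, you may want to run this check after every download. The sketch below wraps it in a small helper; `raise_on_http_error` is a hypothetical name, not part of Contiamo. It relies only on the documented behaviour that a failed result contains the key 'http_error', and works for anything that supports the `in` operator (dict keys, or dataframe column labels).

```python
def raise_on_http_error(result, what="query download"):
    """Hypothetical helper: raise if a Contiamo result signals an error.

    Assumes, as described above, that a failed result contains the key
    'http_error'. For a dict, `in` checks the keys; for a pandas
    dataframe, it checks the column labels.
    """
    if 'http_error' in result:
        # Stop the run instead of continuing with incomplete data
        raise RuntimeError("{} failed: result contains 'http_error'".format(what))
    return result


# Usage in a notebook cell (magics can only be used in assignments,
# so call the helper on the next line):
#   df = %contiamo query $query_identifier
#   df = raise_on_http_error(df)
```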

4. Analyse and visualize data

4.1. Basic analysis and visualization

The data from a query is returned in a pandas dataframe. A dataframe is essentially a table, with each row corresponding to a time and/or date, and each column corresponding to either a metric or a dimension.

The pandas library comes with a full suite of data manipulation and analysis tools, as well as basic plotting functionality. This pandas tutorial provides an introduction to dataframes.

The first look into your data should begin with df.head() or df.tail(), which will output the first (or last) few rows. You can also print the entire dataframe, as the notebook will safely limit the output even for a very large dataframe.

More information can be gathered with df.describe(), which will provide information such as count, mean, standard deviation, etc. for numeric columns. Finally, df.dtypes will simply list the columns in the dataframe, and their data types.
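To try these commands before connecting a query, here is a sketch on a small dataframe shaped like a query result: dates as the index, one dimension column and one metric column. The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up data shaped like a query result: one row per date and country,
# with a dimension column ("country") and a metric column ("revenue")
df = pd.DataFrame(
    {
        "country": ["DE", "FR", "DE", "FR", "DE"],
        "revenue": [120.0, 80.5, 135.0, 90.0, 128.5],
    },
    index=pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-03"]
    ),
)

print(df.head(3))     # first three rows
print(df.tail(2))     # last two rows
print(df.describe())  # count, mean, std, min, quartiles, max of numeric columns
print(df.dtypes)      # column names and their data types
```

Note that `describe()` summarises only the numeric column (`revenue`) by default; the dimension column is left out.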

For basic plotting, you might want to try df.plot(), or refer to the pandas tutorial for more examples. However, for richer charts we recommend the Seaborn library, described below.
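For a quick look before reaching for Seaborn, a single df.plot() call draws one line per numeric column against the index. The sketch below uses a made-up daily metric; in a notebook with %matplotlib inline (see 4.2), the chart appears below the cell.

```python
import pandas as pd

# Made-up daily metric indexed by date, similar to a query result
df = pd.DataFrame(
    {"visits": [120, 135, 128, 150, 142]},
    index=pd.date_range("2017-01-01", periods=5, freq="D"),
)

# df.plot() returns a matplotlib Axes object that can be customised further
ax = df.plot(title="Daily visits")
ax.set_ylabel("visits")
```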

4.2. Advanced visualizations

We recommend the Seaborn visualization library (we may be able to include other libraries on request). The Seaborn library makes it easy to draw the following charts:

Here is how to import the library in a notebook:

import seaborn as sns
%matplotlib inline

The second line is required for charts to be displayed in the notebook. You can scale up chart elements such as labels and lines as follows:

sns.set_context("poster")  # largest preset: bigger labels, lines and markers

For example, here is how to plot a heatmap with one line of code. First, load the data:

flights_long = sns.load_dataset("flights")  # downloads Seaborn's example data
flights = flights_long.pivot(index="month", columns="year", values="passengers")

Then plot the heatmap:

sns.heatmap(flights, annot=True, fmt="d", linewidths=.5)

[Figure: resulting Seaborn heatmap]
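The same pattern applies to your own data: a heatmap needs a matrix, so long-format data (one row per pair of categories) is first reshaped with pivot. The sketch below uses made-up weekday/hour data; the column names are hypothetical. Recent pandas versions require the index, columns and values arguments of pivot to be passed as keywords.

```python
import pandas as pd

# Made-up long-format data: one row per (weekday, hour) pair with a metric
long_df = pd.DataFrame(
    {
        "weekday": ["Mon", "Mon", "Tue", "Tue"],
        "hour": [9, 10, 9, 10],
        "visits": [30, 45, 28, 50],
    }
)

# Reshape to a matrix: rows = weekday, columns = hour, cells = visits
matrix = long_df.pivot(index="weekday", columns="hour", values="visits")
print(matrix)

# The matrix can then be drawn as in the flights example:
# sns.heatmap(matrix, annot=True, fmt="d", linewidths=.5)
```

pivot raises an error if a (row, column) pair appears more than once; in that case, pivot_table with an aggregation function (for example aggfunc="sum") combines the duplicates instead.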

4.3. Advanced analysis

Our Python installation also comes with more advanced analysis libraries, such as scikit-learn for machine learning. For further information, contact us at support@contiamo.com.
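As a small illustration of what scikit-learn adds, the sketch below fits a linear trend to a daily metric and extrapolates one day ahead. The data is made up, and a linear regression is only one of many models the library offers; it assumes scikit-learn is installed in your Labs environment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up daily metric with a steady upward trend
df = pd.DataFrame(
    {"revenue": [100.0, 110.0, 120.0, 130.0, 140.0]},
    index=pd.date_range("2017-01-01", periods=5, freq="D"),
)

# Use the day number (0, 1, 2, ...) as the single input feature
X = np.arange(len(df)).reshape(-1, 1)
y = df["revenue"].to_numpy()

model = LinearRegression().fit(X, y)
print("trend per day:", model.coef_[0])
print("next-day forecast:", model.predict([[len(df)]])[0])
```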

5. Upload data into Contiamo

With Labs, you can upload the results of your analyses back into Contiamo in order to make full use of Contiamo's powerful charting and collaboration tools.

In order to upload data into Contiamo, you need to create a data contract. Please see this documentation on how to do so.

5.1. Get the contract identifier

Getting contract identifiers in your notebook follows the same steps as getting a query identifier:

5.2. Discover the data structure

We assume you have the data you want to upload in a dataframe called df.

In order to upload data to the contract, you need to discover the data structure, i.e. tell Contiamo what columns and data types to expect. Discovery is done with this line of code:

%contiamo discover -d df $contract_identifier

To verify that discovery succeeded, inspect the output of the cell. The output will look like this:

If you encounter a failure at this stage and cannot find a simple explanation, please contact us: support@contiamo.com.
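Assuming discovery derives the expected columns and types from the dataframe passed with -d (as the description above suggests), it can help to make the dtypes explicit before running it, so that, for example, a date stored as text is not discovered as a text column. The sketch below uses only pandas; the column names are hypothetical, and which types your data contract expects is something to check against the contract itself.

```python
import pandas as pd

# Made-up analysis result in which every column arrived as text
df = pd.DataFrame(
    {
        "date": ["2017-01-01", "2017-01-02"],
        "country": ["DE", "FR"],
        "revenue": ["120.5", "80"],
    }
)

# Convert columns to explicit types before running discovery
df["date"] = pd.to_datetime(df["date"])
df["revenue"] = pd.to_numeric(df["revenue"])

print(df.dtypes)  # review the types before %contiamo discover -d df ...
```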

5.3. Upload the data

Once the data structure is set up, you are ready to upload the data:

%contiamo upload -d df $contract_identifier

Again, check the output for possible errors. If you are automating a task, here is how to check programmatically:

result = %contiamo upload -d df $contract_identifier
if 'http_error' in result:
    # the upload failed: stop here instead of continuing silently
    raise RuntimeError('Upload failed')
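When automating, the discovery and upload steps can be chained so that the upload only runs if discovery succeeded. In the sketch below, `discover_and_upload` and its `run_step` argument are hypothetical: `run_step(step, contract_identifier)` stands for whatever runs the corresponding %contiamo command and returns its result. How to implement it in a notebook (for instance through IPython's get_ipython().run_line_magic) is an assumption not covered by this guide; errors are detected through the 'http_error' key, as above.

```python
def discover_and_upload(run_step, contract_identifier):
    """Hypothetical helper: run discovery, then upload, stopping at the first error.

    run_step(step, contract_identifier) is assumed to run
    '%contiamo <step> -d df <contract_identifier>' and return its result.
    """
    results = {}
    for step in ("discover", "upload"):
        result = run_step(step, contract_identifier)
        if 'http_error' in result:
            # Do not upload if discovery failed, and report which step broke
            raise RuntimeError("{} failed for {}".format(step, contract_identifier))
        results[step] = result
    return results
```

Passing the step runner in as an argument keeps the control flow testable without a live Contiamo connection.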

That's it: the data has been uploaded to Contiamo, and is ready for use. You can create charts, dashboards, and share your results with others.

6. Schedule automated tasks

To schedule automated tasks, please contact us at support@contiamo.com.