Getting Started

First, welcome to MiPasa!

Use the step-by-step guide, tutorials, and examples to learn how to use the MiPasa Platform and MiPasa’s Client SDK (available at https://github.com/hacera/mipasa-client-sdk).

MiPasa — Web-based Python IDE: Step-by-Step Guide

Sign Up and Try MiPasa

To run your very first app on the MiPasa IDE, complete the following steps:

  1. Go to the Sign Up page: https://app.mipasa.com
  2. Create a username and password, then confirm your email. You will be logged in and directed to the MiPasa homepage.
  3. Verify that you have an API Key: https://app.mipasa.com/profile?tab=APIs

Hello World

  1. Navigate to Code from the top menu bar and create a New Code
  2. Copy and paste the following “Hello World” code, and save it under the same name:
import mipasa
print("Hello World")
  3. Save and click Run
  4. You will see the output below the code:
Hello World

Data Access

  1. Navigate to Datasets from the top menu bar. You will see a list of Datasets
  2. You can search the datasets or click through using Most Used datasets on the right
  3. Search for the “ECDC” dataset and click “Quickview” to inspect it
    • This dataset contains multiple CSV files. Click on them to check the data entries. You can also download the CSV files to your desktop
  4. You have now verified a dataset to be used in app development

Writing Code

  1. MiPasa provides an easy way to access the datasets
  2. Navigate to Code from the top menu bar and create a New Code
  3. Create a new client, which is the main entry point for interaction with the API:
import mipasa
client = mipasa.Client()
  4. Load a dataset and get its latest version:
ecdc = client.get_feed_by_name('ECDC')
ecdc_latest = ecdc.get_latest_version()
  5. The following example finds the latest date in the data and prints the cases and deaths per country for that date. Paste it into your IDE and run:
import dateutil.parser

ecdc_cases = ecdc.get_file('Output_ECDC_Cases.csv').get_as_csv()
ecdc_deaths = ecdc.get_file('Output_ECDC_Deaths.csv').get_as_csv()

# Find the latest date in the table. The table is not sorted, so scan every row.
latest_date = None
for row in ecdc_cases[1:]:
  if not row[2]:
    continue
  if latest_date is None or dateutil.parser.isoparse(row[2]) > dateutil.parser.isoparse(latest_date):
    latest_date = row[2]

# Keep the header plus only the case rows for the latest date
ecdc_cases = [ecdc_cases[0]] + [x for x in ecdc_cases[1:] if x[2] == latest_date]

# Keep the header plus only the death rows for the latest date
ecdc_deaths = [ecdc_deaths[0]] + [x for x in ecdc_deaths[1:] if x[2] == latest_date]

# Sort the case rows by case count, highest first
ecdc_cases = [ecdc_cases[0]]+sorted(ecdc_cases[1:], key=lambda x: int(x[3]), reverse=True)

print('ECDC latest date: %s' % latest_date)
print('')
print('Cases by country:')
print('%-24s %-12s %-8s %-8s' % ('Date', 'Country', 'Cases', 'Deaths'))
for case in ecdc_cases[1:]:
  deaths = [x for x in ecdc_deaths if x[1] == case[1] and x[2] == case[2]]
  deaths_num = deaths[0][3] if deaths else '0'
  print('%-24s %-12s %-8s %-8s' % (case[2], case[1], case[3], deaths_num))
  6. The MiPasa IDE allows you to Save and Share the code
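
The script above addresses columns by position (row[2] for the date, row[3] for the count), which breaks silently if the column order ever changes. Below is a minimal sketch of the same latest-date filter that resolves the date column from the header row instead. The helper name latest_rows and the sample rows are illustrative assumptions, not part of the MiPasa SDK. The only thing assumed about the table is what the script above already relies on: a list of rows with the header first. The standard library's datetime.fromisoformat stands in for dateutil here and is assumed to be enough for plain dates such as 2020-04-02.

```python
from datetime import datetime


def latest_rows(table, date_column="date"):
    """Return the header plus only the rows that carry the most recent date."""
    header, rows = table[0], table[1:]
    idx_date = header.index(date_column)
    # Skip rows with an empty date, as the script above does.
    dated = [r for r in rows if r[idx_date]]
    if not dated:
        return [header]
    latest = max(datetime.fromisoformat(r[idx_date]) for r in dated)
    return [header] + [
        r for r in dated if datetime.fromisoformat(r[idx_date]) == latest
    ]


# Hypothetical sample shaped like a header-first list of rows.
sample = [
    ["countryCode2", "date", "cases"],
    ["US", "2020-04-01", "100"],
    ["FR", "2020-04-02", "40"],
    ["US", "2020-04-02", "120"],
    ["DE", "", "7"],
]
latest = latest_rows(sample)
print(latest)
```

Looking columns up by name keeps the filter independent of column order, at the cost of a ValueError if the expected header is missing.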

Creating Notebooks

  1. MiPasa provides Jupyter Notebooks support within the IDE
  2. Navigate to Notebooks from the top menu bar and create a New Notebook
  3. Assign a name to the Notebook and save
  4. You can add narrative texts, analysis methodology, live code, etc. all in one place; for example, set the cell type to “Markdown” and write:
# This is an example of using MiPasa and Jupyter for live code, equations, visualizations and narrative text
  5. Add a new code cell to create a basic visualization of the dataset:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import dateutil.parser

import mipasa

client = mipasa.Client()
ecdc = client.get_feed_by_name('ECDC')
cases_csv = ecdc.get_file('Output_ECDC_Cases.csv').get_as_csv()
deaths_csv = ecdc.get_file('Output_ECDC_Deaths.csv').get_as_csv()

def plot_csv(name, cases_csv):

  csv_header = cases_csv[0]
  idx_country = csv_header.index('countryCode2')
  idx_date = csv_header.index('date')
  idx_cases = csv_header.index('cases')
  
  countries_by_date = {}
  all_countries = []
  last_date = None
  for row in cases_csv[1:]:
    row_date = row[idx_date]
    row_country = row[idx_country]
    row_cases = row[idx_cases]
    if row_country not in all_countries:
      all_countries.append(row_country)
    if row_date not in countries_by_date:
      countries_by_date[row_date] = {}
    countries_by_date[row_date][row_country] = int(row_cases)
    last_date = row_date
    
  # normalize countries
  for date in countries_by_date:
    for country in all_countries:
      if country not in countries_by_date[date]:
        countries_by_date[date][country] = 0
    
  # sort by date; keep the string keys in the same order so the series line up with x_values
  date_keys = sorted(countries_by_date, key=dateutil.parser.parse)
  dates = [dateutil.parser.parse(k) for k in date_keys]
  x_values = [x.date() for x in dates]
  y_values = {}
  y_values_combined = {}
  
  # rank countries by their value on the most recent date
  last_date_numbers = countries_by_date[date_keys[-1]]
  all_countries = sorted(all_countries, key=lambda x: last_date_numbers[x], reverse=True)
  
  top_countries = all_countries[:5]
  
  for date in date_keys:
    for country in countries_by_date[date]:
      if country in top_countries:
        if country not in y_values:
          y_values[country] = []
        y_values[country].append(countries_by_date[date][country])
      else:
        if country not in y_values_combined:
          y_values_combined[country] = []
        y_values_combined[country].append(countries_by_date[date][country])
      
  y_values_other = []
  for country in y_values_combined:
    lendiff = len(y_values_combined[country]) - len(y_values_other)
    if lendiff > 0:
      y_values_other += [0] * lendiff
    for i in range(len(y_values_combined[country])):
      y_values_other[i] += y_values_combined[country][i]
  
  fig = plt.gcf()
  fig.set_size_inches(16,8)
  
  plt.xticks(rotation=45)
  
  ax = plt.gca()
  
  formatter = mdates.DateFormatter("%m-%d-%y")
  ax.xaxis.set_major_formatter(formatter)
  
  locator = mdates.DayLocator(interval=14)
  ax.xaxis.set_major_locator(locator)
  
  for country in y_values:
    plt.plot(x_values, y_values[country], label=country)
    
  plt.plot(x_values, y_values_other, label='Other')
    
  plt.title(name)
  plt.legend()
  plt.show()

plot_csv('Deaths (ECDC)', deaths_csv)
plot_csv('Cases (ECDC)', cases_csv)
  6. Save and run the code to review the following visualizations in the Notebook:

[Figures: ECDC_Cases, ECDC_Deaths]

  7. You can save and share the Notebook. Sharing generates a URL such as:
http://app.mipasa.com/notebooks/view/ECDC-Notebook-Example
  8. Existing and new MiPasa users can view the Notebook. New users must Sign Up before they can collaborate on Notebooks.
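
The grouping step inside plot_csv (rank countries by their value on the last date, keep the top five, and sum everything else into an “Other” series) can be isolated and checked without plotting. Below is a minimal sketch on in-memory data. The function name split_top_and_other and the sample numbers are assumptions made for illustration, and every series is assumed to cover the same dates in the same order.

```python
def split_top_and_other(series_by_country, top_n=5):
    """Split per-country series into the top_n (ranked by last value)
    and one element-wise summed series for all remaining countries."""
    ranked = sorted(
        series_by_country,
        key=lambda country: series_by_country[country][-1],
        reverse=True,
    )
    top = {country: series_by_country[country] for country in ranked[:top_n]}
    rest = [series_by_country[country] for country in ranked[top_n:]]
    # All series are assumed to have the same length (one value per date).
    length = len(next(iter(series_by_country.values()), []))
    other = [sum(values[i] for values in rest) for i in range(length)]
    return top, other


# Hypothetical numbers, one value per date for each country code.
sample = {
    "US": [1, 5, 9],
    "IT": [2, 4, 6],
    "ES": [0, 3, 5],
    "FR": [1, 1, 2],
}
top, other = split_top_and_other(sample, top_n=2)
print(top)
print(other)
```

Keeping this step as a pure function makes it possible to verify the ranking and the “Other” totals before passing the series to matplotlib.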

MiPasa Python Client SDK: Step-by-Step Guide

Prerequisites

These are the minimal requirements that this flow was tested with. Newer software versions should work; older versions may sometimes work but are not advised.

  1. MiPasa API Key
  2. Python 3.8
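
Both prerequisites can be checked from a short script before going further. This is a sketch under assumptions: the environment variable name MIPASA_API_KEY is hypothetical (the SDK is not known to read it), and it is used here only to keep the key out of source files.

```python
import os
import sys

MIN_PYTHON = (3, 8)  # the version this guide was tested with


def check_prerequisites(environ, version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d or newer is recommended" % MIN_PYTHON)
    # MIPASA_API_KEY is a hypothetical variable name chosen for this sketch.
    if not environ.get("MIPASA_API_KEY", "").strip():
        problems.append("MIPASA_API_KEY is not set")
    return problems


for problem in check_prerequisites(os.environ, sys.version_info):
    print("Prerequisite check:", problem)
```

If the check passes, the key can be read with os.environ["MIPASA_API_KEY"] and passed to the client explicitly instead of being hard-coded.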

Download the SDK

Get the latest MiPasa SDK from https://github.com/hacera/mipasa-client-sdk

Writing Code

  1. MiPasa provides an easy way to access and transform accurate datasets
  2. Create a new client, which is the main entry point for interaction with the API. Note that unlike our web-based IDE, this will require you to provide an API Key:
import mipasa
client = mipasa.Client('<your API Key goes here>')
  3. You can access all of your authorized data feeds:
all_feeds = client.list_feeds()
  4. Or you can load a data feed by name:
feed = client.get_feed_by_name('ECDC')
  5. A feed (dataset file name) may have multiple versions, which can be listed with:
feed_versions = feed.list_versions()
  6. Alternatively, you can get the latest version of the feed:
feed_latest_version = feed.get_latest_version()
  7. You can save the latest version of a feed:
import codecs

def save_data(d, filename):
    print('Saving data into %s'%filename)
    with codecs.open(filename, 'w', encoding='utf-8') as f:
        f.write(d)

feed_latest_data = feed.get_file('Source.csv').get(feed_latest_version.id)
save_data(feed_latest_data, 'ECDC_Latest_RawData.csv')
  8. You can fetch transformed data for the latest version. In the case of ECDC, this should be ‘ECDC_Cases’:
import csv

def save_csv(c, filename):
    print('Saving CSV into %s'%filename)
    with codecs.open(filename, 'w', encoding='utf-8') as f:
        writer = csv.writer(f)
        for row in c:
            writer.writerow(row)

feed_latest_transformed_data = feed_latest_version.get_transformed('ECDC_Cases')
save_csv(feed_latest_transformed_data, 'ECDC_Latest_Transformed_Cases.csv')
  9. Find any non-latest version of a dataset:
non_latest_version = [x for x in feed_versions if x.id != feed_latest_version.id and x.is_transformed()][0]
  10. Update the version before reading it. Versions returned by range queries such as list_versions (where the non-latest one came from) do not have their data loaded otherwise:
non_latest_version.update()
non_latest_data = feed.get_file('Source.csv').get(non_latest_version.id)
save_data(non_latest_data, 'ECDC_Old_RawData.csv')
  11. You can also get transformed data (as CSV) for the non-latest version, assuming it’s present:
non_latest_transformed_data = non_latest_version.get_transformed('ECDC_Cases')
save_csv(non_latest_transformed_data, 'ECDC_Old_Transformed_Cases.csv')

Note that transformed data is also available through a newer API based on the files property. For more details, see the SDK documentation.
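
The step that picks a non-latest version indexes the filtered list with [0], which raises IndexError when the feed has no older transformed version. Below is a minimal sketch of a more defensive selection. FakeVersion is a stand-in written for this example and models only the id attribute and is_transformed() method used in the steps above; the string ids are invented, and pick_older_transformed is a hypothetical helper, not an SDK function.

```python
from dataclasses import dataclass


@dataclass
class FakeVersion:
    """Stand-in for an SDK version object (illustration only)."""
    id: str
    transformed: bool

    def is_transformed(self):
        return self.transformed


def pick_older_transformed(versions, latest_id):
    """Return the first transformed version other than the latest, or None."""
    return next(
        (v for v in versions if v.id != latest_id and v.is_transformed()),
        None,
    )


versions = [
    FakeVersion("v3", True),   # latest
    FakeVersion("v2", False),  # older, not transformed
    FakeVersion("v1", True),   # older, transformed
]
older = pick_older_transformed(versions, latest_id="v3")
print(older.id if older else "no older transformed version")
```

Returning None lets the calling code decide whether a missing older version is an error or simply means there is nothing to export.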