Getting Started
First, welcome to MiPasa!
Use the step-by-step guide, tutorials, and examples to learn how to use the MiPasa Platform and MiPasa’s Client SDK (available at https://github.com/hacera/mipasa-client-sdk).
MiPasa — Web-based Python IDE: Step-by-Step Guide
Sign Up and Try MiPasa
To run your very first app on the MiPasa IDE, complete the following steps:
- Go to the Sign Up page: https://app.mipasa.com
- Create a username / password, and confirm your email. You will be logged in and directed to the MiPasa homepage.
- Verify that you have an API Key: https://app.mipasa.com/profile?tab=api
Hello World
- Navigate to Code from the top menu bar and create a New Code
- Copy-paste the following code for “Hello World” and save the code with the same name:
import mipasa
print("Hello World")
- Save and click Run
- You will see the output below the code:
Hello World
Data Access
- Navigate to Datasets from the top menu bar. You will see a list of Datasets
- You can search the datasets or click through using Most Used datasets on the right
- Search for the dataset “ECDC” and click on “Quickview” to verify the dataset
- This dataset contains multiple CSV files. Click on them to verify the data entries. You can also download the CSV files to your desktop
- You have now verified a dataset to be used in app development
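Once a CSV file has been downloaded, its header and row count can be checked locally with Python's standard csv module. The sketch below is illustrative only: the sample text stands in for a downloaded file, and its column names and values are made-up assumptions, not the actual ECDC schema or data.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and values are illustrative assumptions, not real ECDC data.
SAMPLE_CSV = """country,date,cases
AA,2020-04-01,10
BB,2020-04-01,25
AA,2020-04-02,12
"""

def summarize_csv(text_stream):
    """Return (header, row_count) for a CSV text stream."""
    reader = csv.reader(text_stream)
    header = next(reader, [])            # first row is treated as the header
    row_count = sum(1 for _ in reader)   # remaining rows are data rows
    return header, row_count

header, row_count = summarize_csv(io.StringIO(SAMPLE_CSV))
print('Columns:', header)
print('Data rows:', row_count)
```

For a real file, pass an object opened with open(path, newline='', encoding='utf-8') instead of the io.StringIO wrapper.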
Writing Code
- MiPasa provides an easy way to access the datasets
- Navigate to Code from the top menu bar and create a New Code
- Create a new client, which is the main entry point for interaction with the API:
import mipasa
client = mipasa.Client()
- Load a dataset by name and get its latest version:
ecdc = client.get_feed_by_name('ECDC')
ecdc_latest = ecdc.get_latest_version()
- The following example finds the latest date in the dataset and prints the cases and deaths reported for that date. Run it in your IDE:
import dateutil.parser
ecdc_cases = ecdc.get_file('Output_ECDC_Cases.csv').get_as_csv()
ecdc_deaths = ecdc.get_file('Output_ECDC_Deaths.csv').get_as_csv()
# find the latest date in the table; the table is not sorted, so we have to scan it
latest_date = None
for row in ecdc_cases[1:]:
    # skip rows without a date
    if not row[2]:
        continue
    if latest_date is None or dateutil.parser.isoparse(row[2]) > dateutil.parser.isoparse(latest_date):
        latest_date = row[2]
# keep the header plus the rows for the latest date only
ecdc_cases = [ecdc_cases[0]] + [x for x in ecdc_cases[1:] if x[2] == latest_date]
ecdc_deaths = [ecdc_deaths[0]] + [x for x in ecdc_deaths[1:] if x[2] == latest_date]
# sort by number of cases, highest first
ecdc_cases = [ecdc_cases[0]] + sorted(ecdc_cases[1:], key=lambda x: int(x[3]), reverse=True)
print('ECDC latest date: %s' % latest_date)
print('')
print('Cases by country:')
print('%-24s %-12s %-8s %-8s' % ('Date', 'Country', 'Cases', 'Deaths'))
for case in ecdc_cases[1:]:
    # match the deaths row for the same country and date, if any
    deaths = [x for x in ecdc_deaths if x[1] == case[1] and x[2] == case[2]]
    deaths_num = deaths[0][3] if deaths else '0'
    print('%-24s %-12s %-8s %-8s' % (case[2], case[1], case[3], deaths_num))
- MiPasa IDE allows you to Save and Share the code
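The latest-date scan in the example above relies on fixed column positions. As a sketch of an alternative, the helper below looks up the date column by name in the header row and picks the latest date with max. It assumes plain ISO 8601 dates (YYYY-MM-DD) and a column named 'date'; the function name is hypothetical and the sample rows are made up for illustration.

```python
from datetime import date

def filter_latest(rows, date_column='date'):
    """Return (latest_date, [header] + rows for that date).

    `rows` is a list of lists whose first entry is the header row.
    Rows with an empty date field are skipped.
    """
    header, body = rows[0], rows[1:]
    idx = header.index(date_column)
    dated = [r for r in body if r[idx]]
    if not dated:
        return None, [header]
    # date.fromisoformat parses plain 'YYYY-MM-DD' strings (assumption)
    latest = max(dated, key=lambda r: date.fromisoformat(r[idx]))[idx]
    return latest, [header] + [r for r in dated if r[idx] == latest]

# Made-up sample rows, unsorted on purpose
sample = [
    ['country', 'date', 'cases'],
    ['AA', '2020-04-02', '12'],
    ['BB', '2020-04-01', '25'],
    ['CC', '', '3'],
    ['BB', '2020-04-02', '30'],
]
latest, latest_rows = filter_latest(sample)
print(latest)
print(latest_rows[1:])
```

If the date strings also carry a time component, a more tolerant parser such as dateutil.parser.isoparse (used in the example above) would be the safer choice.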
Creating Notebooks
- MiPasa provides Jupyter Notebooks support within the IDE
- Navigate to Notebooks from the top menu bar and create a New Notebook
- Assign a name to the Notebook and save
- You can add narrative texts, analysis methodology, live code, etc. all in one place; for example, set the cell type to “Markdown” and write:
# This is an example of using MiPasa and Jupyter for live code, equations, visualizations and narrative text
- Add a new code cell to create a basic visualization of the dataset:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import dateutil.parser
import mipasa
client = mipasa.Client()
ecdc = client.get_feed_by_name('ECDC')
cases_csv = ecdc.get_file('Output_ECDC_Cases.csv').get_as_csv()
deaths_csv = ecdc.get_file('Output_ECDC_Deaths.csv').get_as_csv()
def plot_csv(name, cases_csv):
    # locate the columns by name in the header row
    csv_header = cases_csv[0]
    idx_country = csv_header.index('countryCode2')
    idx_date = csv_header.index('date')
    idx_cases = csv_header.index('cases')
    # build {date: {country: count}}
    countries_by_date = {}
    all_countries = []
    for row in cases_csv[1:]:
        row_date = row[idx_date]
        row_country = row[idx_country]
        row_cases = row[idx_cases]
        if row_country not in all_countries:
            all_countries.append(row_country)
        if row_date not in countries_by_date:
            countries_by_date[row_date] = {}
        countries_by_date[row_date][row_country] = int(row_cases)
    # normalize countries: every date gets an entry for every country
    for date in countries_by_date:
        for country in all_countries:
            if country not in countries_by_date[date]:
                countries_by_date[date][country] = 0
    # sort by date so that the y values line up with the x values
    sorted_dates = sorted(countries_by_date, key=dateutil.parser.parse)
    x_values = [dateutil.parser.parse(k).date() for k in sorted_dates]
    last_date = sorted_dates[-1]
    y_values = {}
    y_values_combined = {}
    # the top 5 countries on the latest date get their own line
    last_date_numbers = countries_by_date[last_date]
    all_countries = sorted(all_countries, key=lambda x: last_date_numbers[x], reverse=True)
    top_countries = all_countries[:5]
    for date in sorted_dates:
        for country in countries_by_date[date]:
            if country in top_countries:
                if country not in y_values:
                    y_values[country] = []
                y_values[country].append(countries_by_date[date][country])
            else:
                if country not in y_values_combined:
                    y_values_combined[country] = []
                y_values_combined[country].append(countries_by_date[date][country])
    # sum all remaining countries into a single "Other" series
    y_values_other = []
    for country in y_values_combined:
        lendiff = len(y_values_combined[country]) - len(y_values_other)
        if lendiff > 0:
            y_values_other += [0] * lendiff
        for i in range(len(y_values_combined[country])):
            y_values_other[i] += y_values_combined[country][i]
    fig = plt.gcf()
    fig.set_size_inches(16, 8)
    plt.xticks(rotation=45)
    ax = plt.gca()
    formatter = mdates.DateFormatter("%m-%d-%y")
    ax.xaxis.set_major_formatter(formatter)
    locator = mdates.DayLocator(interval=14)
    ax.xaxis.set_major_locator(locator)
    for country in y_values:
        plt.plot(x_values, y_values[country], label=country)
    plt.plot(x_values, y_values_other, label='Other')
    plt.title(name)
    plt.legend()
    plt.show()
plot_csv('Deaths (ECDC)', deaths_csv)
plot_csv('Cases (ECDC)', cases_csv)
- Save and run the code. The Notebook renders two charts below the cell, titled “Deaths (ECDC)” and “Cases (ECDC)”
- You can save and share the Notebook. Sharing generates a URL such as:
http://app.mipasa.com/notebooks/view/ECDC-Notebook-Example
- Both existing and new MiPasa users can view the Notebook. New users must Sign Up before they can start collaborating on Notebooks.
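The grouping step inside plot_csv (counts keyed by date and then by country, with missing countries filled in as 0) can be isolated and checked without plotting. The following is a minimal sketch of that step using only the standard library. The function name and sample rows are hypothetical; the column names are the ones used in the Notebook cell above.

```python
from collections import defaultdict

def pivot_counts(rows):
    """Build {date: {country: count}} from CSV rows (header first),
    filling in 0 for any country missing on a given date."""
    header = rows[0]
    idx_country = header.index('countryCode2')
    idx_date = header.index('date')
    idx_cases = header.index('cases')

    by_date = defaultdict(dict)
    countries = set()
    for row in rows[1:]:
        by_date[row[idx_date]][row[idx_country]] = int(row[idx_cases])
        countries.add(row[idx_country])

    # normalize: every date gets an entry for every country
    for counts in by_date.values():
        for country in countries:
            counts.setdefault(country, 0)
    return dict(by_date)

# Made-up sample rows for illustration
sample = [
    ['countryCode2', 'date', 'cases'],
    ['AA', '2020-04-01', '10'],
    ['BB', '2020-04-02', '7'],
    ['AA', '2020-04-02', '12'],
]
table = pivot_counts(sample)
print(table['2020-04-01'])
```

Keeping this step in its own function makes it possible to verify the numbers before they are passed to matplotlib.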
MiPasa Python Client SDK: Step-by-Step Guide
Prerequisites
These are the minimum requirements this flow was tested with. Newer software versions should work; older versions may sometimes work, but are not advised.
- MiPasa API Key
- Python 3.8
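A script can check the interpreter version up front and warn early. This is a small sketch and not part of the SDK; the function name is hypothetical, and the (3, 8) threshold is taken from the list above.

```python
import sys

MIN_PYTHON = (3, 8)  # version this guide was tested with

def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum

if not python_is_supported():
    print('Warning: this guide was tested with Python %d.%d or newer' % MIN_PYTHON)
```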
Download the SDK
Get the latest MiPasa SDK from https://github.com/hacera/mipasa-client-sdk
Writing Code
- MiPasa provides an easy way to access and transform accurate datasets
- Create a new client, which is the main entry point for interaction with the API. Note that unlike our web-based IDE, this will require you to provide an API Key:
import mipasa
client = mipasa.Client('<your API Key goes here>')
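To keep the key out of source files, it can be read from the environment before the client is created. The sketch below shows one possible approach. The variable name MIPASA_API_KEY is an assumption made for this example and is not defined by the SDK.

```python
import os

def load_api_key(env=None, name='MIPASA_API_KEY'):
    """Read the API key from an environment mapping.

    `name` is a hypothetical variable name chosen for this sketch.
    Raises RuntimeError if the variable is missing or blank.
    """
    if env is None:
        env = os.environ
    key = env.get(name, '').strip()
    if not key:
        raise RuntimeError('Set the %s environment variable to your API Key' % name)
    return key

# Demonstration with an explicit mapping instead of the real environment
demo_key = load_api_key({'MIPASA_API_KEY': ' example-key '})
print(len(demo_key))
```

The returned string can then be passed to mipasa.Client(...) in place of the literal key shown in the step above.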
- You can access all of your authorized data feeds:
all_feeds = client.list_feeds()
- Or you can load a data feed by name:
feed = client.get_feed_by_name('ECDC')
- A feed (dataset file name) may have multiple versions, which can be accessed by:
feed_versions = feed.list_versions()
- Alternatively, you can get the latest version of the feed:
feed_latest_version = feed.get_latest_version()
- You can save the latest version of a feed:
import codecs
def save_data(d, filename):
    print('Saving data into %s' % filename)
    with codecs.open(filename, 'w', encoding='utf-8') as f:
        f.write(d)
feed_latest_data = feed.get_file('Source.csv').get(feed_latest_version.id)
save_data(feed_latest_data, 'ECDC_Latest_RawData.csv')
- You can fetch transformed data for the latest version. In the case of ECDC, this should be ‘ECDC_Cases’:
import codecs
import csv
def save_csv(c, filename):
    print('Saving CSV into %s' % filename)
    with codecs.open(filename, 'w', encoding='utf-8') as f:
        writer = csv.writer(f)
        for row in c:
            writer.writerow(row)
feed_latest_transformed_data = feed_latest_version.get_transformed('ECDC_Cases')
save_csv(feed_latest_transformed_data, 'ECDC_Latest_Transformed_Cases.csv')
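In Python 3, the same kind of CSV export can also be written with the built-in open, which the csv module documentation recommends calling with newline=''. This variant is a sketch rather than an SDK helper; the function names and sample rows are made up, and a read-back step is included to confirm the round trip.

```python
import csv
import os
import tempfile

def save_rows_as_csv(rows, filename):
    """Write an iterable of rows to `filename` as UTF-8 CSV."""
    # newline='' lets the csv module control line endings itself
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)

def load_rows_from_csv(filename):
    """Read the file back as a list of lists of strings."""
    with open(filename, newline='', encoding='utf-8') as f:
        return list(csv.reader(f))

# Made-up rows for illustration
sample_rows = [['country', 'cases'], ['AA', '10'], ['BB', '25']]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.csv')
    save_rows_as_csv(sample_rows, path)
    round_trip = load_rows_from_csv(path)
print(round_trip == sample_rows)
```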
- Find any non-latest version of a dataset:
non_latest_version = [x for x in feed_versions if x.id != feed_latest_version.id and x.is_transformed()][0]
- Call update() on the version before reading it. Objects returned by range queries such as list_versions (where the non-latest version came from) do not have their data loaded until they are updated:
non_latest_version.update()
non_latest_data = feed.get_file('Source.csv').get(non_latest_version.id)
save_data(non_latest_data, 'ECDC_Old_RawData.csv')
- You can also get transformed data (as CSV) for the non-latest version, assuming it’s present:
non_latest_transformed_data = non_latest_version.get_transformed('ECDC_Cases')
save_csv(non_latest_transformed_data, 'ECDC_Old_Transformed_Cases.csv')
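The list-indexing expression used earlier to pick a non-latest version raises IndexError when no older transformed version exists. A more defensive selection might look like the sketch below. StubVersion is a hypothetical stand-in that has only the id attribute and the is_transformed() method the steps above rely on; it is not an SDK class.

```python
from dataclasses import dataclass

@dataclass
class StubVersion:
    """Hypothetical stand-in for an SDK version object."""
    id: str
    transformed: bool

    def is_transformed(self):
        return self.transformed

def first_older_transformed(versions, latest_id):
    """Return the first transformed version other than the latest, or None."""
    return next(
        (v for v in versions if v.id != latest_id and v.is_transformed()),
        None,
    )

# Made-up version list for illustration
versions = [StubVersion('v3', True), StubVersion('v2', False), StubVersion('v1', True)]
older = first_older_transformed(versions, latest_id='v3')
print(older.id if older else 'no older transformed version')
```

Returning None lets the calling code decide whether to skip the step or report that no older version is available.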
Note that transformed data is also available through a newer API that uses the files property.
For more details, see the SDK documentation.