Data Visualization

After the data has been generated, a graph is usually created to help visualize relationships and draw conclusions about the dataset. This document shows how to do this with the seaborn package.

The use case described here comes from the paper "Modelling the Joint Effect of Social Determinants and Peers on Obesity Among Canadian Adults", which is described in Customizing Node Values. The model used in the paper will be referred to as the obesity model.

Creating graphs

Graphing with seaborn requires the data to be stored in a pandas.DataFrame object.

The example uses a csv file with data collected by running the obesity model with different values of p, with 20 replications for each value. The model was also run on two types of networks, small world and scale-free, which are identified in the Network Type column. The first five rows of the file are:

	p	Network Type	variable	Obesity
0	0.0	Small world	Run 1	0.944109272
1	0.1	Small world	Run 1	0.946973076
2	0.2	Small world	Run 1	0.945395483
3	0.4	Small world	Run 1	0.941793814
4	0.5	Small world	Run 1	0.924613951
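The column named variable matches the default name that pandas.DataFrame.melt gives to the column holding former column labels. That suggests, although the source does not say so, that the runs were first stored in a wide table with one column per replication and then reshaped into the long form that seaborn's hue grouping works with. Below is a minimal sketch of that reshaping. It uses only the Run 1 values from the preview, and the wide layout itself is an assumption:

```python
import pandas as pd

# Hypothetical wide layout: one row per (p, Network Type) setting and one
# column per replication. Only "Run 1" is filled in, using the values from
# the preview; the real data has 20 replications.
wide = pd.DataFrame(
    {
        "p": [0.0, 0.1, 0.2],
        "Network Type": ["Small world"] * 3,
        "Run 1": [0.944109272, 0.946973076, 0.945395483],
    }
)

# melt() keeps the id columns and stacks every other column into two new
# ones: "variable" (the old column label) and a value column named "Obesity".
long_form = wide.melt(id_vars=["p", "Network Type"], value_name="Obesity")
print(long_form)
```

In long form every observation is its own row, so sns.lineplot can group rows by any column through the hue, x and y parameters.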

The complete dataset can be found here: obesity_serial.csv

The file can be loaded into a pandas DataFrame using pandas.read_csv():

import pandas as pd

# Load the csv file into a DataFrame
runs_data = pd.read_csv("obesity_serial.csv")
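The preview starts with an unnamed column of row numbers, which is what DataFrame.to_csv writes when the index is kept. If obesity_serial.csv contains that column (an assumption worth checking), read_csv loads it as an extra column named "Unnamed: 0". Passing index_col=0 uses it as the row index instead. Here is a self-contained sketch using three of the preview rows:

```python
import io

import pandas as pd

# Three rows copied from the preview, written the way DataFrame.to_csv()
# writes them when the index is kept: the first header field is empty.
# Whether obesity_serial.csv has this extra column is an assumption.
csv_text = """,p,Network Type,variable,Obesity
0,0.0,Small world,Run 1,0.944109272
1,0.1,Small world,Run 1,0.946973076
2,0.2,Small world,Run 1,0.945395483
"""

# Without index_col, the unnamed first column is read as "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))

# index_col=0 turns that column into the row index instead.
runs_data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(runs_data.columns))
```

Checking list(runs_data.columns) after loading is also a quick way to confirm that the names passed to sns.lineplot (x, y, hue) exist in the file.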

The graph is plotted with seaborn and displayed with matplotlib. In this case, a line plot is used. The complete code is shown below:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

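runs = pd.read_csv("obesity_serial.csv")
sns.set_style("darkgrid")  # grey background with white grid lines
# One line per network type; the replications at each value of p are
# aggregated, and their spread is drawn as error bars
sns.lineplot(data=runs, x="p", y="Obesity", hue="Network Type",
             err_style="bars")
plt.show()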
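By default, sns.lineplot aggregates all rows that share an x value within a hue group into their mean and draws a 95% confidence interval around it. Here, that means the 20 replications at each p for one network type. err_style="bars" draws that interval as error bars rather than a shaded band. To inspect the numbers behind the plotted points, a groupby summary can help. In the sketch below, the function name summarize_runs is illustrative, and std is the spread of the replications, not the bootstrapped interval seaborn draws:

```python
import pandas as pd


def summarize_runs(runs: pd.DataFrame) -> pd.DataFrame:
    """Mean, standard deviation and count of Obesity per (Network Type, p).

    Column names follow the obesity_serial.csv preview. The mean column
    corresponds to the points sns.lineplot draws with its default estimator.
    """
    return (
        runs.groupby(["Network Type", "p"])["Obesity"]
        .agg(["mean", "std", "count"])
        .reset_index()
    )
```

For example, summarize_runs(pd.read_csv("obesity_serial.csv")) returns one row per network type and value of p. If the data is complete, the count column should show 20 for each row.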

The resulting graph is displayed below:

Comparing runtimes

Graphs can be used to compare the runtime across multiple categories.

To explore how different types of networks affect the runtime of the parallel version of the model, the obesity model was run with different types of networks and increasing numbers of agents. The data were then organized into a csv file. The first rows are displayed below:

Number of Agents	Type	Time (minutes)
1000	barabasiGreedy	0.191486667
5000	barabasiGreedy	1.3587
10000	barabasiGreedy	4.947505556
40000	barabasiGreedy	88.92053333
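The source does not show how the runtimes were measured. One way to produce a table with the same columns is sketched below. run_model is a hypothetical placeholder for a single run of the parallel obesity model, and time.perf_counter is used as the wall-clock timer:

```python
import time

import pandas as pd


def run_model(n_agents: int, network_type: str) -> None:
    """Hypothetical placeholder for one run of the parallel obesity model.

    It only does a small amount of work proportional to n_agents so that
    the sketch runs on its own; replace it with the real model call.
    """
    sum(i * i for i in range(n_agents))


def time_runs(agent_counts, network_types) -> pd.DataFrame:
    """Time one run per (network type, agent count) pair, in minutes."""
    rows = []
    for network_type in network_types:
        for n_agents in agent_counts:
            start = time.perf_counter()
            run_model(n_agents, network_type)
            elapsed_minutes = (time.perf_counter() - start) / 60.0
            rows.append({
                "Number of Agents": n_agents,
                "Type": network_type,
                "Time (minutes)": elapsed_minutes,
            })
    return pd.DataFrame(rows)


times = time_runs([1000, 5000, 10000], ["barabasiGreedy"])
# With no path argument, to_csv returns the csv text as a string.
csv_text = times.to_csv(index=False)
print(csv_text)
```

Passing a path instead, for example times.to_csv("parallel_times.csv", index=False), writes the file. index=False keeps pandas from adding an unnamed index column.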

The complete csv file can be found here: parallel_times.csv

To graph parallel_times.csv, load it into a pandas DataFrame, plot it with seaborn and display it with matplotlib. The hue parameter can be set to Type to group the data by network type. Parameter

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

runs = pd.read_csv("parallel_times.csv")
sns.set_style("darkgrid")
# x, y and hue must match the column headers in the csv file exactly
sns.lineplot(data=runs, x="Number of Agents", y="Time (minutes)", hue="Type",
             err_style="bars")
plt.show()

The resulting graph is displayed below:

There are different ways to organize a dataset for visualization. More details can be found on the seaborn package page.

cuda-hybrid 0.1.5 documentation