Data Visualization
Contents
Data Visualization#
After all data has been generated, a graph is usually created to help visualize the relationship and draw some conclusions
about the dataset. This document will show how to do this using seaborn
package.
The use case described here is from the paper Modelling the Joint Effect of Social Determinants and Peers on Obesity Among Canadian Adults which is described in Customizing Node Values. The model used in the paper will be referred as obesity model.
Creating graphs#
Using seaborn
to graph will require an object of class pandas.DataFrame
.
A csv file that has data collected from running the obesity model with different values of p, 20 replications each, is used here. The model was also run with different types of networks, small world and scale-free, which can be indicated from the Type column. The first five rows of the file:
p |
Network Type |
variable |
Obesity |
|
0 |
0.0 |
Small world |
Run 1 |
0.944109272 |
1 |
0.1 |
Small world |
Run 1 |
0.946973076 |
2 |
0.2 |
Small world |
Run 1 |
0.945395483 |
3 |
0.4 |
Small world |
Run 1 |
0.941793814 |
4 |
0.5 |
Small world |
Run 1 |
0.924613951 |
The complete dataset can be found here: obesity_serial.csv
The file can be loaded to a panda dataframe using pandas.pd.read_csv()
:
import pandas as pd
runs_data = pd.read_csv("obesity_serial.csv")
The graph can be plotted and shown using both seaborn
and matplotlib
. In this case, line plot is used. The complete
code is shown below:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
runs = pd.read_csv("obesity_serial.csv")
sns.set_style("darkgrid")
sns.lineplot(data=runs, x="p", y="Obesity", hue="Network Type",
err_style="bars")
plt.show()
The result graph is displayed below:
Comparing runtimes#
Graphs can be used to compare the runtime across multiple categories.
In order to explore how different types of networks influence the runtime of the model for the parallel version, data was collected by running the obesity model with different types of networks and increasing number of agents. The data are then organized into a csv file. The first five rows are displayed below:
Number of Agents |
Type |
Time (minutes) |
1000 |
barabasiGreedy |
0.191486667 |
5000 |
barabasiGreedy |
1.3587 |
10000 |
barabasiGreedy |
4.947505556 |
40000 |
barabasiGreedy |
88.92053333 |
The complete csv file can be found here: parallel_times.csv
In order to use the above file for graphing, a pandas dataframe should be created and then seaborn
and matplotlib
to graph it. Parameter hue
can be set as Type
to group the data. Parameter
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
runs = pd.read_csv("parallel_times.csv")
sns.set_style("darkgrid")
sns.lineplot(data=runs, x="Agents", y="Time(minutes)", hue="Type", err_style="bars")
plt.show()
The resulted graph is displayed below:
There are different ways to organize the dataset to help with data Visualization. More details can be found on seaborn package page