This notebook is an appendix to our study. It documents the data characteristics of the SMD dataset. To extract this information we perform Exploratory Data Analysis (EDA) using DataPrep.EDA [1], an easy-to-use tool that integrates with Python and Jupyter Notebook and lets us view and explore data characteristics interactively.
[1] Jinglin Peng, Weiyuan Wu, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey M. Rzeszotarski, and Jiannan Wang. DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python. SIGMOD 2021.
import pandas as pd
from dataprep.eda import create_report
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
The Server Machine Dataset (SMD) [2] was collected by Su et al. over 5 weeks at a large Internet company and is sampled at one-minute intervals. The anomalies have been labeled by domain experts based on incident reports. The dataset has 38 channels in total.
SMD is a multi-entity dataset of 28 entities, each of which is a different physical unit of the same type. All entities share the same dimensionality and the same types of features. For this characteristics report, we concatenate all the entities into a single dataframe.
# A forward slash avoids the invalid '\d' escape sequence and works on all platforms
ds = pd.read_csv('SMD/ds.csv', index_col=0)
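The concatenation of entities mentioned above is not part of this notebook. A minimal sketch of how it could be done with pandas is given below; the toy per-entity frames and the helper name `concat_entities` are hypothetical and only illustrate the idea. They are not the code that produced `ds.csv`.

```python
import pandas as pd

def concat_entities(frames):
    """Stack per-entity dataframes that share the same columns.

    `frames` is a list of dataframes, one per entity (hypothetical input).
    """
    # ignore_index gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)

# Two toy entities with the same features and a label column `y`
entity_a = pd.DataFrame({"0": [0.1, 0.2], "1": [0.0, 0.3], "y": [0, 1]})
entity_b = pd.DataFrame({"0": [0.4], "1": [0.5], "y": [0]})

combined = concat_entities([entity_a, entity_b])
print(combined.shape)  # rows of both entities, same three columns
```

Because all entities share the same features, the combined frame keeps the original columns and simply gains rows.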
The data analysis can be run with the following command. The report consists of the following sections:
protocol feature belongs to this category. Here only the Stats, PieChart and Word Frequency tabs carry information, as word length is not important in the case of this feature.
create_report(ds)
Dataset Statistics | Value |
---|---|
Number of Variables | 39 |
Number of Rows | 1.4168e+06 |
Missing Cells | 0 |
Missing Cells (%) | 0.0% |
Duplicate Rows | 0 |
Duplicate Rows (%) | 0.0% |
Total Size in Memory | 432.4 MB |
Average Row Size in Memory | 320.0 B |
Variable Types | Numerical: 37, Categorical: 2 |
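The dataset-level figures in the overview table can be cross-checked without DataPrep. The following is a minimal sketch using plain pandas on a toy frame; the helper name `overview` is hypothetical, and DataPrep's own definitions (for example, how memory is measured) may differ.

```python
import pandas as pd

def overview(df):
    """Compute a few dataset-level statistics with plain pandas."""
    n_cells = df.shape[0] * df.shape[1]
    missing = int(df.isna().sum().sum())        # NaN cells over all columns
    duplicates = int(df.duplicated().sum())     # rows identical to an earlier row
    return {
        "Number of Variables": df.shape[1],
        "Number of Rows": df.shape[0],
        "Missing Cells": missing,
        "Missing Cells (%)": 100.0 * missing / n_cells if n_cells else 0.0,
        "Duplicate Rows": duplicates,
        "Total Size in Memory (bytes)": int(df.memory_usage(deep=True).sum()),
    }

# Toy data with one missing cell and no duplicate rows
toy = pd.DataFrame({"0": [0.1, 0.2, 0.2], "1": [0.0, None, 0.0], "y": [0, 1, 1]})
for name, value in overview(toy).items():
    print(f"{name}: {value}")
```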
Insight | Type |
---|---|
16 and 17 have similar distributions | Similar Distribution |
16 and 26 have similar distributions | Similar Distribution |
16 and 28 have similar distributions | Similar Distribution |
16 and 37 have similar distributions | Similar Distribution |
17 and 26 have similar distributions | Similar Distribution |
17 and 28 have similar distributions | Similar Distribution |
17 and 37 have similar distributions | Similar Distribution |
20 and 21 have similar distributions | Similar Distribution |
20 and 27 have similar distributions | Similar Distribution |
20 and 30 have similar distributions | Similar Distribution |
21 and 27 have similar distributions | Similar Distribution |
21 and 30 have similar distributions | Similar Distribution |
26 and 28 have similar distributions | Similar Distribution |
26 and 37 have similar distributions | Similar Distribution |
27 and 30 have similar distributions | Similar Distribution |
28 and 37 have similar distributions | Similar Distribution |
34 and 35 have similar distributions | Similar Distribution |
0 is skewed | Skewed |
1 is skewed | Skewed |
2 is skewed | Skewed |
3 is skewed | Skewed |
4 is skewed | Skewed |
5 is skewed | Skewed |
6 is skewed | Skewed |
8 is skewed | Skewed |
9 is skewed | Skewed |
10 is skewed | Skewed |
11 is skewed | Skewed |
12 is skewed | Skewed |
13 is skewed | Skewed |
14 is skewed | Skewed |
15 is skewed | Skewed |
16 is skewed | Skewed |
17 is skewed | Skewed |
18 is skewed | Skewed |
19 is skewed | Skewed |
20 is skewed | Skewed |
21 is skewed | Skewed |
22 is skewed | Skewed |
23 is skewed | Skewed |
24 is skewed | Skewed |
25 is skewed | Skewed |
26 is skewed | Skewed |
27 is skewed | Skewed |
28 is skewed | Skewed |
29 is skewed | Skewed |
30 is skewed | Skewed |
31 is skewed | Skewed |
32 is skewed | Skewed |
33 is skewed | Skewed |
34 is skewed | Skewed |
35 is skewed | Skewed |
36 is skewed | Skewed |
37 is skewed | Skewed |
7 has constant value "0.0" | Constant |
7 has constant length 3 | Constant Length |
y has constant length 3 | Constant Length |
4 has 22454 (1.58%) negatives | Negatives |
5 has 17535 (1.24%) negatives | Negatives |
6 has 37258 (2.63%) negatives | Negatives |
25 has 31029 (2.19%) negatives | Negatives |
4 has 1024993 (72.34%) zeros | Zeros |
8 has 278472 (19.65%) zeros | Zeros |
9 has 1038811 (73.32%) zeros | Zeros |
10 has 560574 (39.57%) zeros | Zeros |
12 has 879758 (62.09%) zeros | Zeros |
16 has 1412781 (99.71%) zeros | Zeros |
17 has 1414488 (99.84%) zeros | Zeros |
24 has 264534 (18.67%) zeros | Zeros |
26 has 1416779 (100.0%) zeros | Zeros |
28 has 1416399 (99.97%) zeros | Zeros |
29 has 290219 (20.48%) zeros | Zeros |
31 has 349628 (24.68%) zeros | Zeros |
32 has 995593 (70.27%) zeros | Zeros |
33 has 98221 (6.93%) zeros | Zeros |
34 has 116955 (8.25%) zeros | Zeros |
35 has 110732 (7.82%) zeros | Zeros |
36 has 1313347 (92.7%) zeros | Zeros |
37 has 1369420 (96.65%) zeros | Zeros |
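The percentages in the Zeros and Negatives insights are counts divided by the number of rows. The sketch below recomputes a few of them. The exact row count of 1,416,825 is an inference, since the overview only gives 1.4168e+06: it is taken from the `Decimal Number` count (2833650) reported for the constant feature 7, whose value "0.0" contributes two digits per row.

```python
# Row count inferred from the report (an assumption, see the note above):
# the constant string "0.0" has two digits, so 2833650 digits -> 1416825 rows.
N_ROWS = 2833650 // 2

def share(count, total=N_ROWS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)

# (feature, insight, count) triples copied from the insights table
examples = [
    ("4", "zeros", 1024993),
    ("16", "zeros", 1412781),
    ("4", "negatives", 22454),
    ("6", "negatives", 37258),
]
for feature, kind, count in examples:
    print(f"{feature} has {count} ({share(count)}%) {kind}")
```

The recomputed shares agree with the percentages printed in the table, which supports the inferred row count.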
Variable 0 (numerical)
Approximate Distinct Count | 1844 |
---|---|
Approximate Unique (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.237 |
Minimum | -0.1 |
Maximum | 5 |
Zeros | 70459 |
Zeros (%) | 5.0% |
Negatives | 850 |
Negatives (%) | 0.1% |
Minimum | -0.1 |
---|---|
5-th Percentile | 0.0101 |
Q1 | 0.05263 |
Median | 0.1528 |
Q3 | 0.35 |
95-th Percentile | 0.7368 |
Maximum | 5 |
Range | 5.1 |
IQR | 0.2974 |
Mean | 0.237 |
---|---|
Standard Deviation | 0.2633 |
Variance | 0.06932 |
Sum | 335809.3549 |
Skewness | 2.5409 |
Kurtosis | 12.8042 |
Coefficient of Variation | 1.1108 |
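Several rows in the per-variable tables are derived from other rows: Range is Maximum minus Minimum, IQR is Q3 minus Q1, Variance is the squared Standard Deviation, and the Coefficient of Variation is the Standard Deviation divided by the Mean. As a sanity check, these relations can be verified against the values printed above for the first variable. Because the report rounds its output, the comparison uses a relative tolerance, and the tolerance value is an arbitrary choice.

```python
import math

# Values copied from the tables of the first numerical variable
stats = {
    "min": -0.1, "max": 5.0, "q1": 0.05263, "q3": 0.35,
    "mean": 0.237, "std": 0.2633,
    "range": 5.1, "iqr": 0.2974, "variance": 0.06932, "cv": 1.1108,
}

# Recompute the derived rows from the basic ones
derived = {
    "range": stats["max"] - stats["min"],
    "iqr": stats["q3"] - stats["q1"],
    "variance": stats["std"] ** 2,
    "cv": stats["std"] / stats["mean"],
}

checks = {}
for name, value in derived.items():
    # Inputs are rounded in the report, so allow a small relative error
    checks[name] = math.isclose(value, stats[name], rel_tol=1e-3)
    status = "ok" if checks[name] else "differs"
    print(f"{name}: recomputed {value:.4f}, reported {stats[name]} ({status})")
```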
Variable 1 (numerical)
Approximate Distinct Count | 36881 |
---|---|
Approximate Unique (%) | 2.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.1709 |
Minimum | -0.02017 |
Maximum | 5 |
Zeros | 14547 |
Zeros (%) | 1.0% |
Negatives | 153 |
Negatives (%) | 0.0% |
Minimum | -0.02017 |
---|---|
5-th Percentile | 0.00075736 |
Q1 | 0.01439 |
Median | 0.07727 |
Q3 | 0.2341 |
95-th Percentile | 0.6179 |
Maximum | 5 |
Range | 5.0202 |
IQR | 0.2197 |
Mean | 0.1709 |
---|---|
Standard Deviation | 0.259 |
Variance | 0.06706 |
Sum | 242125.4835 |
Skewness | 4.3766 |
Kurtosis | 39.1504 |
Coefficient of Variation | 1.5153 |
Variable 2 (numerical)
Approximate Distinct Count | 35087 |
---|---|
Approximate Unique (%) | 2.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.218 |
Minimum | -0.02836 |
Maximum | 5 |
Zeros | 2409 |
Zeros (%) | 0.2% |
Negatives | 686 |
Negatives (%) | 0.0% |
Minimum | -0.02836 |
---|---|
5-th Percentile | 0.001128 |
Q1 | 0.02235 |
Median | 0.1173 |
Q3 | 0.3042 |
95-th Percentile | 0.7431 |
Maximum | 5 |
Range | 5.0284 |
IQR | 0.2819 |
Mean | 0.218 |
---|---|
Standard Deviation | 0.3286 |
Variance | 0.108 |
Sum | 308916.6374 |
Skewness | 5.0615 |
Kurtosis | 45.7576 |
Coefficient of Variation | 1.5072 |
Variable 3 (numerical)
Approximate Distinct Count | 34010 |
---|---|
Approximate Unique (%) | 2.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2595 |
Minimum | -0.03027 |
Maximum | 5 |
Zeros | 31854 |
Zeros (%) | 2.2% |
Negatives | 944 |
Negatives (%) | 0.1% |
Minimum | -0.03027 |
---|---|
5-th Percentile | 0.00097495 |
Q1 | 0.02277 |
Median | 0.1421 |
Q3 | 0.3893 |
95-th Percentile | 0.8087 |
Maximum | 5 |
Range | 5.0303 |
IQR | 0.3665 |
Mean | 0.2595 |
---|---|
Standard Deviation | 0.391 |
Variance | 0.1529 |
Sum | 367714.7863 |
Skewness | 5.2495 |
Kurtosis | 46.9588 |
Coefficient of Variation | 1.5064 |
Variable 4 (numerical)
Approximate Distinct Count | 296 |
---|---|
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.1194 |
Minimum | -4 |
Maximum | 1.2267 |
Zeros | 1024993 |
Zeros (%) | 72.3% |
Negatives | 22454 |
Negatives (%) | 1.6% |
Minimum | -4 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0.1579 |
95-th Percentile | 1 |
Maximum | 1.2267 |
Range | 5.2267 |
IQR | 0.1579 |
Mean | 0.1194 |
---|---|
Standard Deviation | 0.6249 |
Variance | 0.3906 |
Sum | 169122.2118 |
Skewness | -4.1754 |
Kurtosis | 27.3869 |
Coefficient of Variation | 5.2355 |
Variable 5 (numerical)
Approximate Distinct Count | 163678 |
---|---|
Approximate Unique (%) | 11.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.699 |
Minimum | -4 |
Maximum | 2.2833 |
Zeros | 70 |
Zeros (%) | 0.0% |
Negatives | 17535 |
Negatives (%) | 1.2% |
Minimum | -4 |
---|---|
5-th Percentile | 0.15 |
Q1 | 0.5324 |
Median | 0.7887 |
Q3 | 0.9713 |
95-th Percentile | 1 |
Maximum | 2.2833 |
Range | 6.2833 |
IQR | 0.4389 |
Mean | 0.699 |
---|---|
Standard Deviation | 0.4761 |
Variance | 0.2266 |
Sum | 990381.451 |
Skewness | -5.6769 |
Kurtosis | 53.4048 |
Coefficient of Variation | 0.6811 |
Variable 6 (numerical)
Approximate Distinct Count | 79900 |
---|---|
Approximate Unique (%) | 5.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.4705 |
Minimum | -4 |
Maximum | 5 |
Zeros | 235 |
Zeros (%) | 0.0% |
Negatives | 37258 |
Negatives (%) | 2.6% |
Minimum | -4 |
---|---|
5-th Percentile | 0.07098 |
Q1 | 0.2921 |
Median | 0.4885 |
Q3 | 0.7359 |
95-th Percentile | 1.0286 |
Maximum | 5 |
Range | 9 |
IQR | 0.4438 |
Mean | 0.4705 |
---|---|
Standard Deviation | 0.7978 |
Variance | 0.6365 |
Sum | 666647.0109 |
Skewness | -2.4515 |
Kurtosis | 22.3142 |
Coefficient of Variation | 1.6956 |
Variable 7 (categorical)

Overview | Value |
---|---|
Approximate Distinct Count | 1 |
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory Size | 96344100 |

Value length (characters) | Value |
---|---|
Mean | 3 |
Standard Deviation | 0 |
Median | 3 |
Minimum | 3 |
Maximum | 3 |

Sample | Value |
---|---|
1st row | 0.0 |
2nd row | 0.0 |
3rd row | 0.0 |
4th row | 0.0 |
5th row | 0.0 |

Letter statistic | Value |
---|---|
Count | 0 |
Lowercase Letter | 0 |
Space Separator | 0 |
Uppercase Letter | 0 |
Dash Punctuation | 0 |
Decimal Number | 2833650 |
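The character-class labels above (Lowercase Letter, Space Separator, Dash Punctuation, Decimal Number) match the names of Unicode general categories. A small sketch with Python's standard `unicodedata` module shows why a column that always holds the string "0.0" has length 3 and contributes two Decimal Number characters per row. Whether DataPrep computes its counts exactly this way is an assumption.

```python
import unicodedata
from collections import Counter

def char_categories(text):
    """Count the Unicode general categories of the characters in text."""
    return Counter(unicodedata.category(ch) for ch in text)

value = "0.0"                     # the constant value of this feature
counts = char_categories(value)

print(len(value))                 # string length
print(counts["Nd"])               # Nd = Decimal Number (the two zeros)
print(counts["Po"])               # Po = Other Punctuation (the dot)

# Two digits per row means the reported 2833650 decimal characters
# correspond to 2833650 // 2 rows.
rows = 2833650 // 2
print(rows)
```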
Variable 8 (numerical)
Approximate Distinct Count | 29175 |
---|---|
Approximate Unique (%) | 2.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.04985 |
Minimum | -0.002441 |
Maximum | 5 |
Zeros | 278472 |
Zeros (%) | 19.7% |
Negatives | 375 |
Negatives (%) | 0.0% |
Minimum | -0.002441 |
---|---|
5-th Percentile | 0 |
Q1 | 0.00022 |
Median | 0.005878 |
Q3 | 0.04263 |
95-th Percentile | 0.2648 |
Maximum | 5 |
Range | 5.0024 |
IQR | 0.04241 |
Mean | 0.04985 |
---|---|
Standard Deviation | 0.1141 |
Variance | 0.01302 |
Sum | 70622.6875 |
Skewness | 8.5283 |
Kurtosis | 230.0325 |
Coefficient of Variation | 2.2895 |
Variable 9 (numerical)
Approximate Distinct Count | 39779 |
---|---|
Approximate Unique (%) | 2.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.007438 |
Minimum | 0 |
Maximum | 5 |
Zeros | 1038811 |
Zeros (%) | 73.3% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 1.9e-05 |
95-th Percentile | 0.00931 |
Maximum | 5 |
Range | 5 |
IQR | 1.9e-05 |
Mean | 0.007438 |
---|---|
Standard Deviation | 0.1072 |
Variance | 0.01149 |
Sum | 10538.7316 |
Skewness | 36.8187 |
Kurtosis | 1589.1999 |
Coefficient of Variation | 14.4079 |
Variable 10 (numerical)
Approximate Distinct Count | 203316 |
---|---|
Approximate Unique (%) | 14.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.08062 |
Minimum | -0.004525 |
Maximum | 5 |
Zeros | 560574 |
Zeros (%) | 39.6% |
Negatives | 2 |
Negatives (%) | 0.0% |
Minimum | -0.004525 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0.03075 |
Q3 | 0.1108 |
95-th Percentile | 0.3329 |
Maximum | 5 |
Range | 5.0045 |
IQR | 0.1108 |
Mean | 0.08062 |
---|---|
Standard Deviation | 0.1451 |
Variance | 0.02106 |
Sum | 114228.4369 |
Skewness | 8.5608 |
Kurtosis | 191.2875 |
Coefficient of Variation | 1.8 |
Variable 11 (numerical)
Approximate Distinct Count | 4245 |
---|---|
Approximate Unique (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.09614 |
Minimum | -0.02174 |
Maximum | 5 |
Zeros | 36028 |
Zeros (%) | 2.5% |
Negatives | 10 |
Negatives (%) | 0.0% |
Minimum | -0.02174 |
---|---|
5-th Percentile | 0.000222 |
Q1 | 0.02264 |
Median | 0.06122 |
Q3 | 0.125 |
95-th Percentile | 0.32 |
Maximum | 5 |
Range | 5.0217 |
IQR | 0.1024 |
Mean | 0.09614 |
---|---|
Standard Deviation | 0.1155 |
Variance | 0.01333 |
Sum | 136214.57 |
Skewness | 3.8444 |
Kurtosis | 55.2878 |
Coefficient of Variation | 1.2011 |
Variable 12 (numerical)
Approximate Distinct Count | 827 |
---|---|
Approximate Unique (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.04265 |
Minimum | -0.02941 |
Maximum | 5 |
Zeros | 879758 |
Zeros (%) | 62.1% |
Negatives | 174 |
Negatives (%) | 0.0% |
Minimum | -0.02941 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0.04494 |
95-th Percentile | 0.25 |
Maximum | 5 |
Range | 5.0294 |
IQR | 0.04494 |
Mean | 0.04265 |
---|---|
Standard Deviation | 0.1106 |
Variance | 0.01223 |
Sum | 60433.3107 |
Skewness | 10.0452 |
Kurtosis | 269.9524 |
Coefficient of Variation | 2.5924 |
Variable 13 (numerical)
Approximate Distinct Count | 307718 |
---|---|
Approximate Unique (%) | 21.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.1448 |
Minimum | -0.07564 |
Maximum | 5 |
Zeros | 37 |
Zeros (%) | 0.0% |
Negatives | 5592 |
Negatives (%) | 0.4% |
Minimum | -0.07564 |
---|---|
5-th Percentile | 0.004073 |
Q1 | 0.03103 |
Median | 0.07188 |
Q3 | 0.1838 |
95-th Percentile | 0.5656 |
Maximum | 5 |
Range | 5.0756 |
IQR | 0.1528 |
Mean | 0.1448 |
---|---|
Standard Deviation | 0.1855 |
Variance | 0.03441 |
Sum | 205213.1103 |
Skewness | 2.7447 |
Kurtosis | 21.6071 |
Coefficient of Variation | 1.2807 |
Variable 14 (numerical)
Approximate Distinct Count | 98543 |
---|---|
Approximate Unique (%) | 7.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.07671 |
Minimum | -0.01048 |
Maximum | 3.5318 |
Zeros | 28780 |
Zeros (%) | 2.0% |
Negatives | 260 |
Negatives (%) | 0.0% |
Minimum | -0.01048 |
---|---|
5-th Percentile | 0.000185 |
Q1 | 0.007271 |
Median | 0.02802 |
Q3 | 0.1066 |
95-th Percentile | 0.2883 |
Maximum | 3.5318 |
Range | 3.5423 |
IQR | 0.0993 |
Mean | 0.07671 |
---|---|
Standard Deviation | 0.1166 |
Variance | 0.0136 |
Sum | 108683.6301 |
Skewness | 3.3068 |
Kurtosis | 20.7888 |
Coefficient of Variation | 1.5205 |
Variable 15 (numerical)
Approximate Distinct Count | 236974 |
---|---|
Approximate Unique (%) | 16.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.1095 |
Minimum | -0.01854 |
Maximum | 5 |
Zeros | 122 |
Zeros (%) | 0.0% |
Negatives | 297 |
Negatives (%) | 0.0% |
Minimum | -0.01854 |
---|---|
5-th Percentile | 0.0007693 |
Q1 | 0.01572 |
Median | 0.06394 |
Q3 | 0.1356 |
95-th Percentile | 0.3563 |
Maximum | 5 |
Range | 5.0185 |
IQR | 0.1198 |
Mean | 0.1095 |
---|---|
Standard Deviation | 0.1873 |
Variance | 0.03508 |
Sum | 155120.7804 |
Skewness | 9.0982 |
Kurtosis | 163.1029 |
Coefficient of Variation | 1.7108 |
Variable 16 (numerical)
Approximate Distinct Count | 286 |
---|---|
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.00046563 |
Minimum | 0 |
Maximum | 5 |
Zeros | 1412781 |
Zeros (%) | 99.7% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th Percentile | 0 |
Maximum | 5 |
Range | 5 |
IQR | 0 |
Mean | 0.00046563 |
---|---|
Standard Deviation | 0.04034 |
Variance | 0.001627 |
Sum | 659.7188 |
Skewness | 113.9587 |
Kurtosis | 13582.1409 |
Coefficient of Variation | 86.6288 |
Variable 17 (numerical)
Approximate Distinct Count | 348 |
---|---|
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.00058212 |
Minimum | 0 |
Maximum | 5 |
Zeros | 1414488 |
Zeros (%) | 99.8% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th Percentile | 0 |
Maximum | 5 |
Range | 5 |
IQR | 0 |
Mean | 0.00058212 |
---|---|
Standard Deviation | 0.03762 |
Variance | 0.001415 |
Sum | 824.7551 |
Skewness | 108.1668 |
Kurtosis | 13168.8969 |
Coefficient of Variation | 64.6238 |
Variable 18 (numerical)
Approximate Distinct Count | 749827 |
---|---|
Approximate Unique (%) | 52.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2776 |
Minimum | -0.1237 |
Maximum | 5 |
Zeros | 116 |
Zeros (%) | 0.0% |
Negatives | 1579 |
Negatives (%) | 0.1% |
Minimum | -0.1237 |
---|---|
5-th Percentile | 0.003805 |
Q1 | 0.04892 |
Median | 0.1929 |
Q3 | 0.4463 |
95-th Percentile | 0.816 |
Maximum | 5 |
Range | 5.1237 |
IQR | 0.3974 |
Mean | 0.2776 |
---|---|
Standard Deviation | 0.289 |
Variance | 0.0835 |
Sum | 393244.6778 |
Skewness | 2.1436 |
Kurtosis | 12.7118 |
Coefficient of Variation | 1.0411 |
Variable 19 (numerical)
Approximate Distinct Count | 668912 |
---|---|
Approximate Unique (%) | 47.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2507 |
Minimum | -0.1202 |
Maximum | 5 |
Zeros | 493 |
Zeros (%) | 0.0% |
Negatives | 2089 |
Negatives (%) | 0.1% |
Minimum | -0.1202 |
---|---|
5-th Percentile | 0.004766 |
Q1 | 0.03823 |
Median | 0.1618 |
Q3 | 0.4182 |
95-th Percentile | 0.7492 |
Maximum | 5 |
Range | 5.1202 |
IQR | 0.38 |
Mean | 0.2507 |
---|---|
Standard Deviation | 0.2637 |
Variance | 0.06955 |
Sum | 355254.2388 |
Skewness | 1.7121 |
Kurtosis | 7.9645 |
Coefficient of Variation | 1.0518 |
Variable 20 (numerical)
Approximate Distinct Count | 371989 |
---|---|
Approximate Unique (%) | 26.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2868 |
Minimum | -0.1211 |
Maximum | 5 |
Zeros | 238 |
Zeros (%) | 0.0% |
Negatives | 1471 |
Negatives (%) | 0.1% |
Minimum | -0.1211 |
---|---|
5-th Percentile | 0.00908 |
Q1 | 0.06422 |
Median | 0.2032 |
Q3 | 0.4674 |
95-th Percentile | 0.7797 |
Maximum | 5 |
Range | 5.1211 |
IQR | 0.4032 |
Mean | 0.2868 |
---|---|
Standard Deviation | 0.281 |
Variance | 0.07899 |
Sum | 406290.3428 |
Skewness | 2.0053 |
Kurtosis | 11.6484 |
Coefficient of Variation | 0.9801 |
Variable 21 (numerical)
Approximate Distinct Count | 372735 |
---|---|
Approximate Unique (%) | 26.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2851 |
Minimum | -0.1333 |
Maximum | 3.6841 |
Zeros | 431 |
Zeros (%) | 0.0% |
Negatives | 1819 |
Negatives (%) | 0.1% |
Minimum | -0.1333 |
---|---|
5-th Percentile | 0.01154 |
Q1 | 0.06369 |
Median | 0.203 |
Q3 | 0.4677 |
95-th Percentile | 0.7955 |
Maximum | 3.6841 |
Range | 3.8174 |
IQR | 0.404 |
Mean | 0.2851 |
---|---|
Standard Deviation | 0.2698 |
Variance | 0.07281 |
Sum | 403886.5177 |
Skewness | 1.3783 |
Kurtosis | 3.6382 |
Coefficient of Variation | 0.9466 |
Variable 22 (numerical)
Approximate Distinct Count | 282912 |
---|---|
Approximate Unique (%) | 20.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2734 |
Minimum | -0.1636 |
Maximum | 5 |
Zeros | 33562 |
Zeros (%) | 2.4% |
Negatives | 11053 |
Negatives (%) | 0.8% |
Minimum | -0.1636 |
---|---|
5-th Percentile | 0.001422 |
Q1 | 0.03847 |
Median | 0.1667 |
Q3 | 0.4101 |
95-th Percentile | 0.9357 |
Maximum | 5 |
Range | 5.1636 |
IQR | 0.3716 |
Mean | 0.2734 |
---|---|
Standard Deviation | 0.3113 |
Variance | 0.09688 |
Sum | 387331.2179 |
Skewness | 2.2549 |
Kurtosis | 15.884 |
Coefficient of Variation | 1.1386 |
Variable 23 (numerical)
Approximate Distinct Count | 57646 |
---|---|
Approximate Unique (%) | 4.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.4545 |
Minimum | -4 |
Maximum | 5 |
Zeros | 13307 |
Zeros (%) | 0.9% |
Negatives | 13500 |
Negatives (%) | 1.0% |
Minimum | -4 |
---|---|
5-th Percentile | 0.004011 |
Q1 | 0.1596 |
Median | 0.4166 |
Q3 | 0.7825 |
95-th Percentile | 0.9587 |
Maximum | 5 |
Range | 9 |
IQR | 0.6229 |
Mean | 0.4545 |
---|---|
Standard Deviation | 0.3717 |
Variance | 0.1382 |
Sum | 643989.2317 |
Skewness | 0.09406 |
Kurtosis | 5.6031 |
Coefficient of Variation | 0.8178 |
Variable 24 (numerical)
Approximate Distinct Count | 10911 |
---|---|
Approximate Unique (%) | 0.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.1858 |
Minimum | -0.1243 |
Maximum | 5 |
Zeros | 264534 |
Zeros (%) | 18.7% |
Negatives | 1418 |
Negatives (%) | 0.1% |
Minimum | -0.1243 |
---|---|
5-th Percentile | 0 |
Q1 | 0.01967 |
Median | 0.08812 |
Q3 | 0.2794 |
95-th Percentile | 0.636 |
Maximum | 5 |
Range | 5.1243 |
IQR | 0.2597 |
Mean | 0.1858 |
---|---|
Standard Deviation | 0.2286 |
Variance | 0.05226 |
Sum | 263184.4349 |
Skewness | 2.4317 |
Kurtosis | 18.319 |
Coefficient of Variation | 1.2307 |
Variable 25 (numerical)
Approximate Distinct Count | 45952 |
---|---|
Approximate Unique (%) | 3.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.4243 |
Minimum | -4 |
Maximum | 5 |
Zeros | 26584 |
Zeros (%) | 1.9% |
Negatives | 31029 |
Negatives (%) | 2.2% |
Minimum | -4 |
---|---|
5-th Percentile | 0.00047047 |
Q1 | 0.1443 |
Median | 0.3867 |
Q3 | 0.6884 |
95-th Percentile | 0.9843 |
Maximum | 5 |
Range | 9 |
IQR | 0.5441 |
Mean | 0.4243 |
---|---|
Standard Deviation | 0.3673 |
Variance | 0.1349 |
Sum | 601168.6073 |
Skewness | -0.02471 |
Kurtosis | 5.8152 |
Coefficient of Variation | 0.8657 |
Variable 26 (numerical)
Approximate Distinct Count | 16 |
---|---|
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 2.4122e-05 |
Minimum | 0 |
Maximum | 2 |
Zeros | 1416779 |
Zeros (%) | 100.0% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th Percentile | 0 |
Maximum | 2 |
Range | 2 |
IQR | 0 |
Mean | 2.4122e-05 |
---|---|
Standard Deviation | 0.004775 |
Variance | 2.2801e-05 |
Sum | 34.1765 |
Skewness | 227.1958 |
Kurtosis | 58723.3344 |
Coefficient of Variation | 197.956 |
Variable 27 (numerical)
Approximate Distinct Count | 356059 |
---|---|
Approximate Unique (%) | 25.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2886 |
Minimum | -0.147 |
Maximum | 5 |
Zeros | 170 |
Zeros (%) | 0.0% |
Negatives | 1766 |
Negatives (%) | 0.1% |
Minimum | -0.147 |
---|---|
5-th Percentile | 0.01211 |
Q1 | 0.06682 |
Median | 0.2045 |
Q3 | 0.4644 |
95-th Percentile | 0.7957 |
Maximum | 5 |
Range | 5.147 |
IQR | 0.3976 |
Mean | 0.2886 |
---|---|
Standard Deviation | 0.2818 |
Variance | 0.0794 |
Sum | 408828.4148 |
Skewness | 1.936 |
Kurtosis | 10.4887 |
Coefficient of Variation | 0.9765 |
Variable 28 (numerical)
Approximate Distinct Count | 380 |
---|---|
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 4.9993e-05 |
Minimum | 0 |
Maximum | 1.2138 |
Zeros | 1416399 |
Zeros (%) | 100.0% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th Percentile | 0 |
Maximum | 1.2138 |
Range | 1.2138 |
IQR | 0 |
Mean | 4.9993e-05 |
---|---|
Standard Deviation | 0.00565 |
Variance | 3.192e-05 |
Sum | 70.8318 |
Skewness | 138.0468 |
Kurtosis | 20984.151 |
Coefficient of Variation | 113.0107 |
Variable 29 (numerical)
Approximate Distinct Count | 1420 |
---|---|
Approximate Unique (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2019 |
Minimum | -0.1013 |
Maximum | 5 |
Zeros | 290219 |
Zeros (%) | 20.5% |
Negatives | 9 |
Negatives (%) | 0.0% |
Minimum | -0.1013 |
---|---|
5-th Percentile | 0 |
Q1 | 0.008 |
Median | 0.08621 |
Q3 | 0.25 |
95-th Percentile | 1 |
Maximum | 5 |
Range | 5.1013 |
IQR | 0.242 |
Mean | 0.2019 |
---|---|
Standard Deviation | 0.2878 |
Variance | 0.08281 |
Sum | 286080.9242 |
Skewness | 2.3134 |
Kurtosis | 9.9679 |
Coefficient of Variation | 1.4252 |
Variable 30 (numerical)
Approximate Distinct Count | 359684 |
---|---|
Approximate Unique (%) | 25.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2897 |
Minimum | -0.1333 |
Maximum | 3.8545 |
Zeros | 291 |
Zeros (%) | 0.0% |
Negatives | 2075 |
Negatives (%) | 0.1% |
Minimum | -0.1333 |
---|---|
5-th Percentile | 0.01247 |
Q1 | 0.06882 |
Median | 0.2179 |
Q3 | 0.4656 |
95-th Percentile | 0.7968 |
Maximum | 3.8545 |
Range | 3.9879 |
IQR | 0.3968 |
Mean | 0.2897 |
---|---|
Standard Deviation | 0.2693 |
Variance | 0.07252 |
Sum | 410428.0661 |
Skewness | 1.4425 |
Kurtosis | 4.2629 |
Coefficient of Variation | 0.9296 |
Variable 31 (numerical)
Approximate Distinct Count | 6847 |
---|---|
Approximate Unique (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.1928 |
Minimum | -0.09112 |
Maximum | 5 |
Zeros | 349628 |
Zeros (%) | 24.7% |
Negatives | 1024 |
Negatives (%) | 0.1% |
Minimum | -0.09112 |
---|---|
5-th Percentile | 0 |
Q1 | 0.006543 |
Median | 0.1329 |
Q3 | 0.3086 |
95-th Percentile | 0.6137 |
Maximum | 5 |
Range | 5.0911 |
IQR | 0.3021 |
Mean | 0.1928 |
---|---|
Standard Deviation | 0.2223 |
Variance | 0.0494 |
Sum | 273130.6066 |
Skewness | 3.013 |
Kurtosis | 37.9967 |
Coefficient of Variation | 1.1529 |
Variable 32 (numerical)
Approximate Distinct Count | 238 |
---|---|
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.05249 |
Minimum | -0.25 |
Maximum | 5 |
Zeros | 995593 |
Zeros (%) | 70.3% |
Negatives | 28 |
Negatives (%) | 0.0% |
Minimum | -0.25 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0.02326 |
95-th Percentile | 0.25 |
Maximum | 5 |
Range | 5.25 |
IQR | 0.02326 |
Mean | 0.05249 |
---|---|
Standard Deviation | 0.1239 |
Variance | 0.01535 |
Sum | 74362.7734 |
Skewness | 8.1927 |
Kurtosis | 235.8816 |
Coefficient of Variation | 2.3609 |
Variable 33 (numerical)
Approximate Distinct Count | 2242 |
---|---|
Approximate Unique (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.08535 |
Minimum | -0.3999 |
Maximum | 5 |
Zeros | 98221 |
Zeros (%) | 6.9% |
Negatives | 1957 |
Negatives (%) | 0.1% |
Minimum | -0.3999 |
---|---|
5-th Percentile | 0 |
Q1 | 0.01087 |
Median | 0.0375 |
Q3 | 0.08591 |
95-th Percentile | 0.3684 |
Maximum | 5 |
Range | 5.3999 |
IQR | 0.07504 |
Mean | 0.08535 |
---|---|
Standard Deviation | 0.1531 |
Variance | 0.02345 |
Sum | 120921.7883 |
Skewness | 7.1375 |
Kurtosis | 148.7047 |
Coefficient of Variation | 1.7943 |
Variable 34 (numerical)
Approximate Distinct Count | 80798 |
---|---|
Approximate Unique (%) | 5.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.2932 |
Minimum | -0.2485 |
Maximum | 5 |
Zeros | 116955 |
Zeros (%) | 8.3% |
Negatives | 2009 |
Negatives (%) | 0.1% |
Minimum | -0.2485 |
---|---|
5-th Percentile | 0 |
Q1 | 0.08751 |
Median | 0.2123 |
Q3 | 0.4731 |
95-th Percentile | 0.8137 |
Maximum | 5 |
Range | 5.2485 |
IQR | 0.3856 |
Mean | 0.2932 |
---|---|
Standard Deviation | 0.264 |
Variance | 0.0697 |
Sum | 415434.0529 |
Skewness | 1.6971 |
Kurtosis | 14.2395 |
Coefficient of Variation | 0.9004 |
Variable 35 (numerical)
Approximate Distinct Count | 81247 |
---|---|
Approximate Unique (%) | 5.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.289 |
Minimum | -0.2485 |
Maximum | 5 |
Zeros | 110732 |
Zeros (%) | 7.8% |
Negatives | 2008 |
Negatives (%) | 0.1% |
Minimum | -0.2485 |
---|---|
5-th Percentile | 0 |
Q1 | 0.07296 |
Median | 0.208 |
Q3 | 0.4719 |
95-th Percentile | 0.8049 |
Maximum | 5 |
Range | 5.2485 |
IQR | 0.3989 |
Mean | 0.289 |
---|---|
Standard Deviation | 0.2652 |
Variance | 0.07033 |
Sum | 409511.5784 |
Skewness | 1.685 |
Kurtosis | 14.0013 |
Coefficient of Variation | 0.9175 |
Variable 36 (numerical)
Approximate Distinct Count | 51874 |
---|---|
Approximate Unique (%) | 3.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.02269 |
Minimum | 0 |
Maximum | 1.5474 |
Zeros | 1313347 |
Zeros (%) | 92.7% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th Percentile | 0.2557 |
Maximum | 1.5474 |
Range | 1.5474 |
IQR | 0 |
Mean | 0.02269 |
---|---|
Standard Deviation | 0.09487 |
Variance | 0.009 |
Sum | 32151.211 |
Skewness | 4.6221 |
Kurtosis | 22.0397 |
Coefficient of Variation | 4.1807 |
Variable 37 (numerical)
Approximate Distinct Count | 31728 |
---|---|
Approximate Unique (%) | 2.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Memory Size | 22669200 |
Mean | 0.01113 |
Minimum | 0 |
Maximum | 1.1255 |
Zeros | 1369420 |
Zeros (%) | 96.7% |
Negatives | 0 |
Negatives (%) | 0.0% |
Minimum | 0 |
---|---|
5-th Percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th Percentile | 0 |
Maximum | 1.1255 |
Range | 1.1255 |
IQR | 0 |
Mean | 0.01113 |
---|---|
Standard Deviation | 0.06904 |
Variance | 0.004766 |
Sum | 15764.503 |
Skewness | 6.9768 |
Kurtosis | 51.4064 |
Coefficient of Variation | 6.2045 |
Variable y (categorical)

Overview | Value |
---|---|
Approximate Distinct Count | 2 |
Approximate Unique (%) | 0.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory Size | 96344100 |

Value length (characters) | Value |
---|---|
Mean | 3 |
Standard Deviation | 0 |
Median | 3 |
Minimum | 3 |
Maximum | 3 |

Sample | Value |
---|---|
1st row | 0.0 |
2nd row | 0.0 |
3rd row | 0.0 |
4th row | 0.0 |
5th row | 0.0 |

Letter statistic | Value |
---|---|
Count | 0 |
Lowercase Letter | 0 |
Space Separator | 0 |
Uppercase Letter | 0 |
Dash Punctuation | 0 |
Decimal Number | 2833650 |
The report shows that 37 of the 38 input features are numerical; the one exception (feature 7) holds a single constant value, 0.0, for every timestep and every entity. Most features have roughly bell-shaped distributions, and for many of them only half of the bell curve is present: the values are concentrated around 0 and negative measurements are rare or absent, which is consistent with the report flagging these features as skewed. The box-plot representations show a high number of potential anomalies. The dataset has no missing values.
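To make the box-plot remark concrete, the sketch below applies the conventional 1.5 × IQR rule (Tukey fences) to quartiles copied from the report. That DataPrep's box plots use exactly this rule is an assumption, and the helper name `tukey_fences` is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) box-plot fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles copied from the report
lo0, hi0 = tukey_fences(0.05263, 0.35)   # first variable (Q1, Q3)
lo16, hi16 = tukey_fences(0.0, 0.0)      # variable 16: 99.71% zeros, Q1 = Q3 = 0

print(f"first variable fences: [{lo0:.4f}, {hi0:.4f}]")
print(f"variable 16 fences:    [{lo16}, {hi16}]")
```

For the first variable the upper fence lies between the reported 95-th percentile (0.7368) and the maximum (5), so fewer than 5% of its values fall above it. For a zero-dominated feature such as variable 16 both fences collapse to 0, so under this rule every non-zero value is drawn as a potential outlier, which is one reason the box plots show so many of them.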
Looking at the Correlation Matrix, we observe that the vast majority of features are positively correlated, and a number of them are strongly correlated.
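Strongly correlated feature pairs can also be listed programmatically. The following is a minimal sketch on synthetic data using pandas' Pearson correlation; the toy columns `a`, `b`, `c` and the 0.8 threshold are illustrative assumptions and are not taken from the report.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)            # fixed seed for reproducibility
base = rng.normal(size=500)
toy = pd.DataFrame({
    "a": base,
    "b": base + 0.1 * rng.normal(size=500),   # strongly tied to "a"
    "c": rng.normal(size=500),                # independent noise
})

corr = toy.corr(method="pearson")         # pairwise Pearson correlation matrix

# Collect each unordered pair once, keeping those above the threshold
threshold = 0.8                           # arbitrary illustrative cut-off
pairs = [
    (i, j, round(float(corr.loc[i, j]), 3))
    for k, i in enumerate(corr.columns)
    for j in corr.columns[k + 1:]
    if abs(corr.loc[i, j]) > threshold
]
print(pairs)
```

On the real data the same loop could be run over `ds.corr()`, after dropping the constant feature, whose correlation is undefined.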