Frequency Tables and Histograms

Categorical variables

Suppose we have the following blood types.

blood = ['A', 'A', 'A', 'B', 'B', 'AB', 'O']

Frequency table using numpy

import numpy as np
np.unique(blood, return_counts=True)

(array(['A', 'AB', 'B', 'O'], dtype='<U2'), array([3, 1, 2, 1], dtype=int64))
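The two arrays returned above are parallel: the i-th count belongs to the i-th value. As a minimal sketch (the names `freq` and `rel_freq` are illustrative and not part of the original), they could be paired into a dictionary, and the counts turned into relative frequencies:

```python
import numpy as np

blood = ['A', 'A', 'A', 'B', 'B', 'AB', 'O']

# np.unique returns the sorted distinct values and, with return_counts=True,
# the number of times each one occurs.
values, counts = np.unique(blood, return_counts=True)

# Pair each category with its frequency (hypothetical helper names).
freq = {str(v): int(c) for v, c in zip(values, counts)}

# Relative frequency: each count divided by the number of observations.
rel_freq = {str(v): int(c) / len(blood) for v, c in zip(values, counts)}

print(freq)  # {'A': 3, 'AB': 1, 'B': 2, 'O': 1}
```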

Using pandas

import pandas as pd
pd.Series(blood).value_counts()

A     3
B     2
O     1
AB    1
dtype: int64
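A frequency table often lists proportions next to the raw counts. One possible sketch, assuming the `normalize=True` option of `value_counts` and an illustrative variable name `table`:

```python
import pandas as pd

blood = ['A', 'A', 'A', 'B', 'B', 'AB', 'O']
s = pd.Series(blood)

# Absolute frequencies, most frequent category first.
counts = s.value_counts()

# normalize=True divides each count by the total number of observations.
proportions = s.value_counts(normalize=True)

# Combine both into a single table (column names chosen for illustration).
table = pd.DataFrame({'count': counts, 'proportion': proportions})
print(table)
```

The proportions sum to 1, which is a quick sanity check on the table.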

Visualization:

import seaborn as sns
sns.countplot(x=blood)  # pass the data by keyword; recent seaborn versions no longer accept it positionally

Numeric variables

Suppose we have the following numeric data.

x = [1, 1, 1, 2, 3, 5, 5, 7, 8, 9]

Split the range of the data into 4 equal-width bins. np.histogram returns the count in each bin and the bin edges.

hist, edges = np.histogram(x, 4)

edges

array([1., 3., 5., 7., 9.])

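The first bin, [1, 3), has a frequency of 4, the second bin, [3, 5), has a frequency of 1, and so on. Each bin includes its left edge and excludes its right edge, except the last bin, [7, 9], which includes both.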

hist

array([4, 1, 2, 3], dtype=int64)
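To make the binning rule concrete, the same counts can be reproduced by hand. This is only a sketch of equal-width binning in plain Python, not NumPy's actual implementation; the names `n_bins`, `width`, and `counts` are illustrative, and the floor division is only reliable here because the bin width is exact.

```python
x = [1, 1, 1, 2, 3, 5, 5, 7, 8, 9]
n_bins = 4

# Equal-width bins spanning the data from its minimum to its maximum.
lo, hi = min(x), max(x)
width = (hi - lo) / n_bins
edges = [lo + i * width for i in range(n_bins + 1)]

counts = [0] * n_bins
for v in x:
    # Index of the half-open bin [edge_i, edge_{i+1}) that contains v.
    i = int((v - lo) // width)
    # The maximum value falls into the last bin, which is closed on the right.
    if i == n_bins:
        i = n_bins - 1
    counts[i] += 1

print(edges)   # [1.0, 3.0, 5.0, 7.0, 9.0]
print(counts)  # [4, 1, 2, 3]
```

Note that the value 3 is counted in the second bin and the value 9 in the last one, which matches the result of np.histogram above.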

Visualization:

sns.histplot(x, bins=4)  # replaces sns.distplot(x, bins=4, kde=False), which is deprecated since seaborn 0.11
