Time Series Data

This section shows how to use pandas to work with time series data, that is, data that changes over time.

Example Data

Load the example data.

import pandas as pd
df = pd.read_csv('https://github.com/mwaskom/seaborn-data/raw/master/flights.csv')

df.head()

	year	month	passengers
0	1949	January	112
1	1949	February	118
2	1949	March	132
3	1949	April	129
4	1949	May	121

map

Let's convert the English month names, such as January and February, into the numbers 1, 2, and so on. First, create a dictionary like the one below.

month2int = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}

The map method uses a dictionary to convert every value in a column at once.

df['month'] = df['month'].map(month2int)

You can see that the month column has been converted to numbers.

df.head()

	year	month	passengers
0	1949	1	112
1	1949	2	118
2	1949	3	132
3	1949	4	129
4	1949	5	121
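
As a small, self-contained illustration of how map behaves with a dictionary, the sketch below uses made-up toy data rather than the flights dataset. The names months and lookup are hypothetical. It also shows what happens to a value that is missing from the dictionary.

```python
import pandas as pd

# Hypothetical toy data (not the flights dataset)
months = pd.Series(['January', 'March', 'Jan'])

# A deliberately incomplete dictionary: 'Jan' has no entry
lookup = {'January': 1, 'March': 3}

# map looks up each value in the dictionary
converted = months.map(lookup)
print(converted)

# Values with no matching key become NaN, so counting NaN
# is a quick way to check that the dictionary covers everything
n_missing = int(converted.isna().sum())
print(n_missing)
```

If n_missing is greater than zero, some value in the column is spelled differently from the dictionary keys.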

Creating Dates

Now let's combine year and month into a date. First, create a day column and fill it with 1.

df['day'] = 1

The pd.to_datetime function can assemble three columns (year, month, and day) into a single datetime column.

df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
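
The same steps can be tried on a small hand-made table. This is only a sketch with made-up rows, and the name toy is hypothetical. It relies on pd.to_datetime accepting a DataFrame whose columns are named year, month, and day.

```python
import pandas as pd

# Hypothetical toy table with the same column names as the example
toy = pd.DataFrame({'year': [1949, 1949, 1950], 'month': [1, 2, 12]})

# Each row stands for a whole month, so use the 1st as the day
toy['day'] = 1

# Assemble the three columns into one datetime column
toy['date'] = pd.to_datetime(toy[['year', 'month', 'day']])

print(toy['date'].dtype)    # a datetime64 dtype rather than object
print(toy['date'].dt.year)  # the .dt accessor exposes date parts
```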

Visualization

Using the plotting support built into pandas, let's draw a graph of how the number of passengers changes by date.

df.plot(x='date', y='passengers')


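Moving Average

The rolling method computes statistics over a sliding window of rows. Because each row here is one month, window=12 gives a 12-month moving average. The first 11 rows do not have a full window, so their result is NaN.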

df['1y'] = df['passengers'].rolling(window=12).mean()
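
To see what rolling does on numbers small enough to check by hand, here is a minimal sketch with a made-up series and a window of 3 instead of 12. The names values, ma3, and ma3_partial are hypothetical.

```python
import pandas as pd

# Hypothetical toy series
values = pd.Series([10, 20, 30, 40, 50])

# Mean over a sliding window of 3 rows.
# The first two rows lack a full window, so they are NaN.
ma3 = values.rolling(window=3).mean()
print(ma3)

# With min_periods=1, a partial window is averaged instead of
# yielding NaN
ma3_partial = values.rolling(window=3, min_periods=1).mean()
print(ma3_partial)
```

Whether to keep the leading NaN values or fill them with min_periods is a judgment call. Partial windows average fewer points, so the start of the line is less smooth than the rest.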

Overlay the 12-month moving average in red on the graph of monthly passengers.

ax = df.plot(x='date', y='passengers')
df.plot(x='date', y='1y', color='red', ax=ax)
