How to use get_dummies on time series data in pandas

Problem description

Suppose I have some time series data (made up here):

import numpy as np
import pandas as pd
np.random.seed(11)

rows,cols = 50000,2
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='H') 
df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)

How could I use get_dummies here? From the pandas documentation alone, I can't tell whether it applies to the way I am building my one-hot encoding.

For example, the only way I know to make dummy variables for time-of-week features is a very clunky, redundant approach. Can someone advise me on how to do this better?

#create dummy variables
df['month'] = df.index.month
df['year'] = df.index.year
df['day_of_week'] = df.index.dayofweek
df['hour'] = df.index.strftime('%H').astype('int')

df['hour_0'] = np.where(df['hour'].isin([0]), 1, 0)
df['hour_1'] = np.where(df['hour'].isin([1]), 1, 0)
df['hour_2'] = np.where(df['hour'].isin([2]), 1, 0)
df['hour_3'] = np.where(df['hour'].isin([3]), 1, 0)
df['hour_4'] = np.where(df['hour'].isin([4]), 1, 0)
df['hour_5'] = np.where(df['hour'].isin([5]), 1, 0)
df['hour_6'] = np.where(df['hour'].isin([6]), 1, 0)
df['hour_7'] = np.where(df['hour'].isin([7]), 1, 0)
df['hour_8'] = np.where(df['hour'].isin([8]), 1, 0)
df['hour_9'] = np.where(df['hour'].isin([9]), 1, 0)
df['hour_10'] = np.where(df['hour'].isin([10]), 1, 0)
df['hour_11'] = np.where(df['hour'].isin([11]), 1, 0)
df['hour_12'] = np.where(df['hour'].isin([12]), 1, 0)
df['hour_13'] = np.where(df['hour'].isin([13]), 1, 0)
df['hour_14'] = np.where(df['hour'].isin([14]), 1, 0)
df['hour_15'] = np.where(df['hour'].isin([15]), 1, 0)
df['hour_16'] = np.where(df['hour'].isin([16]), 1, 0)
df['hour_17'] = np.where(df['hour'].isin([17]), 1, 0)
df['hour_18'] = np.where(df['hour'].isin([18]), 1, 0)
df['hour_19'] = np.where(df['hour'].isin([19]), 1, 0)
df['hour_20'] = np.where(df['hour'].isin([20]), 1, 0)
df['hour_21'] = np.where(df['hour'].isin([21]), 1, 0)
df['hour_22'] = np.where(df['hour'].isin([22]), 1, 0)
df['hour_23'] = np.where(df['hour'].isin([23]), 1, 0)

df['monday'] = np.where(df['day_of_week'].isin([0]), 1, 0)
df['tuesday'] = np.where(df['day_of_week'].isin([1]), 1, 0)
df['wednesday'] = np.where(df['day_of_week'].isin([2]), 1, 0)
df['thursday'] = np.where(df['day_of_week'].isin([3]), 1, 0)
df['friday'] = np.where(df['day_of_week'].isin([4]), 1, 0)
df['saturday'] = np.where(df['day_of_week'].isin([5]), 1, 0)
df['sunday'] = np.where(df['day_of_week'].isin([6]), 1, 0)

df['january'] = np.where(df['month'].isin([1]), 1, 0)
df['february'] = np.where(df['month'].isin([2]), 1, 0)
df['march'] = np.where(df['month'].isin([3]), 1, 0)
df['april'] = np.where(df['month'].isin([4]), 1, 0)
df['may'] = np.where(df['month'].isin([5]), 1, 0)
df['june'] = np.where(df['month'].isin([6]), 1, 0)
df['july'] = np.where(df['month'].isin([7]), 1, 0)
df['august'] = np.where(df['month'].isin([8]), 1, 0)
df['september'] = np.where(df['month'].isin([9]), 1, 0)
df['october'] = np.where(df['month'].isin([10]), 1, 0)
df['november'] = np.where(df['month'].isin([11]), 1, 0)
df['december'] = np.where(df['month'].isin([12]), 1, 0)

df['year19'] = np.where(df['year'].isin([2019]), 1, 0)
df['year20'] = np.where(df['year'].isin([2020]), 1, 0)
df['year21'] = np.where(df['year'].isin([2021]), 1, 0)
df['year22'] = np.where(df['year'].isin([2022]), 1, 0)
df['year23'] = np.where(df['year'].isin([2023]), 1, 0)
df['year24'] = np.where(df['year'].isin([2024]), 1, 0)

My final dataframe, which I am using to experiment with ML algorithms, would then be:

df2 = df[['Temperature', 'Value', 
            'hour_0' , 'hour_1' , 'hour_2' , 'hour_3' , 'hour_4' , 'hour_5' , 'hour_6' ,
            'hour_7' , 'hour_8' , 'hour_9' , 'hour_10' , 'hour_11' , 'hour_12' , 'hour_13' , 
            'hour_14' , 'hour_15' , 'hour_16' , 'hour_17' , 'hour_18' , 'hour_19' , 'hour_20' , 
            'hour_21' , 'hour_22' , 'hour_23' , 
            'monday' , 'tuesday' , 'wednesday' , 'thursday' , 'friday' , 'saturday' , 'sunday' , 
            'january' , 'february' , 'march' , 'april' , 'may' , 'june' , 'july' , 'august' , 
            'september' , 'october' , 'november' , 'december' , 
            'year19' , 'year20' , 'year21' , 'year22' , 'year23' , 'year24']]

EDIT: updated code attempt

import numpy as np
import pandas as pd
np.random.seed(11)

rows,cols = 50000,2
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='H') 
df = pd.DataFrame(data, columns=['Temperature','Value'], index=tidx)

df['hour'] = df.index.strftime('%H').astype('int')
df['day_of_week'] = df.index.dayofweek
df['month'] = df.index.month
df['year'] = df.index.year

hour_dummies = pd.get_dummies(df['hour'], prefix='hour')

day_mapping = {0: 'monday', 1: 'tuesday', 2: 'wednesday', 3: 'thursday', 4: 'friday', 5: 'saturday', 6: 'sunday'}
day_dummies = pd.get_dummies(df['day_of_week'].map(day_mapping))

# df.index.month is 1-based (1 = January), so the mapping keys must run from 1 to 12
month_mapping = {1: 'jan', 2: 'feb', 3: 'mar', 4: 'apr', 5: 'may', 6: 'jun', 7: 'jul',
                 8: 'aug', 9: 'sep', 10: 'oct', 11: 'nov', 12: 'dec'}
month_dummies = pd.get_dummies(df['month'].map(month_mapping))

# the year column already holds values such as 2019, so a prefix is enough and no mapping is needed
year_dummies = pd.get_dummies(df['year'], prefix='year')

df = df.join(hour_dummies)
df = df.join(day_dummies)
df = df.join(month_dummies)
df = df.join(year_dummies)

Recommended answer

You can extract the corresponding information from the time index and then pass it to pd.get_dummies. For example:

# day name
day_names = pd.get_dummies(df.index.day_name())

# hours
hours = pd.get_dummies(df.index.hour, prefix='hour')

# months
months = pd.get_dummies(df.index.month_name())

# year
years = pd.get_dummies(df.index.year, prefix='year')
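
A side note on the calls above: get_dummies only creates columns for values that actually occur in the data, so a short sample can yield fewer columns than a later, longer one. One possible way to keep the column set stable, sketched here on a hypothetical three-day sample (the names tidx, observed, month_cat and fixed are illustrative, not from the original answer), is to declare the full category list with pd.Categorical first:

```python
import pandas as pd

# Hypothetical short sample: three days of hourly data, all in January 2019
tidx = pd.date_range('2019-01-01', periods=72, freq=pd.Timedelta(hours=1))

# Without fixed categories, only the months present in the data get a column
observed = pd.get_dummies(tidx.month, prefix='month')

# Declaring all twelve months as categories keeps the column set stable,
# even though only January occurs in this sample
month_cat = pd.Categorical(tidx.month, categories=list(range(1, 13)))
fixed = pd.get_dummies(month_cat, prefix='month')

print(observed.shape[1])  # 1
print(fixed.shape[1])     # 12
```

The same idea should carry over to hours, weekdays, or a fixed range of years, which may matter when a model is trained on one period and applied to another.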

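Then concat. Note that pd.get_dummies on an index returns a frame with a default integer index, so give the dummies the index of df first; otherwise the rows will not line up:

dummies = pd.concat((hours, day_names), axis=1).set_index(df.index)
df = pd.concat((df, dummies), axis=1)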
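
Putting the steps together, the following is a minimal end-to-end sketch rather than the answerer's exact code. It assumes a smaller made-up dataset (400 days of hourly rows) and introduces a helper frame named features, which is not in the original. Passing columns= to get_dummies encodes several columns in one call and uses each column name as the prefix; dtype=int gives 0/1 integers like the hand-written np.where version in the question.

```python
import numpy as np
import pandas as pd

np.random.seed(11)
rows = 24 * 400  # 400 days of hourly data, reaching into a second year
tidx = pd.date_range('2019-01-01', periods=rows, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.rand(rows, 2),
                  columns=['Temperature', 'Value'], index=tidx)

# Calendar features taken from the index; sharing df's index keeps rows aligned
features = pd.DataFrame(index=df.index)
features['hour'] = df.index.hour
features['day'] = df.index.day_name().str.lower()
features['month'] = df.index.month_name().str.lower()
features['year'] = df.index.year

# One call encodes every listed column; the column name becomes the prefix,
# giving names such as hour_0, day_monday, month_january, year_2019
dummies = pd.get_dummies(features, columns=['hour', 'day', 'month', 'year'],
                         dtype=int)

# join aligns on the shared DatetimeIndex
df2 = df.join(dummies)
print(df2.shape)  # (9600, 47): 2 original columns plus 24 + 7 + 12 + 2 dummies
```

Because features is built on df.index, the dummy frame inherits the time index and can be joined directly without any realignment.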
