Python pandas time series interpolation and regularization


Question

I am using Python pandas for the first time. I have traffic data in CSV format, recorded at roughly 5-minute intervals:

...
2015-01-04 08:29:05,271238
2015-01-04 08:34:05,329285
2015-01-04 08:39:05,-1
2015-01-04 08:44:05,260260
2015-01-04 08:49:05,263711
...

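There are several issues:

  • For some timestamps the data is missing (recorded as -1)
  • Some entries are missing altogether (sometimes for 2 to 3 consecutive hours)
  • The observation frequency is not exactly 5 minutes; it drifts by a few seconds once in a while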

I would like to obtain a regular time series, with an entry every 5 minutes exactly and no missing values. I have successfully interpolated the time series to approximate the -1 values with this code:

# pd.TimeSeries was a deprecated alias of pd.Series and no longer exists in current pandas
ts = pd.Series(values, index=timestamps)
ts.interpolate(method='cubic', downcast='infer')

How can I both interpolate the missing values and regularize the frequency of the observations? Thank you all for the help.

Recommended answer

Change the -1s to NaNs:

ts = ts.where(ts != -1)  # -1 becomes NaN; an integer series is upcast to float

Then resample the data to a 5-minute frequency:

ts = ts.resample('5min').mean()

Note that if two measurements fall within the same 5-minute bin, resample averages them together. Averaging was the default aggregation in old pandas versions; current versions require calling .mean() explicitly on the result of resample.
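
To make the binning behaviour concrete, here is a minimal sketch with made-up values (the series `demo` and its numbers are illustrative assumptions, not taken from the traffic data). Two readings share one 5-minute bin and are averaged; each bin is labelled by its left edge.

```python
import pandas as pd

# Two readings land in the 08:30-08:35 bin; one lands in the 08:35-08:40 bin.
idx = pd.to_datetime(['2015-01-04 08:31:00',
                      '2015-01-04 08:34:00',
                      '2015-01-04 08:36:00'])
demo = pd.Series([100.0, 200.0, 50.0], index=idx)

# Each bin is labelled by its start time; values inside a bin are averaged.
binned = demo.resample('5min').mean()
print(binned)
```

If averaging is not what you want, other aggregations such as .first() or .last() can be called on the resampler instead.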

Finally, you can linearly interpolate the time series according to the time:

ts = ts.interpolate(method='time')
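
Put together, the three steps above could look like the following sketch. The helper name `regularize` and its parameters are hypothetical, introduced only for illustration, and the sample values are the five rows shown in the question.

```python
import pandas as pd

def regularize(ts, freq='5min', missing=-1):
    """Hypothetical helper: regular grid plus time-based linear interpolation."""
    ts = ts.where(ts != missing)       # sentinel value -> NaN
    ts = ts.resample(freq).mean()      # regular grid; empty bins become NaN
    return ts.interpolate(method='time')

timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                             '2015-01-04 08:34:05',
                             '2015-01-04 08:39:05',
                             '2015-01-04 08:44:05',
                             '2015-01-04 08:49:05'])
ts = pd.Series([271238, 329285, -1, 260260, 263711], index=timestamps)

regular = regularize(ts)
print(regular)
```

Keep in mind that resample labels each value with the start of its bin (08:29:05 becomes 08:25:00); it does not re-estimate the value at that instant. Long gaps of several hours are also filled with a straight line, which may or may not be acceptable for your data.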

Since your data already has roughly a 5-minute frequency, you may want to resample at a shorter frequency (1 minute in the example below) so that cubic or spline interpolation can smooth out the curve:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, -1, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                             '2015-01-04 08:34:05',
                             '2015-01-04 08:39:05',
                             '2015-01-04 08:44:05',
                             '2015-01-04 08:49:05'])

ts = pd.Series(values, index=timestamps)
ts = ts.where(ts != -1)         # -1 becomes NaN (integer values are upcast to float)
ts = ts.resample('min').mean()  # 1-minute grid; empty bins become NaN

ts.interpolate(method='spline', order=3).plot()
ts.interpolate(method='time').plot()
lines, labels = plt.gca().get_legend_handles_labels()
labels = ['spline', 'time']
plt.legend(lines, labels, loc='best')
plt.show()
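
As an alternative that is not part of the answer above: if the values should be estimated at the exact grid instants rather than binned, one option is to interpolate over the union of the observed timestamps and the target grid, then keep only the grid points. The following is a sketch under that assumption; the names `raw`, `grid` and `on_grid` are illustrative.

```python
import pandas as pd

timestamps = pd.to_datetime(['2015-01-04 08:29:05',
                             '2015-01-04 08:34:05',
                             '2015-01-04 08:39:05',
                             '2015-01-04 08:44:05',
                             '2015-01-04 08:49:05'])
raw = pd.Series([271238, 329285, -1, 260260, 263711], index=timestamps)
raw = raw.where(raw != -1).dropna()   # keep only real observations

# Exact 5-minute grid lying inside the observed time range.
grid = pd.date_range(raw.index[0].ceil('5min'),
                     raw.index[-1].floor('5min'), freq='5min')

# Interpolate at the union of observed and grid timestamps, then keep the grid.
on_grid = (raw.reindex(raw.index.union(grid))
              .interpolate(method='time')
              .reindex(grid))
print(on_grid)
```

Compared with resample, this keeps the original timing of each observation and only estimates values at the grid points, at the cost of not averaging readings that fall close together.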
