Converting a pandas column from an array of string Quarters and Years to a datetime column where there is mixed formatting within the column

Problem description

This is an extension of an earlier question I had:

Converting a pandas column from an array of string Quarters and Years to a datetime column

I have a dataframe like this, where the quarter labels are written in two different formats (3Q '11 and Q2 '16).

I want to convert them to datetime objects.

So 3Q '11 would become 2011-09-30, and Q1 '20 would become 2020-03-31.
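In other words, each label maps to the last calendar day of its quarter. A minimal sketch of that rule, assuming the label has already been rewritten into pandas' year-first form such as 2011Q3 (the raw column is not in that form yet); quarter_end is a hypothetical helper name:

```python
import pandas as pd


def quarter_end(label: str) -> pd.Timestamp:
    """Hypothetical helper: last calendar day of a quarter given as e.g. '2011Q3'."""
    # A quarterly Period spans the whole quarter; end_time is its last nanosecond,
    # so normalize() drops the time part and leaves midnight of the last day.
    return pd.Period(label, freq="Q").end_time.normalize()


print(quarter_end("2011Q3"))  # 2011-09-30 00:00:00
print(quarter_end("2020Q1"))  # 2020-03-31 00:00:00
```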

Date    Data
3Q '11  11.12
4Q '11  15.43
1Q '12  11.8
2Q '12  17
1Q '13  19.5
2Q '13  14.62
3Q '13  14.1
4Q '13  26
1Q '14  16.4
2Q '14  13.3
3Q '14  12.3
4Q '14  21.4
1Q '15  12.6
2Q '15  11
3Q '15  9.9
4Q '15  16.1
1Q '16  10.3
Q2 '16  10
Q3 '16  9.3
Q4 '16  13.1
Q1 '17  8.9
Q2 '17  11.4
Q3 '17  10.3
Q4 '17  13.2
Q1 '18  9.1
Q2 '18  11.6
Q3 '18  9.7
Q4 '18  12.9
Q1 '19  9.9
Q2 '19  12.3
Q3 '19  11.8
Q4 '19  15.9
Q1 '20  6.9
Q2 '20  12.4
Q3 '20  13.9

I have the following code to handle dataframes in which all rows share one format, either Q followed by the number (Q3 '16) or the number followed by Q (3Q '11):

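# The format is decided from the first row only, so every row must match it
if df['Date'][0].startswith('Q'):
    # "Q3 '16" -> ["Q3", "16"] -> "2016Q3" -> quarter-end date
    df['Date'] = df['Date'].str.replace(" ","").str.split("'")
    df['Date'] = (pd.to_datetime("20"+df['Date'].str[::-1].str.join('')) + pd.offsets.QuarterEnd(0))
else:
    # "3Q '11" -> ["3Q", "2011"] -> "3Q2011" -> quarter-end date
    df['Date'] = df['Date'].str.replace("'","20").str.split(" ")
    df['Date'] = pd.to_datetime(df['Date'].str.join('')) + pd.offsets.QuarterEnd(0)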

However, in this case the dataframe contains both kinds of rows, with dates written as both Q3 and 3Q within the same frame. How do I normalise the data before applying one of these?
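The code above picks a branch by looking at df['Date'][0] only, so in a mixed column every row is forced through the same branch. A minimal sketch of a per-row check instead, assuming the only two layouts are the ones in the table; the patterns and the names number_first and quarter_first are illustrative, not from the original:

```python
import pandas as pd

dates = pd.Series(["3Q '11", "4Q '11", "Q2 '16", "Q3 '16"])

# One boolean mask per layout (illustrative patterns):
#   number first: "3Q '11"    quarter first: "Q2 '16"
number_first = dates.str.match(r"^\dQ\s*'?\d{2}$")
quarter_first = dates.str.match(r"^Q\d\s*'?\d{2}$")

# Rows matching neither layout would need attention before normalising.
unrecognised = dates[~(number_first | quarter_first)]

print(number_first.tolist())  # [True, True, False, False]
print(unrecognised.empty)     # True
```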

Recommended answer

You can use Series.replace with regular expressions to put both formats into the same year-then-quarter order (YYYYQn), and then apply the solution for converting to datetimes:

import pandas as pd

df = pd.DataFrame({'Date': ["3Q '11", "4Q '11", "1Q '12", "2Q '12", "1Q '13",
                            "Q2 '19", "Q3 '19", "Q4 '19", "Q1 '20"], 
                   'Data': [11.12, 15.43, 11.8, 17.0, 19.5, 12.3, 11.8, 15.9, 6.9]})
print (df)
     Date   Data
0  3Q '11  11.12
1  4Q '11  15.43
2  1Q '12  11.80
3  2Q '12  17.00
4  1Q '13  19.50
5  Q2 '19  12.30
6  Q3 '19  11.80
7  Q4 '19  15.90
8  Q1 '20   6.90


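# "3Q '11" -> "2011Q3": capture the quarter number, "Q" and the year, then reorder
df['Date'] = df['Date'].replace(r"^(\d+)([Q])\D*(\d+)$", r'20\3\2\1', regex=True)
# "Q2 '19" -> "2019Q2": capture "Q2" and the year, then reorder
df['Date'] = df['Date'].replace(r"^([Q]\d+)\D*(\d+)$", r'20\2\1', regex=True)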


print (df)
     Date   Data
0  2011Q3  11.12
1  2011Q4  15.43
2  2012Q1  11.80
3  2012Q2  17.00
4  2013Q1  19.50
5  2019Q2  12.30
6  2019Q3  11.80
7  2019Q4  15.90
8  2020Q1   6.90
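Neither snippet shows the final datetime step. A sketch of the whole pipeline, assuming pandas' quarterly Period parsing of the YYYYQn strings, which is swapped in here for the pd.to_datetime(...) + pd.offsets.QuarterEnd(0) step from the question:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["3Q '11", "1Q '13", "Q2 '19", "Q1 '20"],
                   "Data": [11.12, 19.5, 12.3, 6.9]})

# Step 1: the two regex replacements from the answer -> uniform 'YYYYQn' strings.
df["Date"] = df["Date"].replace(r"^(\d+)([Q])\D*(\d+)$", r"20\3\2\1", regex=True)
df["Date"] = df["Date"].replace(r"^([Q]\d+)\D*(\d+)$", r"20\2\1", regex=True)

# Step 2: parse as quarterly periods and keep the last day of each quarter.
# end_time is 23:59:59.999999999 on that day, so normalize() resets it to midnight.
df["Date"] = pd.PeriodIndex(df["Date"], freq="Q").end_time.normalize()
print(df)
```

Going through Period keeps the quarter meaning explicit instead of relying on to_datetime to recognise the quarter strings.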

Another idea is to use string indexing on the original Date column:

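# m is True for rows written as "Q2 '19", False for rows written as "3Q '11"
m = df['Date'].str.startswith('Q')
# '20' + two-digit year + quarter ("Q2" kept as-is, "3Q" swapped to "Q3")
df['Date'] = ('20' + df['Date'].str[-2:] + df['Date'].str[:2]
                  .where(m, df['Date'].str[1] + df['Date'].str[0]))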
print (df)
     Date   Data
0  2011Q3  11.12
1  2011Q4  15.43
2  2012Q1  11.80
3  2012Q2  17.00
4  2013Q1  19.50
5  2019Q2  12.30
6  2019Q3  11.80
7  2019Q4  15.90
8  2020Q1   6.90
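The indexing version depends on fixed character positions (a single-digit quarter and a two-digit year). A variant sketch, not from the original answer, uses one regular expression with named groups through Series.str.extract; the group names q_before, q_after and yy are illustrative:

```python
import pandas as pd

dates = pd.Series(["3Q '11", "Q2 '19", "4Q '15", "Q1 '20"])

# One pattern for both layouts: the quarter digit sits either before or after "Q".
# On each row exactly one of q_before / q_after is filled; the other is NaN.
parts = dates.str.extract(r"^(?:(?P<q_before>\d)Q|Q(?P<q_after>\d))\s*'?(?P<yy>\d{2})$")
quarter = parts["q_before"].fillna(parts["q_after"])

normalised = "20" + parts["yy"] + "Q" + quarter
print(normalised.tolist())  # ['2011Q3', '2019Q2', '2015Q4', '2020Q1']
```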

    
