NumPy or Pandas: Keeping array type as integer while having a NaN value

Question

Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element inside listed as numpy.NaN?

In particular, I am converting an in-house data structure to a Pandas DataFrame. In our structure, we have integer-type columns that still have NaN's (but the dtype of the column is int). It seems to recast everything as a float if we make this a DataFrame, but we'd really like to be int.

Thoughts?

Things tried:

I tried using the from_records() function under pandas.DataFrame, with coerce_float=False and this did not help. I also tried using NumPy masked arrays, with NaN fill_value, which also did not work. All of these caused the column data type to become a float.

Recommended answer

This capability has been added to pandas (beginning with version 0.24): https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support

At this point, it requires the use of extension dtype Int64 (capitalized), rather than the default dtype int64 (lowercase).
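
As a rough illustration of the difference between the two dtypes, the sketch below builds a small column both ways. It assumes pandas 0.24 or later; the variable names and sample values are made up for the example and are not from the original question.

```python
import numpy as np
import pandas as pd

# Default behaviour: a missing value forces the column to float64.
default_col = pd.Series([1, 2, np.nan])
print(default_col.dtype)

# Nullable extension dtype: note the capital "I" in "Int64".
nullable_col = pd.Series([1, 2, None], dtype="Int64")
print(nullable_col.dtype)
print(nullable_col.isna().tolist())

# A float column that holds only whole numbers and NaN can be converted.
converted = default_col.astype("Int64")
print(converted.dtype)
```

The missing entry stays missing while the remaining values are stored as integers. How the missing value is displayed depends on the pandas version, so the printed form may differ between releases.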
