

      Chunking, processing & merging dataset in Pandas/Python


                Problem description

                There is a large dataset containing strings. I just want to open it via read_fwf using widths, like this:

                widths = [3, 7, ..., 9, 7]
                tp = pandas.read_fwf(file, widths=widths, header=None)
                

                That would help me mark the data, but the system crashes (it works with nrows=20000). So I decided to do it in chunks (e.g. 20000 rows), like this:

                cs = 20000
                for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
                    ...  # <some code using chunk>
                

                My question is: after some processing of each chunk (marking rows, dropping or modifying columns), what should I do in the loop to merge (concatenate?) the chunks back into a .csv file? Or is there another way?

                Recommended answer

                I'm going to assume that, since reading the entire file

                tp = pandas.read_fwf(file, widths=widths, header=None)
                

                fails but reading in chunks works, the file is too big to be read at once and you encountered a MemoryError.

                In that case, if you can process the data in chunks, then to concatenate the results into a CSV you could use chunk.to_csv to write the CSV chunk by chunk:

                filename = ...
                for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
                    # process the chunk
                    chunk.to_csv(filename, mode='a')
                

                Note that mode='a' opens the file in append mode, so that the output of each chunk.to_csv call is appended to the same file.
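
The append approach above can be sketched end to end as follows. This is only an illustration under stated assumptions: the sample file, the widths, the column names and the `marked` rule are all made up for the example and are not from the question. One detail worth knowing is that `to_csv` writes the header row and the index on every call by default, so this sketch writes the header only for the first chunk and passes `index=False`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixed-width sample: 3-char id, 7-char name, 5-char value.
widths = [3, 7, 5]
rows = ["{:>3}{:<7}{:>5}".format(i, f"item{i}", i * 10) for i in range(1, 8)]

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "data.fwf")
dst = os.path.join(workdir, "out.csv")
with open(src, "w") as fh:
    fh.write("\n".join(rows) + "\n")

cs = 3  # tiny chunk size so the loop runs several times
reader = pd.read_fwf(src, widths=widths, header=None, chunksize=cs)
for i, chunk in enumerate(reader):
    # Example processing (assumed): name the columns, mark rows, drop a column.
    chunk.columns = ["id", "name", "value"]
    chunk["marked"] = chunk["value"] > 30
    chunk = chunk.drop(columns=["name"])
    # First chunk creates the file with a header; later chunks append rows only.
    chunk.to_csv(dst, mode="w" if i == 0 else "a", header=(i == 0), index=False)

# Read the combined CSV back to check that the chunks form one table.
result = pd.read_csv(dst)
print(result.shape)
```

Writing the first chunk with mode="w" also means a leftover file from an earlier run is overwritten rather than appended to, which is easy to miss when every call uses mode='a'.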

