        How to unnest (explode) a column in a pandas DataFrame into multiple rows

                  Question

                  I have the following DataFrame, where one column has object dtype and each cell holds a list:

                  import numpy as np
                  import pandas as pd

                  df=pd.DataFrame({'A':[1,2],'B':[[1,2],[1,2]]})
                  df
                  Out[458]: 
                     A       B
                  0  1  [1, 2]
                  1  2  [1, 2]
                  

                  My expected output is:

                     A  B
                  0  1  1
                  1  1  2
                  2  2  1
                  3  2  2
                  

                  What should I do to achieve this?


                  Related question

                  pandas: When cell contents are lists, create a row for each element in the list

                  That is a good question with good answers, but they handle only a single list column. The self-defined function in my answer works for multiple columns. Also, the accepted answer there relies on apply, the most time-consuming approach, which is not recommended; for more information see "When should I ever want to use pandas apply() in my code?".

                  Solution

                  I know that object dtype columns make data hard to convert with pandas functions. When I receive data like this, the first thing that comes to mind is to "flatten" or unnest the columns.

                  The methods below use pandas and plain Python functions. If you are worried about the speed of these solutions, check out user3483203's answer, which uses numpy; most of the time numpy is faster. I also recommend Cython or numba if speed matters.


                  Method 0 [pandas >= 0.25] Starting from pandas 0.25, if you only need to explode one column, you can use the pandas.DataFrame.explode function:

                  df.explode('B')
                  
                     A  B
                  0  1  1
                  0  1  2
                  1  2  1
                  1  2  2
                  

                  Consider a DataFrame with an empty list or a NaN in the column. An empty list does not cause a problem (it becomes NaN in the result), and a NaN can be replaced with an empty list first so that every cell holds a list:

                  df = pd.DataFrame({'A': [1, 2, 3, 4],'B': [[1, 2], [1, 2], [], np.nan]})
                  df.B = df.B.fillna({i: [] for i in df.index})  # replace NaN with []
                  df.explode('B')
                  
                     A    B
                  0  1    1
                  0  1    2
                  1  2    1
                  1  2    2
                  2  3  NaN
                  3  4  NaN
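
Note that explode keeps the original row label for every element, while the expected output in the question has a fresh index. A minimal sketch of getting there, assuming the two-row frame from the question (the names `exploded` and `result` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# explode repeats the original row label for each list element
exploded = df.explode('B')

# drop the repeated labels to get a fresh 0..n-1 index
result = exploded.reset_index(drop=True)

# the exploded column keeps object dtype; cast it if integers are wanted
result['B'] = result['B'].astype('int64')
print(result)
```

On pandas 1.1 or later, `df.explode('B', ignore_index=True)` should give the same index without the separate reset_index call.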
                  


                  Method 1 apply + pd.Series (easy to understand, but not recommended in terms of performance)

                  df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
                  Out[463]:
                     A  B
                  0  1  1
                  1  1  2
                  0  2  1
                  1  2  2
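
To see why Method 1 is easy to follow, here is the same one-liner split into its intermediate steps. This is only a sketch; the names `wide`, `long_form` and `out` are mine and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Step 1: one column per list position, indexed by A
wide = df.set_index('A')['B'].apply(pd.Series)

# Step 2: stack the columns into rows, giving a MultiIndex (A, position)
long_form = wide.stack()

# Step 3: move A back to a column and name the value column
out = long_form.reset_index(level=0).rename(columns={0: 'B'})
print(out)
```

Step 1 builds a new Series object for every row, which is most likely where the performance cost comes from.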
                  


                  Method 2 using repeat with the DataFrame constructor to re-create the DataFrame (good performance, but awkward with multiple columns)

                  out=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
                  out
                  Out[465]:
                     A  B
                  0  1  1
                  0  1  2
                  1  2  1
                  1  2  2
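
Method 2 has two building blocks: the list lengths drive how often each A value is repeated, and the lists themselves are concatenated into one flat array. A small sketch with lists of different lengths (the data and the variable names are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [[1, 2], [3], [4, 5, 6]]})

# number of elements in each list
lengths = df['B'].str.len()

# each A value repeated once per element of its list
repeated_a = df['A'].repeat(lengths)

# all lists joined end to end into one flat array
flat_b = np.concatenate(df['B'].values)

out = pd.DataFrame({'A': repeated_a.values, 'B': flat_b})
print(out)
```

Both pieces have the same total length, so they line up row by row.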
                  

                  Method 2.1 Suppose that besides A we also have columns A.1, ..., A.n. With Method 2 we would have to re-create each of these columns one by one.

                  Solution: unnest the single list column, then join or merge the result back on the index.

                  s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
                  s.join(df.drop(columns='B'),how='left')
                  Out[477]:
                     B  A
                  0  1  1
                  0  2  1
                  1  1  2
                  1  2  2
                  

                  If you need the column order to be exactly the same as before, add reindex at the end.

                  s.join(df.drop(columns='B'),how='left').reindex(columns=df.columns)
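
As an illustration of why the join helps, here is a sketch with one extra column. The column `A1` and its values are hypothetical; any number of additional columns would be carried along the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'A1': ['x', 'y'], 'B': [[1, 2], [3, 4]]})

# flatten B and repeat the original row labels to match
s = pd.DataFrame(
    {'B': np.concatenate(df['B'].values)},
    index=df.index.repeat(df['B'].str.len()),
)

# the join on the index brings back every other column in one step
out = s.join(df.drop(columns='B'), how='left').reindex(columns=df.columns)
print(out)
```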
                  


                  Method 3 re-create the rows with a list comprehension

                  pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
                  Out[488]:
                     A  B
                  0  1  1
                  1  1  2
                  2  2  1
                  3  2  2
                  

                  If there are more than two columns, use

                  s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
                  s.merge(df,left_on=0,right_index=True)
                  Out[491]:
                     0  1  A       B
                  0  0  1  1  [1, 2]
                  1  0  2  1  [1, 2]
                  2  1  1  2  [1, 2]
                  3  1  2  2  [1, 2]
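
The merged result above still carries the helper columns 0 and 1 and the original list column B. One possible cleanup is sketched below. The helper column name `row` and the extra column `A1` are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'A1': ['x', 'y'], 'B': [[1, 2], [3, 4]]})

# pairs of (original row label, single list element)
s = pd.DataFrame(
    [(i, z) for i, y in zip(df.index, df['B']) for z in y],
    columns=['row', 'B'],
)

out = (
    s.merge(df.drop(columns='B'), left_on='row', right_index=True)
     .drop(columns='row')          # the helper column is no longer needed
     .reindex(columns=df.columns)  # restore the original column order
)
print(out)
```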
                  


                  Method 4 using reindex or loc

                  df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
                  Out[554]:
                     A  B
                  0  1  1
                  0  1  2
                  1  2  1
                  1  2  2
                  
                  #df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))
                  


                  Method 5 when the lists contain only unique values (no value appears in more than one row):

                  df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
                  from collections import ChainMap
                  d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
                  pd.DataFrame(list(d.items()),columns=df.columns[::-1])
                  Out[574]:
                     B  A
                  0  1  1
                  1  2  1
                  2  3  2
                  3  4  2
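
The uniqueness requirement matters because the list values become dictionary keys. A short sketch of what happens when a value appears in two rows (the data is made up for illustration):

```python
from collections import ChainMap

import pandas as pd

# the value 2 appears in the lists of both rows
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [2, 3]]})

d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))

total_elements = sum(len(x) for x in df['B'])
print(len(d), total_elements)  # fewer keys than list elements
```

Only one entry survives for the duplicated value, so a row is silently lost. For data like this, one of the other methods is the safer choice.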
                  


                  Method 6 using numpy for high performance:

                  newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
                  pd.DataFrame(data=newvalues[0],columns=df.columns)
                     A  B
                  0  1  1
                  1  1  2
                  2  2  1
                  3  2  2
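
The `[0]` in Method 6 is needed because np.dstack on two 1-D arrays returns a 3-D array with a leading axis of length one. The sketch below shows the shapes and, as an alternative that is not in the original answer, np.column_stack, which yields the 2-D array directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})

a = np.repeat(df['A'].values, [len(x) for x in df['B']])
b = np.concatenate(df['B'].values)

stacked = np.dstack((a, b))      # shape (1, 4, 2)
two_d = np.column_stack((a, b))  # shape (4, 2)

out = pd.DataFrame(two_d, columns=df.columns)
print(stacked.shape, two_d.shape)
```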
                  


                  Method 7 using cycle and chain from itertools: a pure Python solution, just for fun

                  from itertools import cycle,chain
                  l=df.values.tolist()
                  l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
                  pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
                     A  B
                  0  1  1
                  1  1  2
                  2  2  1
                  3  2  2
                  


                  Generalizing to multiple columns

                  df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
                  df
                  Out[592]:
                     A       B       C
                  0  1  [1, 2]  [1, 2]
                  1  2  [3, 4]  [3, 4]
                  

                  Self-defined function:

                  def unnesting(df, explode):
                      # repeat each row label once per list element
                      idx = df.index.repeat(df[explode[0]].str.len())
                      # flatten every list column and place them side by side
                      df1 = pd.concat([
                          pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
                      df1.index = idx
                  
                      return df1.join(df.drop(columns=explode), how='left')
                  
                  
                  unnesting(df,['B','C'])
                  Out[609]:
                     B  C  A
                  0  1  1  1
                  0  2  2  1
                  1  3  3  2
                  1  4  4  2
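
The function takes the repeat counts from the first column in `explode` only, so it appears to assume that the lists in all exploded columns have the same length within each row. A hypothetical helper (`same_lengths` is my own name, not part of the answer) to check that assumption before calling it:

```python
import pandas as pd


def same_lengths(df, columns):
    """Return True if, in every row, the lists in `columns` have equal length."""
    lengths = pd.concat([df[c].str.len() for c in columns], axis=1)
    # a row is consistent when it has exactly one distinct length
    return bool(lengths.nunique(axis=1).eq(1).all())


ok = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
bad = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1], [3, 4]]})

print(same_lengths(ok, ['B', 'C']), same_lengths(bad, ['B', 'C']))
```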
                  


                  Column-wise Unnesting

                  All of the methods above deal with vertical unnesting (explode). If you need to expand the lists horizontally instead, use the pd.DataFrame constructor:

                  df.join(pd.DataFrame(df.B.tolist(),index=df.index).add_prefix('B_'))
                  Out[33]:
                     A       B       C  B_0  B_1
                  0  1  [1, 2]  [1, 2]    1    2
                  1  2  [3, 4]  [3, 4]    3    4
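
When the lists have different lengths, the constructor pads the shorter rows with missing values. A small sketch, with data invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2, 3], [4]]})

# one new column per list position; short lists are padded with NaN
wide = df.join(pd.DataFrame(df['B'].tolist(), index=df.index).add_prefix('B_'))
print(wide)
```

Columns that receive padding are upcast to float, so a cast may be needed afterwards if integer columns are required.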
                  

                  Updated function

                  def unnesting(df, explode, axis):
                      if axis==1:
                          # vertical: one output row per list element
                          idx = df.index.repeat(df[explode[0]].str.len())
                          df1 = pd.concat([
                              pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
                          df1.index = idx
                  
                          return df1.join(df.drop(columns=explode), how='left')
                      else :
                          # horizontal: one output column per list position
                          df1 = pd.concat([
                                           pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
                          return df1.join(df.drop(columns=explode), how='left')
                  

                  Test Output

                  unnesting(df, ['B','C'], axis=0)
                  Out[36]:
                     B0  B1  C0  C1  A
                  0   1   2   1   2  1
                  1   3   4   3   4  2
                  

                  Update 2021-02-17: using the built-in explode function

                  def unnesting(df, explode, axis):
                      if axis==1:
                          # vertical: explode each list column, then align them on the index
                          df1 = pd.concat([df[x].explode() for x in explode], axis=1)
                          return df1.join(df.drop(columns=explode), how='left')
                      else :
                          # horizontal: one output column per list position
                          df1 = pd.concat([
                                           pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
                          return df1.join(df.drop(columns=explode), how='left')
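
On newer pandas versions the vertical case may not need a helper at all. The sketch below assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns, and assumes the lists in those columns have matching lengths in every row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

# assumption: pandas >= 1.3, which allows exploding several columns together
out = df.explode(['B', 'C']).reset_index(drop=True)
print(out)
```

Unlike the self-defined function, this keeps the original column order.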
                  
