        Comparing two dataframes of different length row by row and adding columns for each row with equal value

                  Question

                  I have two dataframes of different length in Python pandas, like this:

                  df1:                                 df2:
                  
                        Column1  Column2 Column3            ColumnA ColumnB 
                      0    1       a       r              0    1       a
                      1    2       b       u              1    1       d
                      2    3       c       k              2    1       e
                      3    4       d       j              3    2       r
                      4    5       e       f              4    2       w
                                                          5    3       y 
                                                          6    3       h
                  

                  What I am trying to do now is compare Column1 of df1 with ColumnA of df2. For each "hit", where a row in ColumnA of df2 has the same value as a row in Column1 of df1, I want to append a column to df1 holding the value that ColumnB of df2 has in the row where the "hit" was found, so that my result looks like this:

                  df1:
                  
                     Column1  Column2  Column3  Column4 Column5  Column6
                  0     1        a        r        a       d        e
                  1     2        b        u        r       w
                  2     3        c        k        y       h
                  3     4        d        j
                  4     5        e        f
                  

                  What I have tried so far:

                  for row in df1, df2:
                     if df1[Column1] == df2[ColumnA]:
                        print 'yey!'
                  

                  This gave me an error saying I could not compare two dataframes of different length. So I tried:

                  for row in df1, df2:
                      if def2[def2['ColumnA'].isin(def1['column1'])]:
                          print 'lalala' 
                      else:
                          print 'Nope'
                  

                  This "works" in the sense that I get an output, but I do not think it iterates over the rows and compares them, since it only prints 'lalala' twice. So I researched some more and found a way to iterate over each row of a dataframe:

                  for index, row in df1.iterrows():
                      print(row['Column1'])
                  

                  But I do not know how to use this to compare the columns of the two dataframes and get the output I want.

                  Any help on how to do this would be really appreciated.

                  Recommended answer

                  I recommend using the DataFrame API, which lets you work with dataframes through operations such as join, merge and groupby. You can find my solution below:

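                  import pandas as pd
                  
                  df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                      'Column2': ['a', 'b', 'c', 'd', 'e'],
                      'Column3': ['r', 'u', 'k', 'j', 'f']})
                  
                  # The key column of df2 is named 'Column1' here (ColumnA in the question)
                  # so that both frames can be merged on the same column name.
                  df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                      'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})
                  
                  buffers = []
                  for name, group in df2.groupby('Column1'):
                      # One-row frame holding the key; it keeps the group's first index label.
                      buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})
                      # One new column per matching ColumnB value: Column_0, Column_1, ...
                      for i, value in enumerate(group['ColumnB']):
                          buffer_df['Column_' + str(i)] = value
                      buffers.append(buffer_df)
                  
                  # DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead.
                  dfs = pd.concat(buffers)
                  
                  result = pd.merge(df1, dfs, how='left', on='Column1')
                  print(result)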
                  

                  The result is:

                     Column1 Column2 Column3 Column_0 Column_1 Column_2
                  0        1       a       r        a        d        e
                  1        2       b       u        r        w      NaN
                  2        3       c       k        y        h      NaN
                  3        4       d       j      NaN      NaN      NaN
                  4        5       e       f      NaN      NaN      NaN
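
As a possible alternative to the explicit loop, the same table can be built without iterating over groups. This is only a sketch, not part of the original answer: it keeps the question's `ColumnA` name for df2, and the helper names `hit_no` and `wide` are made up for illustration. It numbers the hits inside each key with `groupby(...).cumcount()`, pivots them into columns, and joins the result onto df1.

```python
import pandas as pd

df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                    'Column2': ['a', 'b', 'c', 'd', 'e'],
                    'Column3': ['r', 'u', 'k', 'j', 'f']})
df2 = pd.DataFrame({'ColumnA': [1, 1, 1, 2, 2, 3, 3],
                    'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

# Number the hits within each ColumnA value: 0, 1, 2, ...
hit_no = df2.groupby('ColumnA').cumcount()

# Reshape to one row per ColumnA value and one column per hit number.
wide = (df2.assign(hit=hit_no)
           .pivot(index='ColumnA', columns='hit', values='ColumnB')
           .add_prefix('Column_'))

# DataFrame.join is a left join by default: every row of df1 is kept,
# and rows without a hit get NaN in the new columns.
result = df1.join(wide, on='Column1')
print(result)
```

Whether this is preferable to the loop depends on taste and data size; the loop makes each step visible, while the pivot version leaves the reshaping to pandas.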
                  

                  P.S. More details:

                  1) For df2 I produce groups by 'Column1'. Each single group is itself a dataframe. Example below:

                     Column1 ColumnB
                  0        1       a
                  1        1       d
                  2        1       e
                  

                  2) For each group I build the dataframe buffer_df:

                     Column1 Column_0 Column_1 Column_2
                  0        1        a        d        e
                  

                  3) After that I combine all the buffer_df frames into the dataframe dfs:

                     Column1 Column_0 Column_1 Column_2
                  0        1        a        d        e
                  3        2        r        w      NaN
                  5        3        y        h      NaN
                  

                  4) In the end I run a left join of df1 and dfs to obtain the needed result.
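
To see what the left join in step 4 does on its own, here is a small sketch with reduced, made-up frames named `left` and `right` (they stand in for df1 and dfs and are not from the original answer). Passing `indicator=True` to `pd.merge` adds a `_merge` column that shows which rows found a matching key.

```python
import pandas as pd

left = pd.DataFrame({'Column1': [1, 2, 3, 4, 5]})
right = pd.DataFrame({'Column1': [1, 2, 3],
                      'Column_0': ['a', 'r', 'y']})

# how='left' keeps every row of `left`; indicator=True adds a `_merge`
# column saying whether the key was also found in `right`.
joined = pd.merge(left, right, how='left', on='Column1', indicator=True)
print(joined)
```

Rows with keys 4 and 5 have no partner in `right`, so they are marked `left_only` and their `Column_0` is NaN, which is the same effect that produces the NaN rows in the final result.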

                  2)* buffer_df is built up iteratively (shown here for the group with Column1 == 3):

                  step0 (buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})):
                              Column1
                           5       3
                  
                  step1 (buffer_df['Column_0'] = group['ColumnB'][5]):      
                              Column1 Column_0
                           5       3       y
                  
                  step2 (buffer_df['Column_1'] = group['ColumnB'][6]):      
                              Column1 Column_0 Column_1
                           5       3       y       h
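
The iteration above can be reproduced for a single group. The following is a minimal sketch, assuming the same df2 as in the answer and picking the group with key 3 via `get_group`; it prints buffer_df after each added column.

```python
import pandas as pd

df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                    'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

# Pick the group whose key is 3; its rows carry the index labels 5 and 6.
group = df2.groupby('Column1').get_group(3)

# step0: a one-row frame with just the key, indexed by the first label (5).
buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})
print(buffer_df)

# step1, step2, ...: each value of ColumnB becomes a new column.
for i, value in enumerate(group['ColumnB']):
    buffer_df['Column_' + str(i)] = value
    print(buffer_df)
```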
                  
