
      Get count of matching words in a pandas column string using a predefined list


                Problem description

                I have a DataFrame that contains index and text columns.

                For example:

                index | text
                1     | "I have a pen, but I lost it today."
                2     | "I have pineapple and pen, but I lost it today."
                

                Now I have a long list, and I want to match each word in text against that list.

                Assume:

                long_list = ['pen', 'pineapple']
                

                I want to create a FunctionTransformer that matches the words in long_list against each word of the column value and returns the number of matches.

                index | text                                             | count
                1     | "I have a pen, but I lost it today."             | 1
                2     | "I have pineapple and pen, but I lost it today." | 2
                

                This is what I did:

                def count_words(df):
                    long_list = ['pen', 'pineapple']
                    count = 0
                    for c in df['tweet_text']:
                        if c in long_list:
                            count = count + 1
                            
                    df['count'] = count   
                    return df
                
                count_word = FunctionTransformer(count_words, validate=False)
                

                Here is an example of how I developed one of my other FunctionTransformers:

                def convert_twitter_datetime(df):
                    df['hour'] = pd.to_datetime(df['created_at'], format='%a %b %d %H:%M:%S +0000 %Y').dt.strftime('%H').astype(int)
                    return df
                
                convert_datetime = FunctionTransformer(convert_twitter_datetime, validate=False)
                

                Recommended answer

                Inspired by @Quang Hoang's answer:

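                import re
                import pandas as pd
                # Import the class directly; `import sklearn` alone does not load sklearn.preprocessing
                from sklearn.preprocessing import FunctionTransformer
                
                # df is the DataFrame from the question, with a 'text' column
                y = ['pen', 'pineapple']
                
                def count_strings(X, y):
                    # Join the words into one alternation pattern; escape them so they match literally
                    pattern = '|'.join(re.escape(word) for word in y)
                    # str.count counts regex matches per row, including matches inside longer words
                    return X['text'].str.count(pattern)
                
                string_transformer = FunctionTransformer(count_strings, kw_args={'y': y})
                df['count'] = string_transformer.fit_transform(X=df)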
                

                Result

                    text                                              count
                1   "I have a pen, but I lost it today."                1
                2   "I have pineapple and pen, but I lost it today."    2
                

                For the following df2:

                #df2
                      text
                1     "I have a pen, but I lost it today. pen pen"
                2     "I have pineapple and pen, but I lost it today."
                

                we get

                string_transformer.transform(X=df2)
                #result
                1    3
                2    2
                Name: text, dtype: int64
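Because str.count counts regular-expression matches, the answer's pattern also counts a listed word when it appears inside a longer word. If whole-word matching is what the question intends, one possible variant is sketched below. It assumes that wrapping the alternation in \b word boundaries is acceptable for the data; the helper name count_whole_words and the sample sentences are hypothetical and not part of the original answer.

```python
import re
import pandas as pd

def count_whole_words(texts, words):
    # \b anchors restrict each match to a whole word;
    # re.escape keeps every listed word literal inside the regex.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    return texts.str.count(pattern)

# Hypothetical sample data: the second row contains 'pen' only inside longer words.
texts = pd.Series(
    ["I have a pen, but I lost it today. pen pen",
     "I opened a pencil case."],
    index=[1, 2],
)

counts = count_whole_words(texts, ['pen', 'pineapple'])
substring_counts = texts.str.count('pen|pineapple')

print(counts.tolist())            # whole-word counts per row
print(substring_counts.tolist())  # plain substring counts per row
```

With this data the whole-word version ignores "opened" and "pencil", while the plain alternation counts both.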
                

                This shows that we have converted the function into an sklearn-style object. To abstract this even further, we can pass the column name as a keyword argument to count_strings.
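A minimal sketch of that last idea follows. The parameter names words and column are assumptions chosen for illustration (the answer itself calls the word list y), and the tweet_text column name is borrowed from the attempt in the question.

```python
import re
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def count_strings(X, words, column='text'):
    # `column` (hypothetical parameter) selects which DataFrame column to scan.
    pattern = '|'.join(re.escape(w) for w in words)
    return X[column].str.count(pattern)

df = pd.DataFrame(
    {'tweet_text': ["I have a pen, but I lost it today.",
                    "I have pineapple and pen, but I lost it today."]},
    index=[1, 2],
)

# Both the word list and the column name are passed through kw_args,
# so the same function can be reused for differently named columns.
string_transformer = FunctionTransformer(
    count_strings,
    kw_args={'words': ['pen', 'pineapple'], 'column': 'tweet_text'},
)
df['count'] = string_transformer.fit_transform(df)
print(df['count'].tolist())
```

Keeping the column name in kw_args rather than hard-coding it means the transformer can be configured per dataset without changing the function.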
