        String comparison in Python, but not Levenshtein distance (I think)
                  Problem description

                  I came across a crude string comparison in a paper I am reading. It works as follows:

                  The equation they use is given below (extracted from the paper, with small wording changes to make it more general and readable). Because the authors' description is not very clear, I first try to explain it in my own words, using an example from the paper.

                  For example, for the two sequences ABCDE and BCEFA there are two possible graphs:

                  graph 1) connects B with B, C with C, and E with E

                  graph 2) connects A with A

                  I cannot connect A with A while also connecting the other three (graph 1), since the lines would cross: if you draw lines for B-B, C-C and E-E, the line linking A-A crosses all of them. So these two sequences yield two possible graphs, one with three connections (BB, CC and EE) and one with a single connection (AA). I then calculate the score d using the equation given further down.
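The non-crossing rule described above can be sketched in Python. This is only a brute-force illustration, not the authors' code: the function names (`identity_links`, `is_non_crossing`, `non_crossing_configs`, `maximal_configs`) are made up for this example, and it assumes that a link is any pair of positions holding the same character and that two links "cross" unless both indices increase together.

```python
from itertools import combinations


def identity_links(s, t):
    """All (i, j) position pairs with s[i] == t[j] (hypothetical helper)."""
    return [(i, j) for i, a in enumerate(s) for j, b in enumerate(t) if a == b]


def is_non_crossing(config):
    """True if no two links cross or share an endpoint.

    After sorting by position in the first string, positions in the
    second string must be strictly increasing as well.
    """
    ordered = sorted(config)
    return all(i1 < i2 and j1 < j2
               for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]))


def non_crossing_configs(s, t):
    """Every non-empty subset of links that has no crossings (brute force)."""
    links = identity_links(s, t)
    return [subset
            for size in range(1, len(links) + 1)
            for subset in combinations(links, size)
            if is_non_crossing(subset)]


def maximal_configs(s, t):
    """Non-crossing configurations that are not contained in a larger one."""
    configs = [set(c) for c in non_crossing_configs(s, t)]
    return [c for c in configs if not any(c < other for other in configs)]


# The two graphs from the example, shown as the letters they connect.
s, t = "ABCDE", "BCEFA"
print(sorted("".join(s[i] for i, _ in sorted(c)) for c in maximal_configs(s, t)))
# prints ['A', 'BCE']
```

Enumerating all subsets is exponential in the number of links, which is acceptable for five-letter strings but would need a dynamic-programming formulation for longer sequences.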

                  Consequently, to define the degree of similarity between two penta-strings we calculate the distance d between them. Aligning the two penta-strings, we look for all the identities between their characters, wherever these may be located. If each identity is represented by a link between both penta-strings, we define a graph for this pair. We call any part of this graph a configuration.

                  Next, we retain all of those configurations in which there is no character cross pairing (the meaning is explained in my example above, i.e., links between identical characters must not cross, and only those graphs are retained). Each of these is then evaluated as a function of the number p of characters related to the graph, the shifting Δi for the corresponding pairs and the gap δij between connected characters of each penta-string. The minimum value is chosen as characteristic and is called distance d: d = Min(50 - 10p + ΣΔi + Σδij). Although very rough, this measure is generally in good agreement with the qualitative eye-guided estimation. For instance, the distance between abcde and abcfg is 20, whereas that between abcde and abfcg is 23 = (50 - 30 + 1 + 2).
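One possible reading of the formula can be sketched as follows. The definitions of Δi and δij here are assumptions, since the quoted text does not pin them down: Δi is taken as the absolute difference between a linked character's positions in the two strings, and δij as the number of characters skipped between consecutive linked characters, summed over both strings. The function names and the choice to score the empty configuration as 50 are also assumptions made for this example.

```python
from itertools import combinations


def links(s, t):
    """All (i, j) position pairs with s[i] == t[j]."""
    return [(i, j) for i, a in enumerate(s) for j, b in enumerate(t) if a == b]


def non_crossing(config):
    """True if the links can be drawn without crossing or sharing a character."""
    c = sorted(config)
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(c, c[1:]))


def score(config):
    """50 - 10p + sum(shift) + sum(gap) under the ASSUMED definitions above."""
    c = sorted(config)
    p = len(c)
    # Assumed shift: how far each linked character moved between the strings.
    shift = sum(abs(i - j) for i, j in c)
    # Assumed gap: characters skipped between consecutive links, in both strings.
    gap = sum((b[0] - a[0] - 1) + (b[1] - a[1] - 1) for a, b in zip(c, c[1:]))
    return 50 - 10 * p + shift + gap


def distance(s, t):
    """Minimum score over all non-crossing configurations (brute force)."""
    all_links = links(s, t)
    best = 50  # assumption: with no links at all, p = 0 and d = 50
    for size in range(1, len(all_links) + 1):
        for config in combinations(all_links, size):
            if non_crossing(config):
                best = min(best, score(config))
    return best


print(distance("abcde", "abcfg"))  # prints 20
print(distance("abcde", "abfcg"))  # prints 22
```

Under these assumed definitions the first example from the paper is reproduced (20), but the second comes out as 22 rather than the quoted 23, so the authors evidently count the shift or gap terms differently. Only the `score` function would need to change once their exact definitions are known.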

                  I am unsure how to go about implementing this. Any suggestions would be much appreciated.

                  I tried Levenshtein distance and also simple sequence alignment as used in protein sequence comparison. The link to the paper is: http://peds.oxfordjournals.org/content/16/2/103.long

                  I could not find any information on the first author, Alain Figureau, and my emails to M.A. Soto have not been answered (as of today).

                  Thanks

                  Recommended answer

                  Just after the text block you cite, there is a reference to a previous paper by the same authors: Secondary Structure of Proteins and Three-dimensional Pattern Recognition. If the paper you are reading does not explain the distance, I think that earlier one is worth looking into (I'm not at work, so I don't have access to the full document).

                  Otherwise, you can try contacting the authors directly. Alain Figureau seems to be an old-school French researcher with no contact details whatsoever (no webpage, no e-mail, no "social networking", ...), so I advise contacting M.A. Soto, whose e-mail is given at the end of the paper. I think they will give you the answer you're looking for: an experiment's procedure has to be crystal clear in order to be repeatable, which is part of the scientific method.
