SSE 4 instructions generated by Visual Studio 2013 Update 2 and Update 3

                  Question

                  If I compile this code in VS 2013 Update 2 or Update 3 (the output below comes from Update 3):

                  #include "stdafx.h"
                  #include <iostream>
                  #include <random>
                  
                  struct Buffer
                  {
                    long* data;
                    int   count;
                  };
                  
                  #ifndef max
                  #define max(a,b)            (((a) > (b)) ? (a) : (b))
                  #endif
                  
                  long Code(long* data, int count)
                  {
                    long nMaxY = data[0];
                  
                    for (int nNode = 0; nNode < count; nNode++)
                    {
                      nMaxY = max(data[nNode], nMaxY);
                    }
                  
                    return(nMaxY);
                  }
                  
                  int _tmain(int argc, _TCHAR* argv[])
                  {
                  #ifdef __AVX__
                    static_assert(false, "AVX should be disabled");
                  #endif
                  #ifdef __AVX2__
                    static_assert(false, "AVX2 should be disabled");
                  #endif
                    static_assert(_M_IX86_FP == 2, "SSE2 instructions should be enabled");
                    Buffer buff;
                    std::mt19937 engine;
                    engine.seed(std::random_device{}());
                    std::uniform_int_distribution<int> distribution(0, 100);
                  
                    buff.count = 1;
                    buff.data = new long[1];
                    buff.data[0] = distribution(engine);
                  
                    long result = Code(buff.data, buff.count);
                    std::cout << result; // ensure result is used
                    return result;
                  }
                  

                  With SSE2 instructions enabled, but not AVX/AVX2, the compiler generates the following in a release build:

                    {
                      nMaxY = max(data[nNode], nMaxY);
                  010612E1  movdqu      xmm0,xmmword ptr [eax]  
                  010612E5  add         esi,8  
                  010612E8  lea         eax,[eax+20h]  
                  010612EB  pmaxsd      xmm1,xmm0  
                  010612F0  movdqu      xmm0,xmmword ptr [eax-10h]  
                  010612F5  pmaxsd      xmm2,xmm0  
                  010612FA  cmp         esi,ebx  
                  010612FC  jl          Code+41h (010612E1h)  
                  010612FE  pmaxsd      xmm1,xmm2  
                  01061303  movdqa      xmm0,xmm1  
                  01061307  psrldq      xmm0,8  
                  0106130C  pmaxsd      xmm1,xmm0  
                  01061311  movdqa      xmm0,xmm1  
                  01061315  psrldq      xmm0,4  
                  0106131A  pmaxsd      xmm1,xmm0  
                  0106131F  movd        eax,xmm1  
                  01061323  pop         ebx  
                    long nMaxY = data[0];
                  

                  This contains pmaxsd instructions.

                  As far as I can tell, pmaxsd is an SSE4_1 instruction (or an AVX instruction), not an SSE2 instruction.
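For readers less familiar with the instruction, the tail of the disassembly above (pmaxsd, psrldq by 8, pmaxsd, psrldq by 4, pmaxsd, movd) can be modeled in plain C++. This is only a scalar sketch of the semantics; the names `Lanes`, `pmaxsd_model`, `psrldq_model`, and `horizontal_max` are made up for illustration and are not part of any library.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four signed 32-bit lanes, standing in for one XMM register.
using Lanes = std::array<std::int32_t, 4>;

// Scalar model of pmaxsd: lane-wise signed maximum of two registers.
Lanes pmaxsd_model(const Lanes& a, const Lanes& b)
{
  Lanes r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = std::max(a[i], b[i]);
  return r;
}

// Scalar model of psrldq: shift the whole register right by `bytes` bytes
// (a multiple of 4 here, so whole lanes move), filling with zeros.
Lanes psrldq_model(const Lanes& a, std::size_t bytes)
{
  assert(bytes % 4 == 0);
  Lanes r{};
  const std::size_t lanes = bytes / 4;
  for (std::size_t i = 0; i + lanes < 4; ++i)
    r[i] = a[i + lanes];
  return r;
}

// Mirrors the end of the disassembly: fold four lanes down into lane 0.
std::int32_t horizontal_max(Lanes v)
{
  v = pmaxsd_model(v, psrldq_model(v, 8)); // lanes {0,1} against lanes {2,3}
  v = pmaxsd_model(v, psrldq_model(v, 4)); // lane 0 against lane 1
  return v[0];                             // corresponds to movd eax, xmm1
}
```

Only lane 0 of the final register is meaningful; the zeros shifted in by psrldq never reach it, which is why the reduction also works for all-negative inputs.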

                  Intel Core 2 CPUs support SSE3, but not SSE4, and not pmaxsd.

                  This does not occur in VS 2013 Update 1 or Update 0.

                  Is there a way to get Visual Studio to generate SSE2 instructions but not SSE4 instructions like pmaxsd? Is this a known bug in Visual Studio 2013 Update 2/3? Is there a workaround? Does Visual Studio no longer support Core 2 processors?

                  Here is a more complex version of the above code that compiles (under default release settings) to code that crashes a Core 2 CPU:

                  #include "stdafx.h"
                  #include <iostream>
                  #include <random>
                  #include <array>
                  
                  enum unused_name {
                    _nNumPolygons = 10,
                  };
                  
                  
                  #ifndef max
                  #define max(a,b)            (((a) > (b)) ? (a) : (b))
                  #endif
                  
                  struct Buffer
                  {
                    std::array<long*, _nNumPolygons> data;
                    std::array<int, _nNumPolygons>   count;
                  };
                  
                  long Code(Buffer* buff)
                  {
                    long  nMaxY = buff->data[0][0];
                  
                  
                    for (int nPoly = 0; nPoly < _nNumPolygons; nPoly++)
                    {
                      for (int nNode = 0; nNode < buff->count[nPoly]; nNode++)
                      {
                        nMaxY = max(buff->data[nPoly][nNode], nMaxY);
                      }
                    }
                  
                    return(nMaxY);
                  }
                  
                  extern "C" __int32 __isa_available;
                  
                  int _tmain(int argc, _TCHAR* argv[])
                  {
                  #ifdef __AVX__
                    static_assert(false, "AVX should be disabled");
                  #endif
                  #ifdef __AVX2__
                    static_assert(false, "AVX2 should be disabled");
                  #endif
                  #if !( defined( _M_AMD64 ) || defined( _M_X64 ) )
                    static_assert(_M_IX86_FP == 2, "SSE2 instructions should be enabled");
                  #endif
                    // __isa_available = 1; // to force code to act as if SSE4_2 is not available
                    Buffer buff;
                    std::mt19937 engine;
                    engine.seed(std::random_device{}());
                    std::uniform_int_distribution<int> distribution(0, 100);
                  
                    for (int i = 0; i < _nNumPolygons; ++i) {
                      buff.count[i] = 10;
                      buff.data[i] = new long[10];
                      for (int k = 0; k < 10; ++k)
                      {
                        buff.data[i][k] = distribution(engine);
                      }
                    }
                  
                    long result = Code(&buff);
                    std::cout << result; // ensure result is used
                    return result;
                  }
                  

                  Here is a link to a bug for this issue that someone else opened around the same time I posted this question.

                  Here is the generated .asm:

                  ?Code2@@YAJPAUBuffer@@@Z PROC        ; Code2, COMDAT
                  ; _buff$ = ecx
                  ; File c:\users\adam.nevraumont.corelcorp.000\documents\visual studio 2013\projects\consoleapplication1\consoleapplication1\consoleapplication1.cpp
                  ; Line 22
                    push  ebp
                    mov  ebp, esp
                    sub  esp, 12          ; 0000000cH
                    push  ebx
                    push  esi
                    push  edi
                    mov  edi, ecx
                  ; Line 26
                    xor  ebx, ebx
                    mov  DWORD PTR _buff$1$[ebp], edi
                    mov  DWORD PTR _nPoly$1$[ebp], ebx
                    mov  eax, DWORD PTR [edi]
                    mov  edx, DWORD PTR [eax]
                  ; Line 28
                    movd  xmm0, edx
                    pshufd  xmm1, xmm0, 0
                    movdqa  xmm2, xmm1
                    npad  12
                  $LL6@Code2:
                    lea  ecx, DWORD PTR [ebx*4]
                    xor  eax, eax
                    mov  esi, DWORD PTR [ecx+edi+40]
                    mov  DWORD PTR tv443[ebp], ecx
                    test  esi, esi
                    jle  SHORT $LN5@Code2
                    cmp  esi, 8
                    jb  SHORT $LN25@Code2
                    cmp  DWORD PTR ___isa_available, 2
                    jl  SHORT $LN25@Code2
                  ; Line 26
                    mov  ebx, DWORD PTR [ecx+edi]
                    mov  ecx, esi
                    and  ecx, -2147483641      ; 80000007H
                    jns  SHORT $LN33@Code2
                    dec  ecx
                    or  ecx, -8          ; fffffff8H
                    inc  ecx
                  $LN33@Code2:
                    mov  edi, esi
                    sub  edi, ecx
                    npad  8
                  $LL3@Code2:
                  ; Line 30
                    movdqu  xmm0, XMMWORD PTR [ebx+eax*4]
                    pmaxsd  xmm1, xmm0
                    movdqu  xmm0, XMMWORD PTR [ebx+eax*4+16]
                    add  eax, 8
                    pmaxsd  xmm2, xmm0
                    cmp  eax, edi
                    jl  SHORT $LL3@Code2
                    mov  ebx, DWORD PTR _nPoly$1$[ebp]
                    mov  ecx, DWORD PTR tv443[ebp]
                    mov  edi, DWORD PTR _buff$1$[ebp]
                  $LN25@Code2:
                  ; Line 28
                    cmp  eax, esi
                    jge  SHORT $LN5@Code2
                  ; Line 26
                    mov  edi, DWORD PTR [ecx+edi]
                    npad  4
                  $LL23@Code2:
                  ; Line 30
                    cmp  DWORD PTR [edi+eax*4], edx
                    cmovg  edx, DWORD PTR [edi+eax*4]
                    inc  eax
                    cmp  eax, esi
                    jl  SHORT $LL23@Code2
                  $LN5@Code2:
                  ; Line 26
                    mov  edi, DWORD PTR _buff$1$[ebp]
                    inc  ebx
                    mov  DWORD PTR _nPoly$1$[ebp], ebx
                    cmp  ebx, 10          ; 0000000aH
                    jl  $LL6@Code2
                  ; Line 28
                    movd  xmm0, edx
                    pshufd  xmm0, xmm0, 0
                    pmaxsd  xmm1, xmm0
                    pmaxsd  xmm1, xmm2
                    movdqa  xmm0, xmm1
                    psrldq  xmm0, 8
                    pmaxsd  xmm1, xmm0
                    movdqa  xmm0, xmm1
                    pop  edi
                    psrldq  xmm0, 4
                    pmaxsd  xmm1, xmm0
                    pop  esi
                    movd  eax, xmm1
                    pop  ebx
                  ; Line 35
                    mov  esp, ebp
                    pop  ebp
                    ret  0
                  

                  Here:

                    cmp  esi, 8
                    jb  SHORT $LN25@Code2
                    cmp  DWORD PTR ___isa_available, 2
                    jl  SHORT $LN25@Code2
                  

                  we have the test that branches to the "single step" version if either (A) the loop has fewer than 8 iterations, or (B) we don't have SSE3/SSE4 support.

                  The single step version is:

                  $LN5@Code2:
                  ; Line 26
                    mov  edi, DWORD PTR _buff$1$[ebp]
                    inc  ebx
                    mov  DWORD PTR _nPoly$1$[ebp], ebx
                    cmp  ebx, 10          ; 0000000aH
                    jl  $LL6@Code2
                  

                  which has no SSE instructions. However, the important part is the fall-through. Once ebx (the outer loop counter) reaches 10, execution falls through into:

                  ; Line 28
                    movd  xmm0, edx
                    pshufd  xmm0, xmm0, 0
                    pmaxsd  xmm1, xmm0
                  

                  which is code that finds the max of both the single step version results and the SSE4 results. The third instruction is pmaxsd, which is an SSE4_1 instruction, and it is not guarded by __isa_available.

                  Is there a compiler setting or workaround that can leave the auto-vectorization intact, while not invoking SSE4_1 instructions on SSE2-only Core 2 computers? Is there a bug in my code that is causing this to happen?

                  Note that my attempts to remove the double-nested nature of the loop seem to make the problem go away.

                  Recommended answer

                  This is documented behavior:

                  The Auto-Vectorizer also uses the newer, SSE4.2 instruction set if your computer supports it.

                  If you look closer at the code the compiler generates, you'll see that the use of the SSE4.2 instructions depends on a runtime test:

                  cmp DWORD PTR ___isa_available, 2
                  jl  SHORT $LN11@Code
                  

                  The value 2 here apparently means SSE4.2.
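To see independently which of these instruction sets a given test machine reports, the CPUID feature bits can be queried directly. The sketch below is not how the CRT computes `__isa_available` (that is internal to the runtime); it only reads CPUID leaf 1, where ECX bit 19 indicates SSE4.1 and ECX bit 20 indicates SSE4.2. The helper names `cpuid_leaf1_ecx_bit`, `has_sse41`, and `has_sse42` are hypothetical.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid (GCC/Clang)
#endif

// Returns the given bit of ECX from CPUID leaf 1, or false on non-x86 targets.
bool cpuid_leaf1_ecx_bit(int bit)
{
  assert(bit >= 0 && bit < 32);
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  int regs[4] = {0, 0, 0, 0};            // EAX, EBX, ECX, EDX
  __cpuid(regs, 1);
  return ((static_cast<unsigned>(regs[2]) >> bit) & 1u) != 0;
#elif defined(__i386__) || defined(__x86_64__)
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;                        // leaf 1 not supported
  return ((ecx >> bit) & 1u) != 0;
#else
  (void)bit;
  return false;                          // not an x86 build
#endif
}

bool has_sse41() { return cpuid_leaf1_ecx_bit(19); } // CPUID.1:ECX bit 19
bool has_sse42() { return cpuid_leaf1_ecx_bit(20); } // CPUID.1:ECX bit 20
```

A machine that reports `has_sse41() == false` is the kind of machine on which an unguarded pmaxsd would be expected to raise an illegal instruction exception.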

                  I was, however, able to confirm the bug in your second example. It turns out the Core 2 PC I was using supports SSE4.1 and the PMAXSD instruction, so I had to test it on a PC with a Pentium 4 CPU to get the illegal instruction exception. You should submit a bug report to Microsoft Connect. Be sure to mention the specific Core 2 CPU model your example code fails on.

                  As for a workaround, I can only suggest changing the optimization level for the affected function. Switching from optimizing for speed to optimizing for size seems to generate much the same code as would be used with only SSE2 instructions. You can use #pragma optimize to switch the optimization level like this:

                  #pragma optimize("s", on)
                  
                  long Code(Buffer* buff)
                  {
                       ...
                  }
                  
                  #pragma optimize("", on)
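
A narrower alternative to the #pragma optimize approach above, not taken from the answer, is MSVC's `#pragma loop(no_vector)`, which asks the auto-vectorizer to skip a single loop. It is an assumption that this avoids the unguarded pmaxsd on the affected updates; that has not been verified here. The function name `MaxNoVector` is hypothetical.

```cpp
#include <cassert>

// Hypothetical scalar-only variant of the inner loop from the question.
long MaxNoVector(const long* data, int count)
{
  assert(data != nullptr && count > 0);
  long nMaxY = data[0];

  // MSVC-specific: request that the following loop is not auto-vectorized.
  // The guard keeps other compilers from seeing an unknown pragma.
#if defined(_MSC_VER)
#pragma loop(no_vector)
#endif
  for (int nNode = 1; nNode < count; nNode++)
  {
    nMaxY = (data[nNode] > nMaxY) ? data[nNode] : nMaxY;
  }

  return nMaxY;
}
```

Unlike changing the optimization level for the whole function, this leaves other optimizations in place, at the cost of giving up vectorization for that one loop.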
                  

                  As documented on this bug report, /d2Qvec-sse2only is an undocumented flag that works on Update 3 (and possibly Update 2) to prevent the compiler from outputting SSE4 instructions. Naturally, this can prevent some loops from being vectorized. /d2Qvec-sse2only may cease to work at any point, possibly in future versions of VC (it is "subject to future change without notice").

                  Microsoft claims that this problem is fixed in Update 4, and in the Update 4 CTP 2 (not for production use).
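
If vectorization has to be kept on SSE2-only machines, one option that does not depend on the auto-vectorizer is to write the loop with SSE2 intrinsics by hand. The sketch below is not from the answer: it emulates a lane-wise signed 32-bit max with `_mm_cmpgt_epi32` and bitwise selects, all of which are SSE2. It uses `std::int32_t` rather than `long` so it behaves the same where `long` is 64 bits (on MSVC, `long` is 32 bits), and it targets x86 only. The names `max_epi32_sse2` and `MaxSse2` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics only

// Lane-wise signed 32-bit max built from SSE2 operations,
// standing in for pmaxsd (_mm_max_epi32), which needs SSE4.1.
static inline __m128i max_epi32_sse2(__m128i a, __m128i b)
{
  const __m128i gt = _mm_cmpgt_epi32(a, b);      // all ones where a > b
  return _mm_or_si128(_mm_and_si128(gt, a),      // keep a where a > b
                      _mm_andnot_si128(gt, b));  // keep b elsewhere
}

// Hypothetical hand-vectorized replacement for the inner loop.
std::int32_t MaxSse2(const std::int32_t* data, int count)
{
  assert(data != nullptr && count > 0);
  std::int32_t best = data[0];
  int i = 0;

  if (count >= 4)
  {
    __m128i vmax = _mm_set1_epi32(best);
    for (; i + 4 <= count; i += 4)
    {
      const __m128i v =
          _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
      vmax = max_epi32_sse2(vmax, v);
    }
    // Horizontal reduction: fold the upper half onto the lower half,
    // then lane 1 onto lane 0.
    vmax = max_epi32_sse2(vmax, _mm_srli_si128(vmax, 8));
    vmax = max_epi32_sse2(vmax, _mm_srli_si128(vmax, 4));
    best = _mm_cvtsi128_si32(vmax);
  }

  for (; i < count; ++i) // scalar tail for the last count % 4 elements
    best = (data[i] > best) ? data[i] : best;

  return best;
}
```

On the affected updates the scalar tail loop may itself be a candidate for auto-vectorization; whether the compiler leaves it alone was not checked.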
