Is it better to use std::memcpy() or std::copy() in terms of performance?



Is it better to use memcpy as shown below, or is it better to use std::copy() in terms of performance? Why?

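char *bits = NULL;

// The first int stored in copyMe->bits holds the buffer size in bytes.
bits = new (std::nothrow) char[((int *) copyMe->bits)[0]];
if (bits == NULL)
{
    cout << "ERROR Not enough memory.\n";
    exit(1);  // do not fall through to memcpy with a null destination
}

memcpy (bits, copyMe->bits, ((int *) copyMe->bits)[0]);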
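For comparison, here is a minimal sketch of the same copy written with std::copy. The Image struct and the clone_bits function are hypothetical stand-ins for whatever copyMe points to. The only thing taken from the snippet above is the assumption that the first int stored in bits holds the total buffer size in bytes.

```cpp
#include <algorithm>
#include <cstring>
#include <new>

// Hypothetical stand-in for the type behind copyMe in the question.
struct Image {
    char *bits;  // assumed: the first sizeof(int) bytes hold the total size in bytes
};

// Returns a newly allocated copy of src.bits, or nullptr if allocation fails.
// The caller owns the result and must delete[] it.
char *clone_bits(const Image &src) {
    int size = 0;
    // Read the size prefix byte-wise instead of casting char * to int *.
    std::memcpy(&size, src.bits, sizeof size);

    char *bits = new (std::nothrow) char[size];
    if (bits == nullptr) {
        return nullptr;
    }
    // Typed equivalent of memcpy(bits, src.bits, size).
    std::copy(src.bits, src.bits + size, bits);
    return bits;
}
```

Because the elements are plain char, a standard library is free to lower this std::copy call to the same memory-move routine that memcpy would use.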


I'm going to go against the general wisdom here that std::copy will have a slight, almost imperceptible performance loss. I just did a test and found that to be untrue: I did notice a performance difference. However, the winner was std::copy.

I wrote a C++ SHA-2 implementation. In my test, I hash 5 strings using all four SHA-2 versions (224, 256, 384, 512), and I loop 300 times. I measure times using Boost.timer. That 300 loop counter is enough to completely stabilize my results. I ran the test 5 times each, alternating between the memcpy version and the std::copy version. My code takes advantage of grabbing data in chunks as large as possible (many other implementations operate with char / char *, whereas I operate with T / T *, where T is the largest type in the user's implementation that has correct overflow behavior), so fast memory access on the largest types I can use is central to the performance of my algorithm. These are my results:
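The answer's actual test harness is not shown, so the following is only a rough sketch of how such a comparison could be timed. It swaps in std::chrono::steady_clock for Boost.timer, uses std::uint64_t as an assumed stand-in for the "largest type" T, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Word = std::uint64_t;  // assumed stand-in for the answer's type T

void copy_with_memcpy(const Word *src, Word *dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(Word));
}

void copy_with_std_copy(const Word *src, Word *dst, std::size_t n) {
    std::copy(src, src + n, dst);
}

// Runs `copy` `loops` times and returns the elapsed wall-clock time in seconds.
template <typename CopyFn>
double time_copy(CopyFn copy, const std::vector<Word> &src,
                 std::vector<Word> &dst, int loops) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        copy(src.data(), dst.data(), src.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as time_copy(copy_with_std_copy, src, dst, 300) gives one measurement. A bare loop like this can be partly optimized away, which is one reason the answer times the copies inside a real hashing workload instead.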

Time (in seconds) to complete run of SHA-2 tests

std::copy   memcpy  % increase
6.11        6.29    2.86%
6.09        6.28    3.03%
6.10        6.29    3.02%
6.08        6.27    3.03%
6.08        6.27    3.03%

Total average increase in speed of std::copy over memcpy: 2.99%
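The post does not say how the percentages were computed. The numbers are consistent with taking the difference relative to the memcpy time and averaging the per-run values, so the sketch below assumes that formula. The function names are hypothetical.

```cpp
#include <cstddef>

// Percentage by which std::copy beat memcpy, relative to the memcpy time.
// Assumed formula, inferred from the table rather than stated in the post.
double percent_faster(double copy_seconds, double memcpy_seconds) {
    return (memcpy_seconds - copy_seconds) / memcpy_seconds * 100.0;
}

// Mean of the per-run percentages over n paired runs.
double mean_percent_faster(const double *copy_seconds,
                           const double *memcpy_seconds, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += percent_faster(copy_seconds[i], memcpy_seconds[i]);
    }
    return sum / static_cast<double>(n);
}
```

For the first row, (6.29 - 6.11) / 6.29 is about 2.86%, which matches the table.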

My compiler is gcc 4.6.3 on Fedora 16 x86_64. My optimization flags are -Ofast -march=native -funsafe-loop-optimizations.

Code for my SHA-2 implementation.

I decided to run a test on my MD5 implementation as well. The results were much less stable, so I decided to do 10 runs. However, after my first few attempts, I got results that varied wildly from one run to the next, so I'm guessing there was some sort of OS activity going on. I decided to start over.

Same compiler settings and flags. There is only one version of MD5, and it's faster than SHA-2, so I did 3000 loops on a similar set of 5 test strings.

These are my final 10 results:

Time (in seconds) to complete run of MD5 tests

std::copy   memcpy      % difference
5.52        5.56        +0.72%
5.56        5.55        -0.18%
5.57        5.53        -0.72%
5.57        5.52        -0.91%
5.56        5.57        +0.18%
5.56        5.57        +0.18%
5.56        5.53        -0.54%
5.53        5.57        +0.72%
5.59        5.57        -0.36%
5.57        5.56        -0.18%

Total average decrease in speed of std::copy over memcpy: 0.11%

Code for my MD5 implementation.

These results suggest that there is some optimization that std::copy used in my SHA-2 tests that std::copy could not use in my MD5 tests. In the SHA-2 tests, both arrays were created in the same function that called std::copy / memcpy. In my MD5 tests, one of the arrays was passed in to the function as a function parameter.
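The two call shapes described here can be sketched as follows. This is not the SHA-2 or MD5 code itself. The function names and the 16-word buffer size are hypothetical, and whether this difference really explains the measurements remains the answer's hypothesis.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWords = 16;  // hypothetical block size

// Shape 1 (as in the SHA-2 test): both arrays are locals, so the compiler
// sees their exact types, sizes, and alignment right at the call.
std::uint64_t sum_after_local_copy() {
    std::uint64_t src[kWords];
    std::uint64_t dst[kWords];
    for (std::size_t i = 0; i < kWords; ++i) src[i] = i;
    std::copy(src, src + kWords, dst);
    std::uint64_t sum = 0;
    for (std::uint64_t w : dst) sum += w;
    return sum;
}

// Shape 2 (as in the MD5 test): the source arrives as a pointer parameter.
// Unless the call is inlined or link-time optimization sees the caller,
// the compiler knows much less about where src came from.
std::uint64_t sum_after_param_copy(const std::uint64_t *src) {
    std::uint64_t dst[kWords];
    std::copy(src, src + kWords, dst);
    std::uint64_t sum = 0;
    for (std::uint64_t w : dst) sum += w;
    return sum;
}
```

Both functions compute the same result. The difference is only in how much the optimizer can see at the point of the copy.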

I did a little bit more testing to see what I could do to make std::copy faster again. The answer turned out to be simple: turn on link time optimization. These are my results with LTO turned on (option -flto in gcc):

Time (in seconds) to complete run of MD5 tests with -flto

std::copy   memcpy      % difference
5.54        5.57        +0.54%
5.50        5.53        +0.54%
5.54        5.58        +0.72%
5.50        5.57        +1.26%
5.54        5.58        +0.72%
5.54        5.57        +0.54%
5.54        5.56        +0.36%
5.54        5.58        +0.72%
5.51        5.58        +1.25%
5.54        5.57        +0.54%

Total average increase in speed of std::copy over memcpy: 0.72%

In summary, there does not appear to be a performance penalty for using std::copy. In fact, there appears to be a performance gain.


So why might std::copy give a performance boost?

First, I would not expect it to be slower for any implementation, as long as the optimization of inlining is turned on. All compilers inline aggressively; it is possibly the most important optimization because it enables so many other optimizations. std::copy can (and I suspect all real world implementations do) detect that the arguments are trivially copyable and that memory is laid out sequentially. This means that in the worst case, when memcpy is legal, std::copy should perform no worse. The trivial implementation of std::copy that defers to memcpy should meet your compiler's criteria of "always inline this when optimizing for speed or size".

However, std::copy also keeps more of its information. When you call std::copy, the function keeps the types intact. memcpy operates on void *, which discards almost all useful information. For instance, if I pass in an array of std::uint64_t, the compiler or library implementer may be able to take advantage of 64-bit alignment with std::copy, but it may be more difficult to do so with memcpy. Many implementations of algorithms like this work by first working on the unaligned portion at the start of the range, then the aligned portion, then the unaligned portion at the end. If it is all guaranteed to be aligned, then the code becomes simpler and faster, and easier for the branch predictor in your processor to get correct.
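To illustrate what "keeping the types intact" buys, the sketch below contrasts what is known at compile time from a typed pointer with what is known from a void *. The helper names are hypothetical, and real implementations may recover alignment in other ways, for example by checking the address at run time.

```cpp
#include <cstddef>
#include <cstdint>

// With a typed pointer, the element alignment is a compile-time constant.
template <typename T>
constexpr std::size_t typed_alignment(const T *) {
    return alignof(T);
}

// With void *, as in memcpy's signature, the type alone guarantees nothing
// beyond byte alignment.
constexpr std::size_t untyped_alignment(const void *) {
    return 1;
}

// Run-time check of the kind a generic byte copier may need before taking
// an aligned fast path.
bool is_aligned(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For a std::uint64_t array, typed_alignment reports alignof(std::uint64_t) with no run-time test, whereas code that only sees void * would have to call something like is_aligned first.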


std::copy is in an interesting position. I expect it to never be slower than memcpy and sometimes faster with any modern optimizing compiler. Moreover, anything that you can memcpy, you can std::copy. memcpy does not allow any overlap in the buffers, whereas std::copy supports overlap in one direction (with std::copy_backward for the other direction of overlap). memcpy only works on pointers, whereas std::copy works on any iterator (from std::map, std::vector, std::deque, or my own custom type). In other words, you should just use std::copy when you need to copy chunks of data around.
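The overlap and iterator points can be shown with a short sketch. The helper names are hypothetical. The overlap rules used are the standard ones: std::copy is valid when the destination begins before the source range, and std::copy_backward is valid when the destination ends after it.

```cpp
#include <algorithm>
#include <deque>
#include <iterator>
#include <vector>

// Overlapping shift toward the front: the destination starts before the
// source range, which std::copy permits.
std::vector<int> shift_left(std::vector<int> v) {
    if (v.size() > 1) {
        std::copy(v.begin() + 1, v.end(), v.begin());
    }
    return v;
}

// Overlapping shift toward the back: the destination ends after the source
// range, so std::copy_backward is the right tool.
std::vector<int> shift_right(std::vector<int> v) {
    if (v.size() > 1) {
        std::copy_backward(v.begin(), v.end() - 1, v.end());
    }
    return v;
}

// Copy out of a non-contiguous container, which memcpy cannot do in one call.
std::vector<int> to_vector(const std::deque<int> &d) {
    std::vector<int> out;
    std::copy(d.begin(), d.end(), std::back_inserter(out));
    return out;
}
```

In both shift functions the last or first element is left duplicated, since the helpers only move elements and do not resize the vector.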
