如何在 C++ 中快速将大缓冲区写入二进制文件?

How to write a large buffer into a binary file in C++, fast?(如何在 C++ 中快速将大缓冲区写入二进制文件?)
本文介绍了如何在 C++ 中快速将大缓冲区写入二进制文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

我正在尝试将大量数据写入我的 SSD(固态驱动器).大量我的意思是 80GB.

I'm trying to write huge amounts of data onto my SSD(solid state drive). And by huge amounts I mean 80GB.

我浏览了网络以寻找解决方案,但我想到的最好的方法是:

I browsed the web for solutions, but the best I came up with was this:

#include <fstream>
const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];
int main()
{
    std::fstream myfile;
    myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    //Here would be some error handling
    for(int i = 0; i < 32; ++i){
        //Some calculations to fill a[]
        myfile.write((char*)&a,size*sizeof(unsigned long long));
    }
    myfile.close();
}

使用 Visual Studio 2010 和全面优化编译并在 Windows7 下运行,该程序最大速度约为 20MB/s.真正困扰我的是 Windows 可以以 150MB/s 到 200MB/s 的速度将文件从另一个 SSD 复制到这个 SSD.所以至少快7倍.这就是为什么我认为我应该能够走得更快.

Compiled with Visual Studio 2010 and full optimizations and run under Windows7 this program maxes out around 20MB/s. What really bothers me is that Windows can copy files from an other SSD to this SSD at somewhere between 150MB/s and 200MB/s. So at least 7 times faster. That's why I think I should be able to go faster.

有什么想法可以加快我的写作速度吗?

Any ideas how I can speed up my writing?

推荐答案

这完成了工作(在 2012 年):

This did the job (in the year 2012):

#include <stdio.h>
const unsigned long long size = 8ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    FILE* pFile;
    pFile = fopen("file.binary", "wb");
    for (unsigned long long j = 0; j < 1024; ++j){
        //Some calculations to fill a[]
        fwrite(a, 1, size*sizeof(unsigned long long), pFile);
    }
    fclose(pFile);
    return 0;
}

我只是在 36 秒内计时了 8GB,大约是 220MB/s,我认为这可以最大限度地发挥我的 SSD.另外值得注意的是,问题中的代码100%使用了一个核心,而这段代码只使用了2-5%.

I just timed 8GB in 36sec, which is about 220MB/s and I think that maxes out my SSD. Also worth to note, the code in the question used one core 100%, whereas this code only uses 2-5%.

非常感谢大家.

更新:5 年过去了,现在是 2017 年.编译器、硬件、库和我的要求都发生了变化.这就是为什么我对代码进行了一些更改并进行了一些新的测量.

Update: 5 years have passed it's 2017 now. Compilers, hardware, libraries and my requirements have changed. That's why I made some changes to the code and did some new measurements.

先上代码:

#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>

std::vector<uint64_t> GenerateData(std::size_t bytes)
{
    assert(bytes % sizeof(uint64_t) == 0);
    std::vector<uint64_t> data(bytes / sizeof(uint64_t));
    std::iota(data.begin(), data.end(), 0);
    std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
    return data;
}

long long option_1(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_2(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    FILE* file = fopen("file.binary", "wb");
    fwrite(&data[0], 1, bytes, file);
    fclose(file);
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_3(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    std::ios_base::sync_with_stdio(false);
    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

int main()
{
    const std::size_t kB = 1024;
    const std::size_t MB = 1024 * kB;
    const std::size_t GB = 1024 * MB;

    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option1, " << size / MB << "MB: " << option_1(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option2, " << size / MB << "MB: " << option_2(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option3, " << size / MB << "MB: " << option_3(size) << "ms" << std::endl;

    return 0;
}

此代码使用 Visual Studio 2017 和 g++ 7.2.0(新要求)编译.我用两个设置运行了代码:

This code compiles with Visual Studio 2017 and g++ 7.2.0 (a new requirements). I ran the code with two setups:

  • 笔记本电脑、Core i7、SSD、Ubuntu 16.04、g++ 7.2.0 版,带有 -std=c++11 -march=native -O3
  • 桌面、Core i7、SSD、Windows 10、Visual Studio 2017 版本 15.3.1 和/Ox/Ob2/Oi/Ot/GT/GL/Gy

给出了以下测量值(在丢弃 1MB 的值之后,因为它们是明显的异常值):选项 1 和选项 3 都最大化了我的 SSD.我没想到会看到这个,因为当时 option2 曾经是我旧机器上最快的代码.

Which gave the following measurements (after ditching the values for 1MB, because they were obvious outliers): Both times option1 and option3 max out my SSD. I didn't expect this to see, because option2 used to be the fastest code on my old machine back then.

TL;DR:我的测量表明使用 std::fstream 而不是 FILE.

TL;DR: My measurements indicate to use std::fstream over FILE.

这篇关于如何在 C++ 中快速将大缓冲区写入二进制文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Why does C++ compilation take so long?(为什么 C++ 编译需要这么长时间?)
Why is my program slow when looping over exactly 8192 elements?(为什么我的程序在循环 8192 个元素时很慢?)
C++ performance challenge: integer to std::string conversion(C++ 性能挑战:整数到 std::string 的转换)
Fast textfile reading in c++(在 C++ 中快速读取文本文件)
Is it better to use std::memcpy() or std::copy() in terms to performance?(就性能而言,使用 std::memcpy() 或 std::copy() 更好吗?)
Does the C++ standard mandate poor performance for iostreams, or am I just dealing with a poor implementation?(C++ 标准是否要求 iostreams 性能不佳,或者我只是在处理一个糟糕的实现?)