
Table with 80 million records and adding an index takes more than 18 hours (or forever)! Now what?


Question

A short recap of what happened. I am working with 71 million records (not much compared to the billions of records processed by others). In another thread, someone suggested that my cluster's current setup is not suitable for my needs. My table structure is:

                CREATE TABLE `IPAddresses` (
                  `id` int(11) unsigned NOT NULL auto_increment,
                  `ipaddress` bigint(20) unsigned default NULL,
                  PRIMARY KEY  (`id`)
                ) ENGINE=MyISAM;
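
For context on the `ipaddress BIGINT` column: MySQL's built-in INET_ATON()/INET_NTOA() functions convert between dotted-quad strings and this integer form, which is the usual way such a lookup table gets populated (a usage sketch, not taken from the original question):

```sql
-- Dotted-quad to integer and back
SELECT INET_ATON('192.168.0.1');  -- 3232235521
SELECT INET_NTOA(3232235521);     -- '192.168.0.1'

-- Typical insert into the table above
INSERT INTO IPAddresses (ipaddress) VALUES (INET_ATON('10.0.0.1'));
```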
                

                And I added the 71 million records and then did a:

                ALTER TABLE IPAddresses ADD INDEX(ipaddress);
                

It's been 14 hours and the operation is still not complete. From Googling, I found that there is a well-known approach to this problem: partitioning. I understand that I now need to partition my table based on ipaddress, but can I do this without recreating the entire table, i.e. through an ALTER statement? If so, there is a requirement that the column being partitioned on should be a primary key. I will be using the id of this ipaddress when constructing a different table, so ipaddress is not my primary key. How do I partition my table in this scenario?
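
The requirement mentioned here comes from MySQL's partitioning rules: every unique key on a partitioned table, including the primary key, must include all columns used in the partitioning expression. A sketch of the kind of statement the server rejects (assuming MySQL 5.1+; the table name `bad_example` is made up):

```sql
-- Rejected: the partitioning column is not part of the primary key.
-- MySQL reports error 1503 (A PRIMARY KEY must include all columns
-- in the table's partitioning function).
CREATE TABLE bad_example (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY (id)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;
```

The usual workaround is a composite primary key such as PRIMARY KEY (id, ipaddress), which satisfies the rule while keeping id available for joins.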

Answer

Ok, it turns out that this problem was more than just a simple create-a-table, index-it, and forget problem :) Here's what I did, in case someone else faces the same problem (I have used IP addresses as the example, but it works for other data types too):

Problem: Your table has millions of entries and you need to add an index very quickly.

Use case: Consider storing millions of IP addresses in a lookup table. Adding the IP addresses is not a big problem, but creating an index on them takes more than 14 hours.

Solution: Partition the table using MySQL's partitioning support. Partitioning strategy:

Case #1: When the table you want has not been created yet

                CREATE TABLE IPADDRESSES(
                  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                  ipaddress BIGINT UNSIGNED,
                  PRIMARY KEY(id, ipaddress)
                ) ENGINE=MYISAM
                PARTITION BY HASH(ipaddress)
                PARTITIONS 20;
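
HASH partitioning spreads rows by taking the partitioning expression modulo the partition count, so with 20 partitions each row lands in partition `ipaddress MOD 20`. A quick illustration of that arithmetic (plain Python, just to show the distribution; not part of the original answer):

```python
def hash_partition(ipaddress: int, num_partitions: int = 20) -> int:
    """Partition index MySQL's PARTITION BY HASH would pick: MOD(expr, N)."""
    return ipaddress % num_partitions

# 192.168.0.1 stored as a bigint (the value INET_ATON('192.168.0.1') yields)
ip = (192 << 24) | (168 << 16) | (0 << 8) | 1   # 3232235521
print(hash_partition(ip))   # -> 1, i.e. partition p1
```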
                

Case #2: When the table you want is already created. There seems to be a way to use ALTER TABLE to do this, but I have not yet figured out a proper solution. Instead, there is a slightly less efficient solution:
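
For the record, MySQL (5.1 and later) does accept a partitioning clause on ALTER TABLE; whether it rebuilds any faster than the copy-and-reload route is something I have not benchmarked, so treat this as an untested sketch:

```sql
-- The primary key must first be extended to cover the partitioning column
ALTER TABLE IPAddresses DROP PRIMARY KEY, ADD PRIMARY KEY (id, ipaddress);
-- Rebuild the table in place with hash partitions
ALTER TABLE IPAddresses PARTITION BY HASH(ipaddress) PARTITIONS 20;
```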

                CREATE TABLE IPADDRESSES_TEMP(
                  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                  ipaddress BIGINT UNSIGNED,
                  PRIMARY KEY(id)
                ) ENGINE=MYISAM;
                

                Insert your IP addresses into this table. And then create the actual table with partitions:

                CREATE TABLE IPADDRESSES(
                  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                  ipaddress BIGINT UNSIGNED,
                  PRIMARY KEY(id, ipaddress)
                ) ENGINE=MYISAM
                PARTITION BY HASH(ipaddress)
                PARTITIONS 20;
                

Finally:

                INSERT INTO IPADDRESSES(ipaddress) SELECT ipaddress FROM IPADDRESSES_TEMP;
                DROP TABLE IPADDRESSES_TEMP;
                ALTER TABLE IPADDRESSES ADD INDEX(ipaddress);
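
To verify that the partitioned table behaves as intended, EXPLAIN PARTITIONS (available since MySQL 5.1) shows which partitions a lookup touches; a sanity check along these lines (not from the original answer):

```sql
-- A point lookup should list exactly one partition in the
-- `partitions` column of the EXPLAIN output
EXPLAIN PARTITIONS
SELECT id FROM IPADDRESSES
WHERE ipaddress = INET_ATON('192.168.0.1');
```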
                

                And there you go... indexing on the new table took me about 2 hours on a 3.2GHz machine with 1GB RAM :) Hope this helps.
