Wednesday, April 29, 2009

Linux I/O scheduler queue size and MyISAM performance

At the MySQL Conference and Expo 2009, I explained how the Linux I/O scheduler queue size affects MyISAM insert performance.

As is well known, Linux has shipped four I/O schedulers (noop, deadline, anticipatory, cfq) since kernel 2.6.10. The default in most distributions, including RHEL, is cfq, which is often not the best choice: noop normally outperforms it, but I'll talk about that in other posts.

The Linux I/O scheduler also sorts incoming I/O requests in its request queue for optimization. The queue size is configurable: it is exposed as "/sys/block/sdX/queue/nr_requests", and you can check and change the queue length as follows.

# cat /sys/block/sda/queue/nr_requests
# echo 100000 > /sys/block/sda/queue/nr_requests

Changing the queue size is effective even for the noop scheduler.
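The same check and change can be scripted. Below is a minimal Python sketch, assuming the sysfs layout shown above. The function names and the `sysfs_root` parameter are hypothetical helpers introduced for illustration (the parameter only exists so the code can be tried against a scratch directory). Writing to the real file requires root, and the value is read back afterwards because the kernel may not accept every value as given.

```python
from pathlib import Path


def nr_requests_path(device, sysfs_root="/sys"):
    """Build the path to a block device's nr_requests file, e.g. device='sda'."""
    return Path(sysfs_root) / "block" / device / "queue" / "nr_requests"


def get_queue_size(device, sysfs_root="/sys"):
    """Return the current I/O scheduler queue size for the device."""
    return int(nr_requests_path(device, sysfs_root).read_text().strip())


def set_queue_size(device, size, sysfs_root="/sys"):
    """Write a new queue size, then return the value read back from sysfs."""
    nr_requests_path(device, sysfs_root).write_text(f"{size}\n")
    return get_queue_size(device, sysfs_root)


def set_queue_size_all(size, pattern="sd*", sysfs_root="/sys"):
    """Apply the same queue size to every block device matching the pattern."""
    block_dir = Path(sysfs_root) / "block"
    return {
        dev.name: set_queue_size(dev.name, size, sysfs_root)
        for dev in sorted(block_dir.glob(pattern))
    }
```

For example, `set_queue_size("sda", 100000)` run as root corresponds to the echo command above. The setting does not survive a reboot, so a script like this would have to be run again at startup.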

Here are benchmark results for changing the I/O scheduler queue size under MyISAM insert-intensive loads. Details are in my slides from the UC.

Increasing the queue size was clearly very helpful on HDD, but not helpful on SSD.
Let me explain the background.

On the Linux side, I/O requests are handled in the following order:
system calls(pwrite/etc)
-> Filesystem
-> I/O scheduler
-> Device Driver/Disks

The I/O scheduler sorts incoming I/O requests by logical block address, then sends them to the device driver.
The I/O scheduler is device-independent, so it helps with some things (e.g. minimizing disk seek overhead) but not with others (e.g. minimizing disk rotation overhead).

On the other hand, command queuing (TCQ/NCQ) also sorts I/O requests for optimization. SAS disks and recent SATA II disks support command queuing. Its goal partly overlaps with that of the I/O scheduler, but TCQ can minimize not only disk seeks but also disk rotation overhead (see the link to Wikipedia). The disadvantage of TCQ is that its queue size is very limited (normally 32-64).

Based on the above, it would be nice to sort almost all random I/O requests in the I/O scheduler and then send them to TCQ.

Suppose 100,000 random read I/O requests are coming.
When the I/O scheduler queue size is 128 (the default in many cases), TCQ receives I/O requests in almost random order, so each operation incurs a fairly high disk seek overhead (the requests within a single queue are dispersed across the disk).

When the I/O scheduler queue size is 100,000, TCQ receives I/O requests in fully sorted order, so seek overhead can be much smaller.

Increasing the queue size has no effect on SSD because no disk seek happens.

This would explain my benchmarking results.
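The HDD argument can be illustrated with a toy simulation. The Python sketch below is only a rough model under several assumptions that are mine, not measurements: requests are sorted in fixed batches of the queue size, seek cost is taken to be proportional to the distance between logical block addresses, and TCQ itself is not modeled. The names `head_travel` and `simulate` and the address range are hypothetical.

```python
import random


def head_travel(requests, queue_size):
    """Total LBA distance covered when requests are sorted in batches of queue_size."""
    position = 0
    travel = 0
    for start in range(0, len(requests), queue_size):
        # The scheduler can only reorder what fits in its queue at once.
        batch = sorted(requests[start:start + queue_size])
        for lba in batch:
            travel += abs(lba - position)
            position = lba
    return travel


def simulate(n_requests=100_000, max_lba=10**9, seed=1):
    """Compare no sorting (queue size 1), the default 128, and a queue holding everything."""
    rng = random.Random(seed)
    requests = [rng.randrange(max_lba) for _ in range(n_requests)]
    return {q: head_travel(requests, q) for q in (1, 128, n_requests)}


if __name__ == "__main__":
    for queue_size, travel in simulate().items():
        print(f"queue size {queue_size:>7}: head travel {travel:,} blocks")
```

In this model a queue of 128 already travels far less than unsorted requests do, but it still sweeps the whole address range once per batch, while a queue that holds all 100,000 requests needs a single sweep.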

The I/O scheduler queue size setting is not helpful for InnoDB, because InnoDB internally sorts I/O requests to reduce disk seek overhead and sends only a limited number of I/O requests, controlled by its internal I/O threads. So the roles of InnoDB itself and the I/O scheduler queue overlap. Note that TCQ does improve InnoDB throughput, because it significantly reduces disk rotation overhead, and that optimization cannot be done from the application or kernel side.

MyISAM does nothing special (it depends heavily on the OS), so increasing the queue size helps.

Updated on Apr 30: Added detailed benchmark conditions for those who are interested.

Here is the test script. I ran a single-threaded stored procedure on the same machine.
create table aa (id int auto_increment primary key,
b1 int,
b2 int,
b3 int,
c varchar(100),
index(b1), index(b2), index(b3)) engine=myisam;

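drop procedure if exists sp_aa;
delimiter //
create procedure sp_aa(IN count INTEGER)
BEGIN
DECLARE time_a, time_b BIGINT DEFAULT 0;
-- initial values of done and i are assumed; the listing did not show them
DECLARE done INTEGER DEFAULT 0;
DECLARE i INTEGER DEFAULT 1;
WHILE done != 1 DO
insert into aa values (i,rand()*count,rand()*count,rand()*count,repeat(1,40));
SET i = i + 1;
-- report progress and elapsed seconds every 1,000,000 rows
IF i % 1000000 = 1 THEN
SELECT unix_timestamp() into time_a from dual;
SELECT i, from_unixtime(time_a), time_a - time_b from dual;
SET time_b = time_a;
END IF;
IF i > count THEN
SET done = 1;
END IF;
END WHILE;
END;
//
delimiter ;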

mysql test -vvv -e "call sp_aa(300000000)"

Then wait for a long long time...
# Default insert. no insert-delayed, no disable-keys, no delay-key-write, no mmap

H/W, OS, MySQL settings
Sun Fire X4150
CPU: Intel Xeon, 8 cores
RAM: 32GB (but limit filesystem cache size up to 5GB, no swapping happened)
HDD: SAS 15,000RPM, 2 disks, RAID 1, write cache enabled
SSD: Intel X25-E, Single drive, write cache enabled
OS: RedHat Enterprise Linux 5.3 (2.6.18-128.el5)
Filesystem: ext3
I/O Scheduler: deadline
MySQL 5.1.33
key_buffer_size: 2G

i/o stats:
queue size=128 (default)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.01 0.00 0.08 24.69 0.00 75.22

Device: rrqm/s wrqm/s r/s w/s rMB/s
sdb 0.00 0.87 0.00 575.60 0.00

wMB/s avgrq-sz avgqu-sz await svctm %util
2.25 8.01 142.44 247.03 1.74 100.00
(After running for 12 hours, 13 million rows had been inserted)

queue size=100000
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
2.06 0.00 5.32 29.66 0.00 62.96

Device: rrqm/s wrqm/s r/s w/s rMB/s
sdb 0.00 2487.33 0.00 2042.60 0.00

wMB/s avgrq-sz avgqu-sz await svctm %util
35.11 35.20 84402.36 43582.33 0.49 100.01
(After running for 1.5 hours, 41 million rows had been inserted)

avg-cpu: %user %nice %system %iowait %steal %idle
0.07 0.00 0.19 24.82 0.00 74.91

Device: rrqm/s wrqm/s r/s w/s rMB/s
sdb 0.00 9.03 1.70 756.03 0.01

wMB/s avgrq-sz avgqu-sz await svctm %util
5.56 15.04 31981.72 127560.45 1.32 100.00
(After running for 12 hours, 77 million rows had been inserted;
index size was 4.5 GB)
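To put the row counts on a common scale, the short Python sketch below converts each sample into an average insert rate. The hours and row counts are the ones quoted next to the iostat samples above; the function name `rows_per_second` is hypothetical, and the third sample is unlabeled in the output, so treating it as a later point in the queue size=100000 run is an assumption.

```python
# (label, hours elapsed, rows inserted) as quoted next to each iostat sample
samples = [
    ("queue size=128, 12 hours", 12.0, 13_000_000),
    ("queue size=100000, 1.5 hours", 1.5, 41_000_000),
    # Unlabeled in the output above; assumed to belong to the queue size=100000 run.
    ("third sample, 12 hours", 12.0, 77_000_000),
]


def rows_per_second(hours, rows):
    """Average insert rate since the start of the run."""
    return rows / (hours * 3600)


for label, hours, rows in samples:
    print(f"{label}: {rows_per_second(hours, rows):,.0f} rows/s")
```

This gives roughly 301, 7,593 and 1,782 rows per second. These are averages since the start of each run, so they smooth over any slowdown as the indexes grow.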

If the test runs for many more hours (or days), so that the index size exceeds the filesystem cache, a lot of disk reads will happen. (I didn't have time to run that test before the UC.)


peter-zaitsev said...


I really think you're getting into some side effect here. If you would publish scripts I would be very curious to redo the run to see if I can repeat the same behavior.

It also would be interesting to see the IO data from the test. When you're inserting data you have potential of doing reads and writes. Reads are issued one by one anyway so they can't be affected by queue depth. It is possible scheduler algorithm causes side effect of having myisam files writes delayed.

Yoshinori Matsunobu said...

Hi Peter,

I published detailed test scripts. If you need more information please let me know.
I didn't have time to fully test the noop/anticipatory/cfq schedulers for this benchmark, but in runs of a couple of hours I saw similar behavior (increasing the queue size improved throughput for this test) with the noop scheduler.

Mark Callaghan said...

This is very interesting. Do you have results for a queue size larger than 128 but not as large as 100,000?

Yoshinori Matsunobu said...

Hi Mark,

I haven't done that yet. In theory, throughput should be higher than with 128 but lower than with 100,000. For example, with a setting of 10,000, avgqu-sz stays below 10,000, so fewer than 10,000 I/O requests are sorted at a time. The remaining I/O requests are not sorted, so it is less efficient than a setting of 100,000.

Log Buffer said...

"Yoshinori Matsunobu’s item illuminating Linux I/O scheduler queue size and MyISAM performance attracted some comment..."

Log Buffer #145

Anonymous said...

So a trade off between latency and throughput when you know your load isn't latency sensitive. Nice tweak.

Anonymous said...

How about arrays with more than 2 spindles? For instance, my data array has 34 in RAID 10. I would think that the spindle count would have an impact on what is optimal.

Anonymous said...

Is it necessary to shut down the Oracle DB before changing the value? And is it necessary to reboot the server after changing the value?

Yoshinori Matsunobu said...

Not needed. The change is dynamically updated.

Anonymous said...

I wonder if changing the "queue size" parameter affects the performance of other types of queries (select or update, for example)?
Sorry for my bad english...

Nick said...

I just increased the queue size to 100,000 and my WinXP VM started 15 seconds faster- from 60 to 45 seconds. Worth investigating. Thanks a lot!

Anonymous said...

Just remember that you're increasing latency of the individual requests. You're allowing a huge number of requests to be popped onto the queue and live in memory for a long period of time. Good for some workloads but it has undesirable consequences for others. For example, more writes in-flight during a crash can be bad, especially where ext journaling is now defaulting to writeback.

Anonymous said...

That queue length is way too high. In the second test after 12 hours you've got an average IO wait time of 128 seconds. It may speed up bulk inserting but reads are going to suffer, any kind of interactive system (desktop vm, web server, db backing a web server, etc...) depending on that drive will be unusable. To keep an interactive system feeling responsive you need to keep disk latency under 20ms max, preferably under 10ms.

If you've got a disk array that can handle it, large queues are great. I've got arrays with hundreds of disks, and 4 or 8 4gbps fiberchannel ports to connect them; they can handle queues of 2000 per port and keep responsiveness. But you have to size it properly or you will cause other problems.

Saad Jafri said...

Is there a way to change the default nr_requests of 128 to 256 without doing "echo > /sys/block//queue/nr_requests"?

I have hundreds of devices in the /sys/block/... directory, as I am using a large number of LUNs with RHEL 6.2 native multipathing. Issuing the "echo" command for each device is error-prone and will not be persistent across reboots. Also, the devices under the /sys/block/... directory for the LUNs sometimes change after a reboot.

I was wondering if there is a way to pass the nr_requests value at boot time (perhaps in the kernel line of the grub.conf file, as we can for selecting the disk I/O scheduler: noop, deadline or anticipatory)?

Thank you for your help in advance.

Anonymous said...

for i in /sys/block/sd*; do
echo X >$i/queue/nr_requests
done

repls said...

hi, as you said "Suppose 100,000 random read I/O requests are coming.When I/O scheduler queue size is 128 (default in many cases), TCQ gets I/O requests by almost random order, so pretty high disk seek overhead happens for each action (requests within single queue is dispersed)."

Why, when the I/O scheduler queue size is 128, does TCQ get I/O requests in almost random order?

Saeid Zebardast said...

Great post, Thanks!
