Causes and Tips for High I/O




What is Considered I/O and Why Does it Matter

Anything that accesses a file or directory counts as I/O. Writing to a file, reading a file, or even just listing files (ls) causes disk accesses, as does simply navigating between folders. I/O, or disk access, is completely normal and is expected with any job. However, when a user accesses the disk too often or too quickly, it can cause problems for other users who also need to access the disk. Here at the CRC, we consider anything over 5 million accesses in one day to be excessive. To put that in perspective, it is about 58 accesses per second, sustained for the entire day. Please read the causes and tips sections to help diagnose and solve any high I/O problems.
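The per-second figure follows directly from the daily threshold. As a quick check, the arithmetic can be sketched in Python; the function name here is purely illustrative and not part of any CRC tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in one day


def accesses_per_second(accesses_per_day):
    """Convert a daily access count into an average per-second rate."""
    return accesses_per_day / SECONDS_PER_DAY


# The threshold quoted above: 5 million accesses in one day.
rate = accesses_per_second(5_000_000)
print(f"{rate:.1f} accesses per second")  # prints "57.9 accesses per second"
```

The same function can be applied to your own daily access count to see how far above or below the threshold a job is running.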

Reasons for High I/O

There are typically only a few reasons that a job experiences high I/O: reading or writing inefficiently, or doing excessive lookups (ls).

Reading inefficiently - You may have a single input file that needs to be read multiple times, and you are opening the file each time you want to read. This is probably the most common cause.
Writing inefficiently - You may have a single output file that is written to multiple times, and you are opening the file each time you want to write. This is almost as common as inefficient reading.
Excessive lookups - You may have a script that runs find, ls, or a similar command far more often than necessary. This is not a common cause.
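To make the first two causes concrete, here is a minimal sketch of the pattern to watch for. The file names, the wasteful_job function, and the counted_open wrapper are all hypothetical and exist only to tally how often the job touches the disk; a real job would not normally count its own opens this way.

```python
import os
import tempfile

open_calls = 0  # tally of how many times the job opens a file


def counted_open(path, mode="r"):
    """Hypothetical wrapper around open() that counts each call."""
    global open_calls
    open_calls += 1
    return open(path, mode)


def wasteful_job(input_path, output_path, iterations):
    """Anti-pattern: reopens the input and output files on every iteration."""
    for i in range(iterations):
        with counted_open(input_path) as src:        # one open per iteration
            first_line = src.readline().strip()
        with counted_open(output_path, "a") as dst:  # another open per iteration
            dst.write(f"{i}: {first_line}\n")


# Run the job in a scratch directory so nothing is left behind.
with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "input.txt")
    output_path = os.path.join(workdir, "output.txt")
    with open(input_path, "w") as f:
        f.write("some input\n")
    wasteful_job(input_path, output_path, iterations=1000)

print(open_calls)  # prints 2000: two opens for each of the 1000 iterations
```

The number of opens grows with the loop count, which is why a loop with millions of iterations can push a job past the daily threshold even though each individual access is tiny.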

Tips for High I/O

There are simple techniques that can significantly reduce your I/O. However, although the techniques themselves are simple, applying them to an existing job can, at times, be difficult.

The general rule of thumb is to look for the loops. Wherever you have a loop, you have the potential for excessive I/O.

Reading - If you have a single file that you need to read from on each iteration of the loop, it is best to read it into a variable (into memory) BEFORE the loop, and then use that variable on every iteration instead. This results in only 1 read instead of 1 read per iteration. This is especially effective when your loop has a very large number of iterations or when you read from the same file many times in a single iteration.
Writing - If you have a single output file that you want to update after each iteration of the loop, it is best to accumulate your output in a variable (in memory) on every iteration, and then, AFTER the loop, write that variable to your output file. Once again, this is very effective when your loop has a very large number of iterations.
Lookups - This problem is a little more complicated and should be solved on a case-by-case basis. In general, you should not have very many lookups in your job. If you find yourself with a lot of lookups, you should consider reducing them or removing them altogether.
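The three tips above can be sketched in Python as follows. This is a minimal illustration, not a drop-in fix: the file names and the process and list_once functions are hypothetical, and it assumes the input and the accumulated output both fit comfortably in memory.

```python
import os
import tempfile


def process(input_path, output_path, iterations):
    """Read once before the loop, accumulate in memory, write once after."""
    # Reading: a single open and read BEFORE the loop.
    with open(input_path) as src:
        lines = src.read().splitlines()

    results = []  # Writing: accumulate output in memory on each iteration.
    for i in range(iterations):
        line = lines[i % len(lines)]  # reuse the in-memory copy, no disk access
        results.append(f"{i}: {line.upper()}")

    # Writing: a single open and write AFTER the loop.
    with open(output_path, "w") as dst:
        dst.write("\n".join(results) + "\n")
    return len(results)


def list_once(directory):
    """Lookups: list the directory a single time and reuse the result."""
    return set(os.listdir(directory))


with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "input.txt")
    output_path = os.path.join(workdir, "output.txt")
    with open(input_path, "w") as f:
        f.write("alpha\nbeta\n")

    written = process(input_path, output_path, iterations=4)

    names = list_once(workdir)
    has_output = "output.txt" in names  # membership test, no further lookup

    with open(output_path) as f:
        output_lines = f.read().splitlines()

print(written, has_output)  # prints "4 True"
```

However many iterations the loop runs, process opens the input once and the output once. If the accumulated output is too large to hold in memory, one possible compromise is to keep the output file open for the whole loop and write to the same handle, which still avoids reopening it on every iteration.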

If you would like to see how your changes are affecting your I/O, you can use the command "/usr/sbin/vos ex u.NETID", where NETID is your netid, and then look at the accesses reported for the last day. If you need further help with fixing your high I/O, please email us at CRCSupport@nd.edu.