
Causes and Tips for High I/O

Causes of High I/O

There are typically only a few reasons a job experiences high I/O: reading poorly, writing poorly, or doing excessive lookups (such as ls).

Reading poorly - You may have a single input file that you read from many times, and you are opening the file each time you want to read. This is probably the most common cause.
Writing poorly - You may have a single output file that you write to many times, and you are opening the file each time you want to write. This is almost as common as poor reading.
Excessive lookups - You may have a script that runs find, ls, or a similar command far more often than it needs to. This is not a common cause.

Tips for High I/O

A few simple techniques can significantly reduce your I/O. Although the techniques themselves are simple, applying them to an existing job can, at times, be difficult.

The general rule of thumb is to look for the loops. Wherever you have a loop, you have potential I/O abuse.

Reading - If you have a single file that you need to read from on each iteration of the loop, read it into a variable (into memory) once, BEFORE the loop, and use the variable on every iteration instead. This results in 1 read instead of 1 read per iteration. It is especially effective when your loop has a very large number of iterations or when you read from the same file many times in a single iteration.
Writing - If you have a single output file that you want to update after each iteration of the loop, accumulate your output in a variable (in memory) on every iteration, and then, AFTER the loop, write that variable to your output file. Once again, this is very effective when your loop has a very large number of iterations.
Lookups - This problem is a little more complicated and should be solved on a case-by-case basis. In general, your job should not need very many lookups. If you find yourself with a lot of lookups, consider reducing them or removing them altogether.
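
The three tips above can be sketched in Python. This is only an illustration, not a drop-in fix for any particular job. The function names, the tab-separated output format, and the "count lines containing a key" task are all hypothetical and chosen just to make the pattern visible. It also assumes the input file fits comfortably in memory.

```python
import os


def count_matches_slow(path, keys):
    """Anti-pattern: the input file is reopened and reread on every iteration."""
    counts = {}
    for key in keys:
        with open(path) as f:  # one open + one full read per iteration
            counts[key] = sum(1 for line in f if key in line)
    return counts


def count_matches_fast(path, keys):
    """Reading tip: read the file into memory once, BEFORE the loop."""
    with open(path) as f:  # single open + single read
        lines = f.readlines()
    counts = {}
    for key in keys:
        # Each iteration works on the in-memory copy, not the file.
        counts[key] = sum(1 for line in lines if key in line)
    return counts


def write_results_fast(path, counts):
    """Writing tip: accumulate output in memory, write once AFTER the loop."""
    chunks = []
    for key, value in counts.items():
        chunks.append(f"{key}\t{value}\n")  # no file access inside the loop
    with open(path, "w") as f:  # single open + single write
        f.write("".join(chunks))


def find_missing_fast(directory, names):
    """Lookup tip: list the directory once and reuse the result,
    instead of doing a separate lookup for every name in the loop."""
    existing = set(os.listdir(directory))  # one directory listing
    return [name for name in names if name not in existing]
```

Both count_matches_slow and count_matches_fast return the same result. The difference is that the slow version opens the file once per key, while the fast version opens it exactly once regardless of how many keys there are.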

If you would like to see how your changes are affecting your I/O, you can use the command "/usr/sbin/vos ex u.NETID", where NETID is your netid. Then look at the number of accesses reported for the last day.