3 Tuning Tips To Increase Your Hadoop Performance

Oct. 21, 2015 ⋅ Categories: Hadoop

While a Hadoop server can be a powerful tool for your computing needs, it isn’t always set up for top performance out of the box. To make sure you’re getting the most out of your server, apply these tips as they best fit your needs.

1. Maximize your memory

You want to maximize the memory available to your server without risking swapping, because swapping reduces the performance of your Hadoop server. To accomplish this, tune your server to meet your requirements, for example by configuring the operating system to swap only when it is fully out of memory. To help keep an eye on your server’s memory usage, you can also use monitoring tools like Ganglia, Cloudera Manager or Nagios.
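As a rough illustration of checking how eagerly a node swaps, the sketch below reads the Linux `vm.swappiness` setting. It assumes a Linux host that exposes the value at `/proc/sys/vm/swappiness`; the threshold of 10 and the helper names are hypothetical choices for this example, not an official Hadoop recommendation.

```python
from pathlib import Path

# Assumption: Linux exposes the current setting at this path.
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(raw: str) -> int:
    """Parse the text content of the swappiness file into an integer."""
    return int(raw.strip())


def is_swap_averse(value: int, threshold: int = 10) -> bool:
    """Return True if the kernel is configured to avoid swapping.

    The threshold is an illustrative assumption; pick one that suits
    your own nodes.
    """
    return value <= threshold


def read_swappiness(path: Path = SWAPPINESS_PATH):
    """Return the current swappiness, or None if it cannot be read."""
    try:
        return parse_swappiness(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_swappiness()
    if current is None:
        print("Could not read vm.swappiness on this machine")
    elif is_swap_averse(current):
        print(f"vm.swappiness={current}: swapping is discouraged")
    else:
        print(f"vm.swappiness={current}: consider lowering it")
```

Running this on each worker node gives a quick yes/no answer before you dig into the graphs in your monitoring tool.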

2. Manage your data disks

To best manage your server’s data, think: compression. Compressing your input data not only saves storage space, it can also speed up transfers, because fewer bytes have to be read from disk and sent over the network. Another way to manage your data disks is by scaling. When you start using your Hadoop server, you’ll want to adjust the number of disks it uses according to your needs.
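To get a feel for how much compression might save, here is a minimal sketch that measures the ratio on a sample of data. It uses Python’s built-in `gzip` module purely for convenience; the sample log line and the `compression_ratio` helper are made up for illustration, and real input will compress differently.

```python
import gzip


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data) / len(compressed)


# Hypothetical, highly repetitive sample; real data will compress less.
sample = b"2015-10-21 INFO datanode heartbeat ok\n" * 10_000
ratio = compression_ratio(sample)

if __name__ == "__main__":
    print(f"{len(sample)} bytes compress by a factor of {ratio:.1f}")
```

gzip appears here only because it ships with Python. Which codec suits your cluster is a separate decision, so try the same measurement on a slice of your own input before committing to one.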

3. Reduce map disk spilling

If you want your Hadoop server to run at peak performance, you’ll want to reduce map spilling. Map spilling occurs when there isn’t enough memory to buffer the map output, so part of it is written to disk. If it happens multiple times, it creates unnecessary extra work for the server, which has to read the spilled data back and rewrite it. You can make this process more efficient by ensuring that each map task spills only once. To figure out how much your map phase spills, look at the Map output records counter. After determining the size of your map output and the number of records it contains, use those numbers to work out how much buffer space you’ll need.
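The buffer arithmetic can be sketched in a few lines. This is a back-of-the-envelope estimate, not Hadoop’s exact accounting: it assumes Hadoop 2.x behavior of roughly 16 bytes of bookkeeping per record and a spill threshold of 0.80 (the default of `mapreduce.map.sort.spill.percent`, as far as I know), and the function names are hypothetical. Feed it the per-task values of the Map output bytes, Map output records and Spilled Records counters.

```python
import math

# Assumption: Hadoop 2.x keeps about 16 bytes of metadata per record.
METADATA_BYTES_PER_RECORD = 16
# Assumption: default value of mapreduce.map.sort.spill.percent.
DEFAULT_SPILL_PERCENT = 0.80


def estimate_sort_buffer_mb(map_output_bytes: int,
                            map_output_records: int,
                            spill_percent: float = DEFAULT_SPILL_PERCENT) -> int:
    """Estimate the sort buffer (in MB) one map task needs to spill once."""
    if not 0 < spill_percent <= 1:
        raise ValueError("spill_percent must be in (0, 1]")
    needed = map_output_bytes + map_output_records * METADATA_BYTES_PER_RECORD
    # The buffer starts spilling once it is spill_percent full,
    # so the whole output has to fit below that mark.
    return math.ceil(needed / spill_percent / (1024 * 1024))


def spills_per_record(spilled_records: int, map_output_records: int) -> float:
    """Ratio of spilled to output records; about 1.0 means one spill each."""
    return spilled_records / map_output_records


if __name__ == "__main__":
    # Hypothetical task: 100 MB of map output in one million records.
    print(estimate_sort_buffer_mb(100 * 1024 * 1024, 1_000_000), "MB")
    print(spills_per_record(2_000_000, 1_000_000), "spills per record")
```

The resulting figure is a candidate value for the map-side sort buffer (`mapreduce.task.io.sort.mb` in Hadoop 2.x). Make sure the map task’s heap is large enough to hold it before raising the setting.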

