This is Joe on Data.

Category Archives: Hadoop

Memory Management in Hadoop MapReduce

If you ever have to write MapReduce jobs, or custom UDF or SerDe classes for Hive, in Java, you will want to reuse memory as much as possible, which means making as few object and array allocations as you can, while also taking care not to inadvertently read or reuse data that is stale or corrupted. This is an important practice […]
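Both halves of that advice can be illustrated without a Hadoop cluster. Hadoop typically reuses the same Writable instance for each value a reducer's iterator returns. That avoids an allocation per record, but any reference you keep will later see overwritten contents. The sketch below is a self-contained simulation, not Hadoop code: `MutableValue` is a hypothetical stand-in for a mutable Writable such as `Text`, and `reusingValues` is a hypothetical iterator that mimics the reuse behavior.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    // Hypothetical stand-in for a reusable Writable such as Text:
    // a mutable holder whose contents are overwritten by set().
    static final class MutableValue {
        private final StringBuilder buf = new StringBuilder();
        void set(String s) { buf.setLength(0); buf.append(s); }
        @Override public String toString() { return buf.toString(); }
    }

    // Hypothetical iterable that, like a reducer's values, hands back the
    // SAME object on every next() call, only changing its contents.
    static Iterable<MutableValue> reusingValues(String... raw) {
        return () -> new Iterator<MutableValue>() {
            private final MutableValue shared = new MutableValue(); // one allocation
            private int i = 0;
            public boolean hasNext() { return i < raw.length; }
            public MutableValue next() { shared.set(raw[i++]); return shared; }
        };
    }

    // Buggy: stores references to the reused object, so every entry
    // ends up showing whatever was written last.
    static List<String> collectByReference(Iterable<MutableValue> values) {
        List<MutableValue> kept = new ArrayList<>();
        for (MutableValue v : values) kept.add(v);
        List<String> out = new ArrayList<>();
        for (MutableValue v : kept) out.add(v.toString());
        return out;
    }

    // Correct: copy the data out before the next call overwrites it.
    static List<String> collectByCopy(Iterable<MutableValue> values) {
        List<String> out = new ArrayList<>();
        for (MutableValue v : values) out.add(v.toString()); // toString() makes a copy
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(reusingValues("a", "b", "c"))); // [c, c, c]
        System.out.println(collectByCopy(reusingValues("a", "b", "c")));      // [a, b, c]
    }
}
```

The design choice here is to keep the single reused object for iteration (cheap) and copy only the data you actually need to retain, rather than allocating a fresh object per record.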

Intel Launches Hadoop Distribution and Project Rhino

Intel is apparently launching its own distribution of Hadoop, as well as Project Rhino. Project Rhino is an “open-source effort to enhance security in Hadoop,” which makes Hadoop a more viable option for highly sensitive data. The Intel Hadoop distribution aims to optimize Hadoop for Intel Xeon platforms. I’m not convinced we need another distribution […]