This is Joe on Data.

Category Archives: Databases

Memory Management in Hadoop MapReduce

If you ever have to write MapReduce jobs or custom UDF or SerDe classes for Hive in Java, you will want to re-use memory as much as possible, meaning as few object and array allocations as possible, while also taking care not to inadvertently use/re-use data that is invalid or corrupted. This is an important practice […]

Picking the Right Database for Your Application

One of the first things you need to decide when building a new application or major feature is how you are going to store and process the data for it, which means picking the right database for the job. This is, of course, assuming you will need to store and retrieve data and your storage […]

Intel Launches Hadoop Distribution and Project Rhino

Intel apparently is launching its own distribution of Hadoop as well as Project Rhino. Project Rhino is an “open-source effort to enhance security in Hadoop,” which makes Hadoop a more viable option for highly sensitive data. The Intel Hadoop distribution aims to optimize Hadoop for Intel Xeon platforms. I’m not convinced we need another distribution […]