Question

Hadoop has HDFS, which is the default built-in file system, written in Java. Cloudera and HortonWorks both use this built-in default Java implementation. MapR has taken a different approach. What approach has MapR taken in its file system implementation, and what may be the advantages and disadvantages of MapR's approach versus other vendors? If there are disadvantages, how can they be addressed? Look at the advantages and disadvantages from the user, developer, administrator and risk perspectives.

Homework Answers

Answer #1

Approaches MapR has taken in its FileSystem implementation:

The MapR Data Platform, which is the foundation of the MapR Distribution including Apache Hadoop, delivers a true file system that is POSIX-compliant with full random read-write capability. Instead of setting up Linux with EXT4 and then installing HDFS on top of that, you set up Linux with MapR XD. Significant speed benefits are observed because there are fewer layers in this architecture.

Let’s take a look at the different parts of the MapR Distribution that benefit from a read-write capable file system.

1) NFS
HDFS NFS support has to write data temporarily to the local file system before it lands in HDFS. There are two major problems with this. First, the data can potentially be copied out of order. Second, space must be reserved in the local file system so that NFS has enough room to land data before it is copied into HDFS.

MapR NFS support, on the other hand, is true NFS. It is accessed like any other storage device. Any application you have that can read and write to an NFS mount can read and write to MapR XD. You don’t need to reserve local storage for it to work.
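As a minimal sketch of what "read and write like any other storage device" means in practice, the snippet below overwrites bytes in the middle of an existing file using nothing but ordinary file I/O. The environment variable `CLUSTER_NFS_MOUNT` and the file name are hypothetical, introduced only for illustration; when the variable is unset the sketch falls back to a temporary local directory so it runs anywhere.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at a given offset without rewriting the whole file."""
    with open(path, "r+b") as f:  # open for reading and writing, no truncation
        f.seek(offset)
        f.write(data)


# Hypothetical mount point (assumption); defaults to a temp dir for local runs.
MOUNT = os.environ.get("CLUSTER_NFS_MOUNT", tempfile.mkdtemp())
path = os.path.join(MOUNT, "example.dat")

# Create the file, then change three bytes in the middle of it.
with open(path, "wb") as f:
    f.write(b"0123456789")

overwrite_in_place(path, 3, b"XYZ")

with open(path, "rb") as f:
    result = f.read()
print(result)
```

The point of the sketch is the `seek` followed by `write`: on a write-once file system that step is not available, and the application would have to rewrite the file instead.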

In addition to MapR NFS, MapR also supports the HDFS API, giving you even more options for integrating the MapR Distribution in your environment.

2) NameNode
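The NameNode in Apache Hadoop is a choke point for the platform and, unless NameNode high availability is configured, a single point of failure. Because it keeps the file system metadata in memory, it limits the cluster to around 50-100 million total files in the system.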

MapR doesn’t have a NameNode. The MapR distributed metadata architecture enables a single MapR cluster to support one trillion files and database tables. This is directly enabled by a random read-write file system. The MapR no-NameNode architecture means fewer hassles and less administrative overhead. Friends don’t let friends run NameNodes.

3) Real-time Hadoop
Apache HBase had to implement concepts like tombstones and compactions in order to be able to run on HDFS. They are workarounds for a write-once, read-many file system. Automatic compactions and region splits can make the platform unstable under heavy production loads, and the usual recommendation is to disable them in a production environment.

MapR Database implements the same API as HBase, but because it is implemented on a random read-write-capable file system, it doesn’t need tombstones or compactions. This enables high performance (an average of 2-7x faster than standard Apache HBase) and consistent low latency for your operational applications using MapR Database.
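To see why a write-once file system pushes a database toward tombstones and compactions, here is a toy model. It is not HBase's or MapR Database's implementation; the class `AppendOnlyStore` and its methods are hypothetical names made up for this sketch. Every write lands in a new immutable segment, a delete is recorded as a tombstone marker, and compaction is the step that finally discards superseded and deleted records.

```python
class AppendOnlyStore:
    """Toy key-value store built on immutable, write-once segments."""

    TOMBSTONE = object()  # marker meaning "this key was deleted"

    def __init__(self):
        self.segments = []  # oldest first; each segment is an immutable tuple

    def _flush(self, records):
        # Existing segments are never modified; new data is only appended.
        self.segments.append(tuple(records))

    def put(self, key, value):
        self._flush([(key, value)])

    def delete(self, key):
        # Old data cannot be erased in place, so record a tombstone instead.
        self._flush([(key, self.TOMBSTONE)])

    def get(self, key):
        # The newest record wins, so scan from the most recent segment back.
        for segment in reversed(self.segments):
            for k, v in reversed(segment):
                if k == key:
                    return None if v is self.TOMBSTONE else v
        return None

    def compact(self):
        # Merge all segments, keep the latest value per key, drop tombstones.
        latest = {}
        for segment in self.segments:
            for k, v in segment:
                latest[k] = v
        live = [(k, v) for k, v in latest.items() if v is not self.TOMBSTONE]
        self.segments = [tuple(live)] if live else []


store = AppendOnlyStore()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)   # supersedes the first value of "a"
store.delete("b")   # writes a tombstone; the old record still occupies space
segments_before = len(store.segments)
store.compact()
segments_after = len(store.segments)
print(segments_before, segments_after, store.get("a"), store.get("b"))
```

A store sitting on a file system that allows in-place updates could simply overwrite or remove the record, which is the design difference the paragraph above describes.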

Advantages and disadvantages of MapR's approach versus other vendors, considering the user, developer, administrator and risk perspectives:

MapR is generally considered more expensive than free, but to be clear you can still use MapR Community Edition for free. The license price of Apache Hadoop is usually treated as the biggest cost driver, when in fact it isn't even close. Most people ignore details like the number of hours needed to administer the platform and how much hardware it takes to run it, both of which cost a lot of money. MapR has customers running well over 1,000 nodes with only one administrator for the entire MapR cluster. MapR was built to be as close to zero-administration as possible in every respect.

Regarding the Community Edition, it is free, but it doesn't give you the HA features. It still delivers a faster and better user experience than the competition because it still runs the MapR File System, which does not have a NameNode (read that as no single point of failure and no bottlenecks under heavy file load), and it still supports the HDFS API. It also delivers NFS (the others don't offer this). Think about this: if you have a 10-node cluster with Apache Hadoop, you lose 2 nodes to the NameNode and the Secondary NameNode. With MapR all 10 are doing actual work. That is 20% of the cluster put back to work right off the top.
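The 10-node arithmetic can be checked with a short sketch. It assumes, as the example does, that two nodes are fully dedicated to metadata duties and do no other work; the function name `worker_stats` is made up for illustration. Note the two ways of stating the figure: 2 of 10 nodes is 20% of the cluster, while going from 8 workers to 10 is a 25% increase in working nodes.

```python
def worker_stats(total_nodes: int, metadata_nodes: int):
    """Return (workers, share of cluster reserved, relative gain if freed)."""
    workers = total_nodes - metadata_nodes
    share_reserved = metadata_nodes / total_nodes  # fraction of the cluster
    gain = total_nodes / workers - 1               # extra workers vs. baseline
    return workers, share_reserved, gain


# Assumption from the text: 10 nodes, 2 of them reserved for NameNode roles.
workers, share_reserved, gain = worker_stats(10, 2)
print(workers, f"{share_reserved:.0%}", f"{gain:.0%}")
```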

MapR even supports multiple versions of open source software running on the same cluster; the other vendors do not. MapR also stays out of the politics and supports more open source software than the other vendors.

To clarify Edward's point on MapR-DB, it supports the HBase API and now the Open JSON Application Interface (OJAI™, currently in developer preview). MapR-DB is truly a zero-administration database, unlike HBase, which requires considerable care to operate properly.

The performance of the MapR platform is considerably faster. There is a case in India where MapR displaced the competition: Architecting the World’s Largest Biometric Identity System: The Aadhaar Experience ... In this case the government was able to handle the same workload with better service levels on one third of the hardware our competitor required.

Any code you write to work with Apache Hadoop or Apache HBase works just fine with MapR's distribution because it uses the same API binaries as the Apache Distributions.
MapR basically rewrote HDFS and HBase to be more performant, but some companies prefer the Apache code base, which is open source and used in all the other distributions. It can make integration with other tools easier, as there is more documentation and support available from a broader community.

One final point: remember that free open source still has a cost, and people have a difficult time calculating it. When it comes down to it, you have to figure out whether you want to use the technology to solve problems and focus on your company's core competency, or solve the problems within the technology to make it accomplish the task you want to complete. Hardware = Money and Time = Money.
