1. What is Hadoop, and briefly explain the history of the development of Hadoop to its current state? What are HBase and Pig? Find a business application case in which a company had to use Hadoop. Summarize the case in no more than a page.
Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clustersbuilt from commodity hardware—still the common use—it has also found use on clusters of higher-end hardware.All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
Both Pig and Hive are high-level languages that compile to MapReduce. HBase is a completely different game: it allows Hadoop to support lookups/transactions on key/value pairs. HBase allows you to (1) do quick random lookups, versus scan all of data sequentially, (2) do insert/update/delete from middle, not just add/append.
The differences between Pig and Hive are significant. Specifically:
Case study - Hotels.com
When you have a huge company like the Expedia owned hotel booking site, Hotels.com, you can only imagine the huge amounts data that gets churned by the millisecond. With the number of people that keeps coming and going, how can you convert them into site visitors? Hotels.com, intelligently solved that problem by using Hadoop.
Of course, they were already using cloud to power some of the small functions like the auto search capabilities that popped up as soon as a visitor types in the search boxes. However, during the peak season, things started getting tougher as more and more people poured in. They needed to use data to get closer to their customers.
This has to be done in such a way that the site’s performance doesn’t falter, because customers have little patience for slow-performing websites. The site had to respond quickly to oncoming demands and carry on with the number crunching while allowing people to book holidays without hitches.
Hotels.com started using NoSQL databases and Apache Cassandra. And Cassandra was a major boon in several ways. For example, if you are looking at a particular property, you can see this message “XX people are also looking at this property at this moment” “This property was booked previously on?—?— ”
When they started using Hadoop, it amplified the usage capabilities of Cassandra, helping them to convert more people and giving better service to them all. The conversion rate of the website also changed for the better.
Get Answers For Free
Most questions answered within 1 hours.