Question

1.     What is Hadoop, and briefly explain the history of the development of Hadoop to its current...

1.     What is Hadoop, and briefly explain the history of the development of Hadoop to its current state? What are HBase and Pig? Find a business application case in which a company had to use Hadoop. Summarize the case in no more than a page.

Homework Answers

Answer #1

Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clustersbuilt from commodity hardware—still the common use—it has also found use on clusters of higher-end hardware.All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

Both Pig and Hive are high-level languages that compile to MapReduce. HBase is a completely different game: it allows Hadoop to support lookups/transactions on key/value pairs. HBase allows you to (1) do quick random lookups, versus scan all of data sequentially, (2) do insert/update/delete from middle, not just add/append.

The differences between Pig and Hive are significant. Specifically:

  • Pig doesn't require underlying structure to the data, Hive does imply structure via a metastore. This has its pros and cons. It allows Pig to be more suitable for ETL tasks where the input data is still a mish-mash and you want to convert it to be structured. On the other hand, Hive's metastore provides a dictionary that lets you easily see what columns exist in which tables, which can be very handy.
  • Pig is a new language, easy to learn if you know languages similar to Perl. Hive is a subset of SQL with very simple variations to enable map-reduce-like computation. If you come from a SQL background you will find Hive QL extremely easy to pickup (many of your SQL queries will run as-is), while if you come from a procedural programming background (without SQL knowledge) then Pig will be much more suitable for you. Furthermore, Hive is a bit easier to integrate with other systems and tools since it speaks the language they already speak: SQL.

Case study - Hotels.com

When you have a huge company like the Expedia owned hotel booking site, Hotels.com, you can only imagine the huge amounts data that gets churned by the millisecond. With the number of people that keeps coming and going, how can you convert them into site visitors? Hotels.com, intelligently solved that problem by using Hadoop.

Of course, they were already using cloud to power some of the small functions like the auto search capabilities that popped up as soon as a visitor types in the search boxes. However, during the peak season, things started getting tougher as more and more people poured in. They needed to use data to get closer to their customers.

This has to be done in such a way that the site’s performance doesn’t falter, because customers have little patience for slow-performing websites. The site had to respond quickly to oncoming demands and carry on with the number crunching while allowing people to book holidays without hitches.

Hotels.com started using NoSQL databases and Apache Cassandra. And Cassandra was a major boon in several ways. For example, if you are looking at a particular property, you can see this message “XX people are also looking at this property at this moment” “This property was booked previously on?—?— ”

When they started using Hadoop, it amplified the usage capabilities of Cassandra, helping them to convert more people and giving better service to them all. The conversion rate of the website also changed for the better.

Know the answer?
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for?
Ask your own homework help question
Similar Questions
1. Briefly describe the difference between the “merchandise trade balance” and the “current account balance.” Which...
1. Briefly describe the difference between the “merchandise trade balance” and the “current account balance.” Which one do we typically hear about on the news? 2. In the context of the national savings and investment identity, briefly describe the main sources for both the supply of and demand for capital in the U.S. economy. 3. Briefly explain how short-term movements in the business cycle affect the trade balance.
Your company must charge $100 for a software upgrade to make a profit on its development....
Your company must charge $100 for a software upgrade to make a profit on its development. You must find out if your customers are willing to pay this much. A random sample of 72 customers finds that 21 would pay $100 for the upgrade. If the upgrade is to be profitable, you will need to sell it to more than 25% of your customers. Do the sample data provide good evidence that more than 25% are willing to buy at...
Just as you are walking out of yet another meeting discussing options for the development of...
Just as you are walking out of yet another meeting discussing options for the development of the new oil zones, one of your Company’s Sr Managers in charge of Procurement comes in, and loudly reminds everyone that your company in fact actually has a spare subsea tree in storage. Given that they have this expensive asset on the books, and it is available right now, he wants the subsea team to plan to use this tree, in case the oil...
Briefly explain in your own words, what is meant by the following three terms: 1) natural...
Briefly explain in your own words, what is meant by the following three terms: 1) natural selection, 2) genetics, and 3) evolution.  Focus your answer on clarifying the difference(s) and/or similarities between these terms relative to understanding the development, growth, and functioning of human society.   
1) Explain what inferential statistics is used for 2) Define briefly and in your words the...
1) Explain what inferential statistics is used for 2) Define briefly and in your words the p-value 3) Provide an example where a hypothesis test would be worth doing with a null hypothesis μ1-μ2 = 0, and with an alternative hypothesis of μ1-μ2 ≠ 0 4) Explain why, in confidence intervals, when moving from a case in which the population variance is known to another in which this value is estimated from samples (sample variance), the length of the interval...
1. Explain the key differences between development of systems to run on mobile devices and on...
1. Explain the key differences between development of systems to run on mobile devices and on typical personal computing or internet-based environments, and apply this knowledge in the design of mobile device software. 2. Design effective applications for a mobile device by taking into consideration the underlying hardware-imposed restrictions such as screen size, memory size and processor capability. 3. Build, test and debug graphical applications for mobile devices by using the standard libraries that are bundled as part of the...
1.Briefly explain some reasons for why fertility rates have fallen in the world 2.To what extent...
1.Briefly explain some reasons for why fertility rates have fallen in the world 2.To what extent is investment in education a necessary condition for economic development? Discuss the two theories on this issue. 3.What are the problems with setting a poverty line in a poor country?
Subject: Business Ethics Answer in more than 3 sentences 1.explain what Marx means by 'fetishism of...
Subject: Business Ethics Answer in more than 3 sentences 1.explain what Marx means by 'fetishism of commodities' and explain at least one situation in which production does not involve such a fetish. 2.According to Adam Smith, how should capital be used productively and how will its productive use better the lives of all people in socity?
1) Company's Current ratio 2017 Current ratio = 2.055 2016 Current ratio = 2.077 Explain what...
1) Company's Current ratio 2017 Current ratio = 2.055 2016 Current ratio = 2.077 Explain what information this ratio provides (define), and what the results mean specifically to your company. Use complete sentences in your own words. Has the current ratio improved?_________________________ 2) Company's Debt ratio 2017 Debt Ratio =0.417987 = 41.799% 2016 Debit Ratio = 0.415240 = 41.524% Explain what information this ratio provides (define), and what the results mean specifically to your company Use complete sentences Has the...
1. Is the United States’ labor supply more inelastic or more elastic? Briefly summarize the competing...
1. Is the United States’ labor supply more inelastic or more elastic? Briefly summarize the competing theories. 2. The demand for energy drinks is more elastic than the demand for milk. Would a tax on energy drinks or a tax on milk have a larger deadweight loss? Explain. 3. You find that your paycheck for the year is higher this year than last. Does that mean that your real income has increased? Explain carefully. 4. For the economy as a...