1. Draw a comparison between data integration, functional integration, and application integration. List the integration technologies that enable data and metadata integration.
2. What is the architectural difference between a data warehouse, data mart, operational data store, and an enterprise data warehouse?
3. What are the various types of metadata, and what roles do they play in the various integration exercises?
4. What is the relationship between business processes and the various architectures for data warehousing?
5. What are the major design principles of security architecture that need to be adhered to in the creation and operation of a data warehouse?
6. What is the relationship between master data management and a data warehouse within an enterprise? What further advancements can be made to improve data quality?
7. Discuss the relationships between data warehouse architectures and development methods.
8. Discuss security concerns involved in building a data warehouse.
9. Investigate current data warehouse development implementation through offshoring. Write a report about it. In class, debate the issue in terms of the benefits and costs, as well as social factors.
10. SAP uses the term strategic enterprise management (SEM), Cognos uses the term corporate performance management (CPM), and Hyperion uses the term business performance management (BPM). Are they referring to the same basic ideas? Provide evidence to support your answer.
11. BPM encompasses five basic processes: strategize, plan, monitor, act, and adjust. Select one of these processes, and discuss the types of software tools and applications that are available to support it. Figure 3.10 provides some hints. Also, refer to Bain & Company’s list of management tools for assistance (bain.com/management_tools/home.asp).
12. Select a public company of interest. Using the company’s 2016 annual report, create three strategic financial objectives for 2017. For each objective, specify a strategic goal or target. The goals should be consistent with the company’s 2016 financial performance.
13. Distinguish between performance management and performance measurement.
14. Create a strategy for a hypothetical company, using the four perspectives of the BSC. Express the strategy as a series of strategic objectives. Produce a strategy map depicting the linkages among the objectives.
15. Compare and contrast the DMAIC model with the closed-loop processes of BPM.
16. Select two companies that you are familiar with. What terms do they use to describe their BPM initiatives and software suites? Compare and contrast their offerings in terms of BPM applications and functionality.
Q.1 Ans: Comparison between Data integration, Functional integration, and Application integration
Application Integration: Application integration is, as the term implies, the integration of business data and workflows across an organization’s disparate applications. In a digitally transformed business process, on-premise and cloud applications must work together, and application integration is the effort to achieve the seamless interoperability and data orchestration required to generate real-time insights.
Data Integration: Data integration is the practice of retrieving data from heterogeneous sources and combining it into a unified structure and view. It resolves the complexity of merging data from distinct applications into a collective view of the organization’s data assets, enabling users to derive value from a consolidated interface. Data integration is classified into two broad practice areas, Analytic DI and Operational DI.
Analytic DI supports data warehousing (DW) and business intelligence (BI), while Operational DI supports the migration, consolidation, and synchronization of operational databases, as well as the exchange of data in business-to-business contexts.
Simply stated, data integration is the mechanism of integrating data between “databases” while application integration handles the integration of data between “applications.”
Functional integration: In the enterprise-integration sense, functional integration means sharing business logic or functionality, rather than just data, across applications. Instead of copying data between systems, one application invokes functions exposed by another through middleware mechanisms such as remote procedure calls, distributed objects, application programming interfaces (APIs), or web services. A business process that spans several systems can therefore reuse a single implementation of a business rule. Functional integration thus complements data integration, which unifies data stores, and application integration, which coordinates workflows and message exchange between applications.
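To make the functional-integration idea concrete, here is a minimal, self-contained Python sketch. It is not tied to any product: a hypothetical provider application exposes a made-up credit-check rule (`check_credit`) over XML-RPC, chosen only because it is in the standard library, and a consumer application calls that shared function instead of duplicating the logic. The port number is an arbitrary assumption.

```python
# A minimal sketch of functional integration: one application exposes a piece of
# business logic (a hypothetical credit-check function) as a service that other
# applications can invoke, instead of copying the data or the code.
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

def check_credit(customer_id: str, amount: float) -> bool:
    """Hypothetical shared business rule: approve orders under a fixed limit."""
    return amount <= 10_000.0

# "Provider" application publishes the function on a local endpoint.
server = SimpleXMLRPCServer(("localhost", 8001), logRequests=False)
server.register_function(check_credit, "check_credit")
threading.Thread(target=server.serve_forever, daemon=True).start()

# "Consumer" application (e.g., order entry) calls the shared function remotely.
orders = ServerProxy("http://localhost:8001")
print(orders.check_credit("C-042", 2500.0))   # True
print(orders.check_credit("C-042", 50000.0))  # False
```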
Differences Between Application Integration and Data Integration
1. Efficacy Vs Efficiency
Data integration is a batch-oriented, scheduled process: it attends to data at rest. It therefore involves a series of data-intensive operations such as manipulation, standardization, deduplication, reconciliation, and cleansing, and it runs in intraday or interday batches. A data integration task may run once an hour, once a day, or even once a week, but it is not designed to run continuously. The results of data integration provide an accurate view of business performance, oversight, anomalies, and compliance, yet they require a considerable amount of computing time.
Application integration is the timely, real-time communication of live operational data between applications. The data simply changes hands between multiple independent applications, typically bi-directionally, in a workflow style of interaction. The amount of data and time involved in an application integration task is modest, since it deals with connecting applications at the workflow level, with the data itself being the item transferred.
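As a rough illustration of this real-time, message-by-message style (a toy stand-in, not a real integration middleware), the following Python sketch uses an in-process queue to represent the channel between two hypothetical applications:

```python
# A toy sketch of the real-time, transactional style of application integration:
# an "order" application publishes events as they occur, and a "billing"
# application consumes them immediately, rather than in scheduled batches.
import queue
import threading

event_bus: "queue.Queue[dict]" = queue.Queue()

def order_app() -> None:
    # The source application emits small operational messages as work happens.
    for order_id in (1001, 1002, 1003):
        event_bus.put({"event": "order_created", "order_id": order_id})
    event_bus.put({"event": "shutdown"})

def billing_app() -> None:
    # The target application reacts to each message as soon as it arrives.
    while True:
        msg = event_bus.get()
        if msg["event"] == "shutdown":
            break
        print(f"billing: invoicing order {msg['order_id']}")

consumer = threading.Thread(target=billing_app)
consumer.start()
order_app()
consumer.join()
```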
2. Transactional Vs Transformative
As mentioned before, application integration is the timely “movement” of data between applications and operates at the service level. Data flows between the applications through either synchronous or asynchronous execution. In short, application integration is about facilitating a business process that traverses numerous independent applications, and it provides a level of abstraction from the underlying applications and their associated business processes. Since application integration involves the “exchange” of data from one application to another, it is described as transactional in nature.
Data integration, on the contrary, is a transformative process. It grew out of the use of relational databases and the need to move data between them. The primary objective of data integration is to create or populate a data warehouse from various transactional systems: data is extracted from the relevant databases, combined into a unified structure, transformed, and loaded for analysis. Data integration provides an abstraction layer over the underlying data sources, and it is not limited to internal databases; external data can be brought into scope as well.
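The following Python sketch illustrates this transformative, ETL-style flow using in-memory SQLite databases; the source systems, table names, and columns are invented purely for the example:

```python
# A minimal ETL sketch (assumed table and column names) of the transformative style
# of data integration: extract rows from two operational SQLite databases,
# standardize them, and load the combined result into a warehouse table.
import sqlite3

# --- set up two toy "operational" sources in memory ---
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "acme corp", "us"), (2, "globex", "de")])

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE clients (client_no INTEGER, client_name TEXT, ctry TEXT)")
erp.executemany("INSERT INTO clients VALUES (?, ?, ?)", [(7, "initech", "US")])

# --- extract ---
rows = crm.execute("SELECT id, name, country FROM customers").fetchall()
rows += erp.execute("SELECT client_no, client_name, ctry FROM clients").fetchall()

# --- transform: unify the schema and standardize values ---
clean = [(rid, name.title(), country.upper()) for rid, name, country in rows]

# --- load into the warehouse's conformed dimension table ---
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT, country TEXT)")
dw.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", clean)
print(dw.execute("SELECT * FROM dim_customer").fetchall())
```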
3. Point-to-Point Vs Compilation
Application integration operates in a point-to-point architecture whose purpose is to synchronize the associated applications in real time and to maintain that synchronization until the end of the business event.
Consider a P2P (procure-to-pay) business process, in which the organization purchases raw materials from its supplier. The P2P process starts with a requisition order, an internal request to purchase raw materials, and proceeds through the succeeding stages: creating a purchase order, receiving goods (which involves documents such as the advance shipping notice and the order confirmation), and finally payment, which includes generating an invoice to pay the supplier and updating the transaction in the accounting system.
This process spans multiple independent application systems and may also involve external parties, since parts of a P2P process may be outsourced. Where events must take place in a strict, non-overlapping sequence of interdependent steps, application integration is the ideal approach because of its point-to-point architecture, as the sketch below illustrates.
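A schematic Python sketch of this sequential, point-to-point hand-off follows; the step functions, document numbers, and fields are purely illustrative:

```python
# A schematic sketch of the sequential procure-to-pay flow: each step hands its
# output document directly to the next application, and no step starts before
# the previous one has finished.
def requisition(item: str, qty: int) -> dict:
    return {"item": item, "qty": qty, "status": "requisitioned"}

def purchase_order(doc: dict) -> dict:
    return {**doc, "po_number": "PO-1001", "status": "ordered"}

def goods_receipt(doc: dict) -> dict:
    return {**doc, "asn": "ASN-77", "status": "received"}

def invoice_and_pay(doc: dict) -> dict:
    return {**doc, "invoice": "INV-9", "status": "paid"}

doc = {"item": "steel", "qty": 40}
for step in (lambda d: requisition(d["item"], d["qty"]),
             purchase_order, goods_receipt, invoice_and_pay):
    doc = step(doc)
    print(doc["status"])  # requisitioned -> ordered -> received -> paid
```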
Data integration, on the other hand, works through the entire body of consolidated data and publishes only the relevant data to the user when needed. Large organizations that possess scores of integrated applications often find it difficult to access individual interfaces; data integration is beneficial because it offers a single, consolidated point of access instead.
Integration technologies
Data integration architects develop data integration software programs and data integration platforms that facilitate an automated data integration process for connecting and routing data from source systems to target systems. This can be achieved through a variety of data integration techniques, including:
Extract, Transform and Load (ETL): copies of datasets from disparate sources are gathered together, harmonized, and loaded into a data warehouse or database.
Extract, Load and Transform (ELT): data is loaded as is into a big data system and transformed at a later time for particular analytics uses.
Change Data Capture (CDC): identifies data changes in databases in real time and applies them to a data warehouse or other repositories (a minimal sketch of this technique follows the list).
Data Replication: data in one database is replicated to other databases to keep the information synchronized for operational uses and for backup.
Data Virtualization: data from different systems is virtually combined to create a unified view rather than being loaded into a new repository.
Streaming Data Integration: a real-time data integration method in which different streams of data are continuously integrated and fed into analytics systems and data stores.
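As mentioned above, here is a minimal change-data-capture sketch in Python and SQLite. It uses a timestamp high-water mark; the table name, columns, and dates are assumptions made only for this illustration:

```python
# A minimal change-data-capture sketch (timestamp-based): only rows modified since
# the last run are copied from the source to the target, instead of reloading the
# full table.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 100.0, "2024-01-01"), (2, 250.0, "2024-01-03")])

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")

last_sync = "2024-01-02"  # high-water mark remembered from the previous run
changes = src.execute(
    "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?", (last_sync,)
).fetchall()

# Apply the captured changes as upserts on the target.
tgt.executemany(
    "INSERT INTO orders VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount, updated_at = excluded.updated_at",
    changes,
)
print(tgt.execute("SELECT * FROM orders").fetchall())  # only order 2 was synced
```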
Q.2: What is the architectural difference between a data warehouse, data mart, operational data store, and an enterprise data warehouse?
Ans: Data warehouse: A data warehouse is a repository that stores all of an organization’s current and historical data from disparate sources — it’s sometimes called a single source of truth. It’s a key component of a data analytics architecture that creates an environment for decision support, analytics, business intelligence (BI), and data mining.
Data Mart: A data mart is similar to a data warehouse, but it holds data only for a specific department or line of business, such as sales, finance, or human resources. A data warehouse can feed data to a data mart, or a data mart can feed a data warehouse.
Data warehouses and data marts hold structured data, and they’re associated with traditional schemas, which are the ways in which records are described and organized. Whichever repository they choose, businesses use an ETL tool to extract data from various sources and load it into the destination.
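A small Python/SQLite sketch of the warehouse-to-mart relationship just described: the warehouse table holds enterprise-wide data, and a department-level mart is carved out of it as a subset. The schema and values are invented for the example.

```python
# The warehouse holds enterprise-wide facts; a department-level data mart is
# derived from it as a filtered subset (table and column names are illustrative).
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_sales (region TEXT, department TEXT, amount REAL)")
dw.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
               [("EMEA", "sales", 120.0), ("EMEA", "finance", 80.0),
                ("APAC", "sales", 200.0)])

# Build the "sales" data mart by extracting only that department's slice.
dw.execute("CREATE TABLE mart_sales AS SELECT * FROM fact_sales WHERE department = 'sales'")
print(dw.execute("SELECT * FROM mart_sales").fetchall())
```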
The differences between data warehouses and data marts
| | Data warehouse | Data mart |
|---|---|---|
| Objective | Centralize data, become single source of truth across business | Provide easy access to data for a department or specific line of business |
| Uses | Business-wide analysis | Department-specific analysis |
| Decision types | Strategic decision-making | Operational or tactical decision-making |
| Scope | Wide; contains data from all departments and lines of business | Specific; individual data marts for individual departments |
| Size | Typically more than 100 GB | Less than 100 GB |
| Data held | All organizational data | Single business line |
| Data sources | Dozens or hundreds | Typically just a few |
| Time to implement | Months to years (on-premises); days to weeks (cloud-based) | Weeks to months (on-premises); days to weeks (cloud-based) |
| Cost | $100K+ (on-premises); on-demand pricing varies (SaaS) | $10K (on-premises); on-demand pricing varies (SaaS) |
Operational data store and enterprise data warehouse
An operational data store (or "ODS") is used for operational reporting and as a source of data for the Enterprise Data Warehouse (EDW). It is a complementary element to an EDW in a decision support landscape, and is used for operational reporting, controls and decision making, as opposed to the EDW, which is used for tactical and strategic decision support.
An ODS is a database designed to integrate data from multiple sources for additional operations on the data, for reporting, controls and operational decision support. Unlike a production master data store, the data is not passed back to operational systems. It may be passed for further operations and to the data warehouse for reporting.
General use of an Operational Data Store: The general purpose of an ODS is to integrate data from disparate source systems into a single structure, using data integration technologies such as data virtualization, data federation, or extract, transform, and load (ETL). This allows operational access to the data for operational reporting and for master data or reference data management.
An ODS is not a replacement or substitute for a data warehouse or for a data hub but in turn could become a source.
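The contrast between an ODS and the EDW it feeds can be sketched in a few lines of Python; the structures and events below are toy stand-ins, not a real product's design:

```python
# The ODS keeps only the current state of each record for operational reporting,
# while the EDW accumulates history for tactical and strategic analysis.
current_orders = {}   # ODS: latest state per order
history = []          # EDW: every state change, with a load timestamp

def load_event(order_id: str, status: str, ts: str) -> None:
    current_orders[order_id] = {"status": status, "as_of": ts}               # overwrite in the ODS
    history.append({"order_id": order_id, "status": status, "loaded_at": ts})  # append in the EDW

for event in [("O-1", "created", "09:00"), ("O-1", "shipped", "11:30"), ("O-2", "created", "12:00")]:
    load_event(*event)

print(current_orders)   # operational view: one row per order, current status only
print(len(history))     # analytical view: 3 historical records retained
```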
Enterprise data warehouse, or EDW: An enterprise data warehouse (EDW) is a database, or collection of databases, that centralizes a business’s information from multiple sources and applications, and makes it available for analytics and use across the organization. EDWs can be housed in an on-premise server or in the cloud.
The data stored in this type of digital warehouse can be one of a business’s most valuable assets, as it represents much of what is known about the business, its employees, its customers, and more.
Q.3: What are the various types of metadata, and what roles do they play in the various integration exercises?
Answer: Metadata is simply defined as data about data: data that is used to describe other data. For example, the index of a book serves as metadata for the book’s contents. In other words, metadata is summarized data that leads us to the detailed data. In a data warehouse, metadata describes the warehouse’s contents: it records the names and definitions of the data elements, and additional metadata is created to time-stamp extracted data and to record the source from which it was extracted.
Categories of Metadata
Metadata can be broadly divided into three categories:
Business Metadata: It has the data ownership information, business definition, and changing policies.
Technical Metadata: It includes database system names, table and column names and sizes, data types and allowed values. Technical metadata also includes structural information such as primary and foreign key attributes and indices.
Operational Metadata: It includes the currency of data and data lineage. Currency of data means whether the data is active, archived, or purged. Lineage of data means the history of the data as it has been migrated and the transformations applied to it.
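One simple way to picture the three categories together is a single metadata record for one warehouse column, as in the Python sketch below; the field names are illustrative choices, not a standard schema:

```python
# A minimal sketch recording business, technical, and operational metadata for one
# warehouse column (field names here are illustrative, not a standard).
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    # Technical metadata: how the element is physically defined.
    table: str
    column: str
    data_type: str
    # Business metadata: what it means and who owns it.
    business_definition: str
    owner: str
    # Operational metadata: currency and lineage.
    status: str                      # e.g., "active", "archived", "purged"
    lineage: list = field(default_factory=list)

m = ColumnMetadata(
    table="dim_customer", column="country", data_type="TEXT",
    business_definition="ISO country of the customer's billing address",
    owner="Sales Operations", status="active",
    lineage=["crm.customers.country", "upper-cased during ETL"],
)
print(m.owner, m.lineage)
```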
Role of Metadata
Metadata plays a very important role in a data warehouse. Although it is distinct from the warehouse data itself, it supports nearly every integration activity: it acts as a directory that helps analysts locate the contents of the warehouse; it guides the mapping of data from the operational environment to the warehouse during extraction and integration; and it guides the summarization algorithms used between the current detailed data and the lightly and highly summarized data. Business metadata supplies shared definitions so that integrated data means the same thing across departments, technical metadata drives the ETL mappings and transformations, and operational metadata records lineage so that integrated results can be traced back to their sources.
Q.4: What is the relationship between business processes and the various architectures for data warehousing?
Ans: Relationship between business processes and the various architectures for data warehousing
Business analysts draw information from the data warehouse to measure performance and make critical adjustments in order to stay ahead of other players in the market; having a data warehouse gives the organization a consistent, business-wide basis for this kind of analysis, but only if its architecture reflects the organization’s business processes.
To design an effective and efficient data warehouse, we need to understand and analyze the business needs and construct a business analysis framework. Different stakeholders hold different views of the design of a data warehouse: the top-down view (selecting the information relevant to the business), the data source view (the information captured, stored, and managed by the operational systems), the data warehouse view (the fact and dimension tables inside the warehouse), and the business query view (the data as seen from the end user’s perspective). These views tie the warehouse architecture back to the business processes it serves.
Three-Tier Data Warehouse Architecture
Generally, a data warehouse adopts a three-tier architecture. The three tiers are as follows.
Bottom Tier: The bottom tier of the architecture is the data warehouse database server, typically a relational database system. Back-end tools and utilities feed data into this tier; they perform the extract, clean, load, and refresh functions.
Middle Tier: In the middle tier we have the OLAP server, which can be implemented in either of the following ways.
By relational OLAP (ROLAP), an extended relational database management system that maps operations on multidimensional data to standard relational operations (a minimal sketch of this mapping follows the tier list).
By multidimensional OLAP (MOLAP), which directly implements multidimensional data and operations.
Top Tier: This tier is the front-end client layer. It holds the query, reporting, analysis, and data mining tools.
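As referenced in the middle-tier description, the ROLAP idea of mapping multidimensional operations onto relational ones can be sketched with plain SQL over a toy fact table; the schema and figures below are invented for the illustration:

```python
# A multidimensional request ("total sales by region and quarter") is answered by
# mapping it onto an ordinary relational GROUP BY over a fact table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fact_sales (region TEXT, quarter TEXT, amount REAL)")
db.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
               [("EMEA", "Q1", 100.0), ("EMEA", "Q1", 50.0),
                ("EMEA", "Q2", 75.0), ("APAC", "Q1", 200.0)])

# The "cube" roll-up over two dimensions becomes plain SQL aggregation.
cube = db.execute(
    "SELECT region, quarter, SUM(amount) FROM fact_sales GROUP BY region, quarter"
).fetchall()
for row in cube:
    print(row)
```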