Imagine that you are talking with a business colleague who describes a project at their organization: a data mining project for the marketing and sales department, intended to measure the impact of advertising spending. Your colleague claims that their involvement is "almost over" because "the IT team is ready to start the ETL process. Now that the data sources are clearly identified, I am done, right? Besides, this part is just cut and paste, and start producing the business intelligence!" Explain why your colleague should plan to stay more involved with this project. You may want to explain the kinds of problems that could arise and the ways in which ETL operations sometimes experience problems with data pollution. Explain how data modeling through ETL will help in identifying poor-quality data.
ETL is the process of extracting data from source systems, transforming it, and loading it into the target system. It is one of the most crucial stages at which data quality needs to be monitored, because degraded data can hamper downstream processing and increase business costs.
Some of the ways in which data quality problems may happen at the ETL stage:
Data modeling and data profiling help identify the structure of the data and group similar data together. With that structure made explicit, values that do not fit it stand out, so data quality can be monitored and maintained throughout the ETL process.
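To make the idea of data profiling concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular ETL tool: the function name `profile_column`, the sample rows, and the column names `campaign` and `spend` are all hypothetical. It summarizes one column by counting missing values, distinct values, and the data types that actually occur.

```python
from collections import Counter


def _is_missing(value):
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, value types."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if not _is_missing(v)]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(Counter(type(v).__name__ for v in present)),
    }


# Hypothetical advertising-spend extract with typical pollution:
# inconsistent capitalization, a number stored as text, and blank fields.
sample_rows = [
    {"campaign": "spring", "spend": 1200.0},
    {"campaign": "Spring", "spend": "1,200"},
    {"campaign": "", "spend": None},
    {"campaign": "fall", "spend": 800.0},
]

spend_profile = profile_column(sample_rows, "spend")
campaign_profile = profile_column(sample_rows, "campaign")
print(spend_profile)
print(campaign_profile)
```

A profile like this shows why the business colleague is still needed. The mixed types in `spend` and the two spellings of the same campaign are easy to detect, but deciding which value is correct requires someone who knows the marketing data.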