What is Data Warehouse?

Data Warehouse

  • A data warehouse is a central repository of data that is designed for reporting and data analysis rather than for transaction processing
  •  It usually contains data from OLTP applications and other diverse data sources.
  • It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
  • Data warehouse environment includes ETL solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

Data warehouses vs. OLTP

  • Data warehouses are not usually in 3NF, while in OLTP environments the data is usually normalized.

  • A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse.
  • In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.
  • A data warehouse often uses star schema design (Denormalized), while OLTP systems often use fully normalized schemas.
  • Data warehouses usually store many months or years of data. This is to support historical analysis, while OLTP systems usually store data from only a few weeks or months.


Why do we need a data warehouse?

 
Data warehouses are used extensively in the largest and most complex businesses around the world. In demanding situations, good decision making becomes critical. Significant and relevant data is required to make decisions. This is possible only with the help of a well-designed data warehouse.
 

Enhancing the turnaround time for analysis and reporting:

 
Data warehouse allows business users to access critical data from a single source enabling them to take quick decisions. They need not waste time retrieving data from multiple sources.
The business executives can query the data themselves with minimal or no support from IT which in turn saves money and time.
 

Improved Business Intelligence:

 
Data warehouse helps in achieving the vision for the managers and business executives. Outcomes that affect the strategy and procedures of an organization will be based on reliable facts and supported with evidence and organizational data.
 

Benefit of historical data:

 
Transactional data stores data on a day to day basis or for a very short period of duration without the inclusion of historical data. In comparison, a data warehouse stores large amounts of historical data which enables the business to include time-period analysis, trend analysis, and trend forecasts.
 

Standardization of data:

The data from heterogeneous sources are available in a single format in a data warehouse. This simplifies the readability and accessibility of data. For example, gender is denoted as Male/ Female in Source 1 and m/f in Source 2 but in a data warehouse the gender is stored in a format which is common across all the businesses i.e. M/F.

 

Immense ROI (Return On Investment): 

 
Return On Investment refers to the additional revenues or reduces expenses a business will be able to realize from any project. According to a 2002 International Data Corporation (IDC) study “The Financial Impact of Business Analytics”, analytics projects have been achieving a substantial impact on a business’ financial status.


Components of Data warehouse

 

Data Sources

A flat file database stores data in a normal text format. Contrary to a relational database where the data is stored in the form of tables, in a flat file database the data stored does not have a folders or paths related to them. No manipulations are performed on the data. Delimiters are used in flat files to separate the data columns.
Excel spreadsheets are regularly used in data warehousing operations. They are impressive, low-priced, and flexible tolls that many decision-makers find convenient to use. Excel also provides graphing features that allow the end-user to present the required data in chart and graph formats. These formats can be easily integrated into MS Word and Power Point presentations.
Operational systems of a business contain the day to day transactions of the data at a low-level. For example, the sales data, HR data, marketing data are used as input sources for a data warehouse.
Legacy systems are the applications of the yesteryear. They mirror the requirements of a business that might be twenty to twenty five year old. They are use till date since over years these systems have captured the business knowledge and rules that are exceptionally difficult to translate to a new platform/application.


Staging Area

 
The first part of the staging area is the most challenging process of extraction. Depending on how accurately the data is extracted the subsequent operations succeed or fail. The source systems might be complicated or poorly documented due to which the process becomes all the more difficult. The data may be extracted not only once but also periodically when changes occur at the source side.
The second stage is the transformation where the data is converted from one format to another. Since data often exists in different locations and formats across the enterprises, data conversion is mandatory to ensure that data from one application is comprehensible to other applications and databases.
The third stage is the loading where the extracted and transformed data is loaded into a data mart or a data warehouse depending on the business. The populated data is used for presentation applications by the end users.


Data Repository

 
The data is loaded into a data warehouse in the form of facts and dimensions
 

Users

 
The loaded data is accessed for reporting, analysis, and mining. The reporting tools like Business Objects and Cognos are used by users to generate reports. The data is also used for predicting trends