Database Systems: Design, Implementation, and Management Tenth Edition

Database Systems: Design, Implementation, and

Management Tenth Edition

Chapter 13 Business Intelligence and Data

Warehouses

Objectives

In this chapter, you will learn: • How business intelligence provides a

comprehensive business decision support framework

• About business intelligence architecture, its evolution, and reporting styles

• About the relationship and differences between operational data and decision support data

• What a data warehouse is and how to prepare data for one

Database Systems, 10th Edition 2

Objectives (cont’d.)

• What star schemas are and how they are constructed

• About data analytics, data mining, and predictive analytics

• About online analytical processing (OLAP) • How SQL extensions are used to support

OLAP-type data manipulations

Database Systems, 10th Edition 3

The Need for Data Analysis

• Managers track daily transactions to evaluate how the business is performing

• Strategies should be developed to meet organizational goals using operational databases

• Data analysis provides information about short- term tactical evaluations and strategies

Database Systems, 10th Edition 4

Business Intelligence

• Comprehensive, cohesive, integrated tools and processes – Capture, collect, integrate, store, and analyze

data

– Generate information to support business decision making

• Framework that allows a business to transform: – Data into information

– Information into knowledge

– Knowledge into wisdom Database Systems, 10th Edition 5

Business Intelligence Architecture

• Composed of data, people, processes, technology, and management of components

• Focuses on strategic and tactical use of information

• Key performance indicators (KPI) – Measurements that assess company’s

effectiveness or success in reaching goals

• Multiple tools from different vendors can be integrated into a single BI framework

Database Systems, 10th Edition 6

Database Systems, 10th Edition 7

Business Intelligence Benefits

• Main goal: improved decision making • Other benefits

– Integrating architecture

– Common user interface for data reporting and analysis

– Common data repository fosters single version of company data

– Improved organizational performance

Database Systems, 10th Edition 8

Business Intelligence Evolution

Database Systems, 10th Edition 9

Database Systems, 10th Edition 10

Business Intelligence Technology Trends

• Data storage improvements • Business intelligence appliances • Business intelligence as a service • Big Data analytics • Personal analytics

Database Systems, 10th Edition 11

Decision Support Data

• BI effectiveness depends on quality of data gathered at operational level

• Operational data seldom well-suited for decision support tasks

• Need reformat data in order to be useful for business intelligence

Database Systems, 10th Edition 12

Operational Data vs. Decision Support Data

• Operational data – Mostly stored in relational database – Optimized to support transactions representing

daily operations

• Decision support data differs from operational data in three main areas: – Time span

– Granularity

– Dimensionality

Database Systems, 10th Edition 13

Database Systems, 10th Edition 14

Decision Support Database Requirements

• Specialized DBMS tailored to provide fast answers to complex queries

• Three main requirements – Database schema

– Data extraction and loading

– Database size

Database Systems, 10th Edition 15

Decision Support Database Requirements (cont’d.)

• Database schema – Complex data representations – Aggregated and summarized data – Queries extract multidimensional time slices

• Data extraction and filtering – Supports different data sources

• Flat files • Hierarchical, network, and relational databases • Multiple vendors

– Checking for inconsistent data Database Systems, 10th Edition 16

Decision Support Database Requirements (cont’d.)

• Database size – In 2005, Wal-Mart had 260 terabytes of data in

its data warehouses

– DBMS must support very large databases (VLDBs)

Database Systems, 10th Edition 17

The Data Warehouse

• Integrated, subject-oriented, time-variant, and nonvolatile collection of data – Provides support for decision making

• Usually a read-only database optimized for data analysis and query processing

• Requires time, money, and considerable managerial effort to create

Database Systems, 10th Edition 18

Database Systems, 10th Edition 19

Data Marts

• Small, single-subject data warehouse subset • More manageable data set than data

warehouse • Provides decision support to small group of

people • Typically lower cost and lower implementation

time than data warehouse

Database Systems, 10th Edition 20

Twelve Rules That Define a Data Warehouse

Database Systems, 10th Edition 21

Star Schemas

• Data-modeling technique – Maps multidimensional decision support data

into relational database

• Creates near equivalent of multidimensional database schema from relational data

• Easily implemented model for multidimensional data analysis while preserving relational structures

• Four components: facts, dimensions, attributes, and attribute hierarchies

Database Systems, 10th Edition 22

Facts

• Numeric measurements that represent specific business aspect or activity – Normally stored in fact table that is center of star

schema

• Fact table contains facts linked through their dimensions

• Metrics are facts computed at run time

Database Systems, 10th Edition 23

Dimensions

• Qualifying characteristics provide additional perspectives to a given fact

• Decision support data almost always viewed in relation to other data

• Study facts via dimensions • Dimensions stored in dimension tables

Database Systems, 10th Edition 24

Attributes

• Use to search, filter, and classify facts • Dimensions provide descriptions of facts

through their attributes • No mathematical limit to the number of

dimensions • Slice and dice: focus on slices of the data cube

for more detailed analysis

Database Systems, 10th Edition 25

Attribute Hierarchies

• Provide top-down data organization • Two purposes:

– Aggregation

– Drill-down/roll-up data analysis

• Determine how the data are extracted and represented

• Stored in the DBMS’s data dictionary • Used by OLAP tool to access warehouse

properly

Database Systems, 10th Edition 26

Star Schema Representation

• Facts and dimensions represented in physical tables in data warehouse database

• Many fact rows related to each dimension row – Primary key of fact table is a composite primary

key

– Fact table primary key formed by combining foreign keys pointing to dimension tables

• Dimension tables are smaller than fact tables • Each dimension record is related to thousands

of fact records Database Systems, 10th Edition 27

Performance-Improving Techniques for the Star Schema

• Four techniques to optimize data warehouse design: – Normalizing dimensional tables

– Maintaining multiple fact tables to represent different aggregation levels

– Denormalizing fact tables

– Partitioning and replicating tables

Database Systems, 10th Edition 28

Performance-Improving Techniques for the Star Schema (cont’d.)

• Dimension tables normalized to: – Achieve semantic simplicity – Facilitate end-user navigation through the

dimensions

• Denormalizing fact tables improves data access performance and saves data storage space

• Partitioning splits table into subsets of rows or columns

• Replication makes copy of table and places it in different location

Database Systems, 10th Edition 29

Data Analytics

• Subset of BI functionality • Encompasses a wide range of mathematical,

statistical, and modeling techniques – Purpose of extracting knowledge from data

• Tools can be grouped into two separate areas: – Explanatory analytics

– Predictive analytics

Database Systems, 10th Edition 30

Data Mining

• Data-mining tools do the following: – Analyze data – Uncover problems or opportunities hidden in

data relationships

– Form computer models based on their findings – Use models to predict business behavior

• Runs in two modes – Guided

– Automated

Database Systems, 10th Edition 31

Database Systems, 10th Edition 32

Predictive Analytics

• Employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools

• Create actionable predictive models based on available data

• Models are used in areas such as: – Customer relationships, customer service,

customer retention, fraud detection, targeted marketing, and optimized pricing

Database Systems, 10th Edition 33

Online Analytical Processing

• Three main characteristics: – Multidimensional data analysis techniques – Advanced database support

– Easy-to-use end-user interfaces

Database Systems, 10th Edition 34

Multidimensional Data Analysis Techniques

• Data are processed and viewed as part of a multidimensional structure

• Augmented by the following functions: – Advanced data presentation functions

– Advanced data aggregation, consolidation, and classification functions

– Advanced computational functions

– Advanced data modeling functions

Database Systems, 10th Edition 35

Advanced Database Support

• Advanced data access features include: – Access to many different kinds of DBMSs, flat

files, and internal and external data sources

– Access to aggregated data warehouse data

– Advanced data navigation – Rapid and consistent query response times

– Maps end-user requests to appropriate data source and to proper data access language

– Support for very large databases

Database Systems, 10th Edition 36

Easy-to-Use End-User Interface

• Advanced OLAP features are more useful when access is simple

• Many interface features are “borrowed” from previous generations of data analysis tools – Already familiar to end users

– Makes OLAP easily accepted and readily used

Database Systems, 10th Edition 37

OLAP Architecture

• Three main architectural components: – Graphical user interface (GUI) – Analytical processing logic

– Data-processing logic

Database Systems, 10th Edition 38

OLAP Architecture (cont’d.)

• Designed to use both operational and data warehouse data

• In most implementations, data warehouse and OLAP are interrelated and complementary

• OLAP systems merge data warehouse and data mart approaches

Database Systems, 10th Edition 39

Database Systems, 10th Edition 40

Relational OLAP

• Relational online analytical processing (ROLAP) provides the following extensions: – Multidimensional data schema support within the

RDBMS

– Data access language and query performance optimized for multidimensional data

– Support for very large databases (VLDBs)

Database Systems, 10th Edition 41

Multidimensional OLAP

• Multidimensional online analytical processing (MOLAP) extends OLAP functionality to multidimensional database management systems (MDBMSs) – MDBMS end users visualize stored data as a 3D

data cube

– Data cubes can grow to n dimensions, becoming hypercubes

– To speed access, data cubes are held in memory in a cube cache

Database Systems, 10th Edition 42

Relational vs. Multidimensional OLAP

• Selection of one or the other depends on evaluator’s vantage point

• Proper evaluation must include supported hardware, compatibility with DBMS, etc.

• ROLAP and MOLAP vendors working toward integration within unified framework

• Relational databases use star schema design to handle multidimensional data

Database Systems, 10th Edition 43

Database Systems, 10th Edition 44

SQL Extensions for OLAP

• Proliferation of OLAP tools fostered development of SQL extensions

• Many innovations have become part of standard SQL

• All SQL commands will work in data warehouse as expected

• Most queries include many data groupings and aggregations over multiple columns

Database Systems, 10th Edition 45

The ROLLUP Extension

• Used with GROUP BY clause to generate aggregates by different dimensions

• GROUP BY generates only one aggregate for each new value combination of attributes

• ROLLUP extension enables subtotal for each column listed except for the last one – Last column gets grand total

• Order of column list important

Database Systems, 10th Edition 46

The CUBE Extension

• CUBE extension used with GROUP BY clause to generate aggregates by listed columns – Includes the last column

• Enables subtotal for each column in addition to grand total for last column – Useful when you want to compute all possible

subtotals within groupings

• Cross-tabulations are good candidates for application of CUBE extension

Database Systems, 10th Edition 47

Materialized Views

• A dynamic table that contains SQL query command to generate rows – Also contains the actual rows

• Created the first time query is run and summary rows are stored in table

• Automatically updated when base tables are updated

Database Systems, 10th Edition 48

Summary

• Business intelligence generates information used to support decision making

• BI covers a range of technologies, applications, and functionalities

• Decision support systems were the precursor of current generation BI systems

• Operational data not suited for decision support

Database Systems, 10th Edition 49

Summary (cont’d.)

• Data warehouse provides support for decision making – Usually read-only

– Optimized for data analysis, query processing

• Star schema is a data-modeling technique – Maps multidimensional decision support data

into a relational database

• Star schema has four components: – Facts, dimensions, attributes, and attribute

hierarchies Database Systems, 10th Edition 50

Summary (cont’d.)

• Data analytics – Provides advanced data analysis tools to extract

knowledge from business data

• Data mining – Automates the analysis of operational data to

find previously unknown data characteristics, relationships, dependencies, and trends

• Predictive analytics – Uses information generated in the data-mining

phase to create advanced predictive models

Database Systems, 10th Edition 51

Summary (cont’d.)

• Online analytical processing (OLAP) – Advanced data analysis environment that

supports decision making, business modeling, and operations research

• SQL has been enhanced with extensions that support OLAP-type processing and data generation

Database Systems, 10th Edition 52

  • Database Systems: Design, Implementation, and Management Tenth Edition
  • Objectives
  • Objectives (cont’d.)
  • The Need for Data Analysis
  • Business Intelligence
  • Business Intelligence Architecture
  • PowerPoint Presentation
  • Business Intelligence Benefits
  • Business Intelligence Evolution
  • Slide 10
  • Business Intelligence Technology Trends
  • Decision Support Data
  • Operational Data vs. Decision Support Data
  • Slide 14
  • Decision Support Database Requirements
  • Decision Support Database Requirements (cont’d.)
  • Slide 17
  • The Data Warehouse
  • Slide 19
  • Data Marts
  • Twelve Rules That Define a Data Warehouse
  • Star Schemas
  • Facts
  • Dimensions
  • Attributes
  • Attribute Hierarchies
  • Star Schema Representation
  • Performance-Improving Techniques for the Star Schema
  • Performance-Improving Techniques for the Star Schema (cont’d.)
  • Data Analytics
  • Data Mining
  • Slide 32
  • Predictive Analytics
  • Online Analytical Processing
  • Multidimensional Data Analysis Techniques
  • Advanced Database Support
  • Easy-to-Use End-User Interface
  • OLAP Architecture
  • OLAP Architecture (cont’d.)
  • Slide 40
  • Relational OLAP
  • Multidimensional OLAP
  • Relational vs. Multidimensional OLAP
  • Slide 44
  • SQL Extensions for OLAP
  • The ROLLUP Extension
  • The CUBE Extension
  • Materialized Views
  • Summary
  • Summary (cont’d.)
  • Slide 51
  • Slide 52
X
Get a Price Quote