Learn
more about Data Load Statistics for Siebel Analytics Author:
Victor Castellá Introduction Database
statistics are considered a very important task to improve query performance.
This article gives an overview and tips to provide a better understanding of Data
Load Statistics (DLS). Statistics are usually updated on a daily, weekly or longer
scale, depending on database activity (inserts/deletes) and the configuration
specified by Database Administrator (DBA). The Data Warehouse Application Console
(DAC) updates statistics during the extract, tranform and load (ETL) processes.
Analyze
Tables The task that updates the statistics in the DAC is named Analyze
Tables. DAC allows enabling or disabling this functionality through the System
Properties panel. See the parameter in the following screenshot.

The
DAC updates the statistics of the Data Warehouse while the ETL processes are executed.
Every time a dimension or a fact table is loaded, DAC launches a task that refers
to the table loaded and updates the statistics.
For tables where the data
load is significant, the Analyze Table task may take a lot of time to execute.
The efficiency of this task also depends on the specifications of the hardware
where the Data Warehouse (DW) database is located. If the DW is Oracle, it is
possible to modify parameters within the DAC setup to make the statistics creation
process more efficient. SQL command Once executed,
the Analyze Tables task launches a SQL command against the database. The DAC obtains
this SQL command from a XML file, which means that it can be modified. The
XML file is located in the following folder: X:\SiebelAnalytics\DAC\CustomSQLs\customsql.xml The
entries related to the statistics are the following: <SqlQuery
name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '@TABLEOWNER', tabname => '%1', estimate_percent => 30, method_opt
=> 'FOR ALL COLUMNS SIZE AUTO',cascade => true ) </SqlQuery> <SqlQuery
name = "ORACLE_ANALYZE_TABLE_70" STORED_PROCEDURE = "FALSE"> ANALYZE
TABLE %1 COMPUTE STATISTICS </SqlQuery> <SqlQuery
name = "MSSQL_ANALYZE_TABLE"> UPDATE STATISTICS %1 </SqlQuery> <SqlQuery
name = "DB2_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> SIEBSTAT
('@TABLEOWNER', '%1', 'SQL_STATS_ALL') </SqlQuery> For
Oracle databases, DAC uses the GATHER_TABLE_STATS, which is part of the package
DBMS_STATS. The method used by default requires a lot of resources to execute
the statistics, as it generates statistics using histogram models. This method
generates the most complete statistics, although in the case huge volumes of data
in the Data Warehouse, this process can exceed the maximum times established to
execute the ETLs. TIP: The parameter estimate_percent
defines the amount of data that is going to be analyzed. By reducing this value,
the Analyze Table task will take less time. This is a trick that can be very useful,
especially when facing very long ETL processes.
SQL example <SqlQuery
name = "ORACLE_ANALYZE_TABLE" STORED_PROCEDURE = "TRUE"> DBMS_STATS.GATHER_TABLE_STATS(ownname
=> '@TABLEOWNER', tabname => '%1', estimate_percent => 10, method_opt
=> 'FOR ALL COLUMNS SIZE AUTO',cascade => true ) </SqlQuery>
An
alternative to reduce the ETL processes is to disable the Analyze Tables option
(as mentioned before). Once the DW is loaded, the statistics can be manually executed
against the database.
Conclusion As conclusion we know that the
DAC is updating the statistics while the ETL are executing. It can enlarge too
much the ETL processes in big databases. In the case of Oracle where the statistics
are more complex, we know now that there is the option to reduce the percentatge
of data to be analyzed. Finally there is also the option to disble the Analyze
Tables functionality from the DAC application and then launch this manually when
the ETL has finished.
Go
to top
|