Skip to main content

Best practices for DB2 database index design




Index (Index) is a very important concept in relational databases . In general, index will improve query performance. For a database administrator (DBA), creating indexes for a database is a very important part of their work . Generally speaking, the index design is based on the structure of the table in the database or the logical relationship between the tables. For example, the primary-key of each table is actually an index, and the employee ID column in the EMP table that records employee information is usually also indexed. But experienced database administrators know that designing appropriate indexes is not a simple matter. In the actual production environment, program developers who write applications often do not understand database design as database administrators do, so a common situation is that SQL query statements written by program developers may not be able to use database administrators to create The index, resulting in slow database response and poor SQL query performance. If this happens, if you want to improve query performance, a feasible way is for the database administrator to analyze the SQL query statement and design a new index for it. How to analyze SQL query statements and design the most suitable index? This article will give some best practices to solve this problem.


It should be pointed out that the more index is not the better, because the index itself also needs to take up a certain amount of space in the database, and the operation needed to maintain the index in the database will also bring additional performance overhead, so in individual cases, inappropriate indexes It may even cause a loss in database performance. Another situation is that even if the database administrator establishes an index based on the SQL statement in the business logic, the database may not use that index at runtime. This is a more complicated situation. Because of this, database administrators are often very cautious when creating new indexes for the database to avoid results that outweigh the gains. DB2 database takes this into consideration, so it introduces the concept of virtual index (Virtual index). Through this virtual index mechanism, database administrators can verify the performance of a new index after designing it, and then build a real index if the performance is improved to achieve the desired effect. This article will also briefly introduce this mechanism in the DB2 database.


 


 


The database index is a sorted data structure in the database to assist in quick query and update of data in the database table. The index is like a catalog of books, you can quickly locate the content of the book. Establishing appropriate indexes can improve query performance hundreds to thousands of times. Let's take a look at how indexes do this in a DB2 database.


Here is a SQL statement of a customer management software. Access the customer table CUSTOMER, hoping to return the customer data whose username is MARIA.


SELECT * FROM CUSTOMER WHERE C_FIRST_NAME ='MARIA'


The definition of the CUSTOMER table is as follows:


CREATE TABLE CUSTOMER ( 

  C_CUSTOMER_ID  CHAR(16) NOT NULL, 

  C_FIRST_NAME  CHAR(20) NOT NULL, 

  C_LAST_NAME  CHAR(30) NOT NULL, 

  C_BIRTH_DAY  INTEGER NOT NULL, 

  C_BIRTH_MONTH  INTEGER NOT NULL, 

  C_BIRTH_YEAR  INTEGER NOT NULL, 

  …

  )


The table data of the database is stored on the disk. Without an index, DB2 will read all the data pages of the table accordingly, find and return the record whose user name is MARIA. This way of querying the database is called a full table scan (TB-SCAN).



Figure 1. Full table scan

Figure 1. Full table scan

If there is a suitable index on the CUSTOMER table, the database can speed up the query process through the index. The index in the DB2 database uses a B+ tree structure, and the leaf nodes of the index contain the value of the index key and a pointer to the data address. DB2 first queries the index, and then directly accesses the data page of the table through the pointer recorded in the index. Doing so does not need to access all data pages of the CUSTOMER table, only a small amount of data pages can be accessed to return the query results, which greatly accelerates the query speed.


So what is the process of DB2 scanning index? The process of scanning the index can be divided into three categories, matching index scan, non-matching index scan, and index only scan.


Assume that the following index CUSTOMER_IDX_01 exists in the CUSTOMER table:


CREATE INDEX CUSTOMER_IDX_01 ON CUSTOMER 

  (C_FIRST_NAME ASC)


For the previous SQL statement, the predicate C_FIRST_NAME ='MARIA' specifies the value of the C_ FIRST _NAME column, so DB2 can use the index CUSTOMER_IDX_01 to directly locate the leaf node, and then access the corresponding data page of the table. This type of index access is called a matching index scan.



Figure 2. Matching index scan

Figure 2. Matching index scan

Look at another SQL statement:


SELECT * 

 FROM DB2ADMIN.CUSTOMER 

 WHERE C_FIRST_NAME = 'MARIA' 

AND C_BIRTH_YEAR = 1977


Assume that there is only the following index CUSTOMER_IDX_02 on the CUSTOMER table:


CREATE INDEX CUSTOMER_IDX_02 ON CUSTOMER"

  (C_FIRST_NAME ASC,   

   C_LAST_NAME ASC, 

   C_BIRTH_YEAR ASC)


When DB2 accesses the index, it will do the following steps:


First use the first key of the index C_FIRST_NAME to limit the range C_FIRST_NAME ='MARIA',

For another predicate C_BIRTH_YEAR = 1977, because C_BIRTH_YEAR is the third key of the index, DB2 cannot directly find the corresponding index leaf node based on it, but can only scan and select from all index leaf nodes that meet the condition C_FIRST_NAME='MARIA' C_BIRTH_YEAR = leaf node of 1977. (This step is called Screening, or "screening").

This access method that needs to scan the filter conditions one by one from the index leaf nodes is called a mismatch index scan.



Figure 3. Unmatched index scan

Figure 3. Unmatched index scan

Look at another example, given the following SQL statement:


SELECT C_BIRTH_YEAR 

 FROM CUSTOMER 

 WHERE C_BIRTH_YEAR>1977


And suppose there is only index CUSTOMER_IDX_03 on the CUSTOMER table:


CREATE INDEX CUSTOMER_IDX_03 ON CUSTOMER 

  (C_BIRTH_YEAR ASC)


It can be seen that the data returned by the query is exactly the index key in CUSTOMER_IDX_03. At this time, DB2 does not need to access the data page of the table on the disk, and only needs to scan the index to get the value of the corresponding column. This access method is to do index scan only.



Figure 4. Index scan only

Figure 4. Index scan only

Design index first step


As mentioned before, if we want to improve the query performance of a SQL statement by creating a new index, we need to analyze the semantic characteristics of this SQL statement. Before starting to analyze SQL, you need to understand a noun Boolean-term (which can be translated as "Boolean term").


The concept of Boolean-term


Boolean-term is a kind of predicate in the WHERE clause of the SQL statement. If the value of the predicate is False, then the return value of the entire WHERE clause is false. In other words, Boolean-term is like a switch that can control the entire WHERE clause. For each data record (Row) processed in the database table, once the data record does not meet the judgment conditions of Boolean-term, then this The data record is considered to not satisfy the judgment condition of the entire WHERE clause. The following uses specific SQL examples to illustrate this concept.


In the following SQL statement:


SELECT * FROM EMP 

 WHERE WORKDEPT = 'A00'

 AND (SALARY > 40000 OR BONUS > 800)


You can see three basic predicates: WORKDEPT ='A00', SALARY> 40000 and BONUS> 800. Among them, only WORKDEPT ='A00' satisfies the definition of Boolean-term, which means that for each data record in the EMP table, if it does not meet the condition of WORKDEPT ='A00', then it does not meet the conditions of the entire WHERE clause , So it will not be returned as the result of this SQL query. For the predicate SALARY> 40000, even if a data record does not meet this condition, the record may be returned as a query result because it meets BONUS> 800 and WORKDEPT ='A00, so the predicate SALARY> 40000 is not a Boolean-term. Similarly, the predicate BONUS> 800 is not a Boolean-term. Through this example, we can see that in the SQL form, Boolean-term cannot appear in the predicate connected by OR, it must be connected with other predicates by AND.


The reason why Boolean-term is introduced here is because it is an important concept closely related to index access in DB2 databases. In the above example, because SALARY> 40000 and BONUS> 800 are not Boolean-term, even if there is an index including SALARY column or BONUS column, DB2 will not select this index for index matching scan (Matching index scan) . Because of this, when designing a new index, we don't need to consider this kind of predicate that is not a Boolean-term. For the sake of strictness, what we need to explain here is that for non-Boolean-term predicates such as (SALARY> 40000 OR BONUS> 800) in this example, DB2 can use multi-index access to simultaneously Scan data with multiple indexes. Multi-index scanning and the corresponding multi-index design are more complex technologies than the access and design of a single index, and this article will not discuss them here. The content involved in this article is only for the design of a single index, so the WHERE clause in the SQL statement that appears in this article will not appear with the predicate connected by OR.


Determine the candidate index key from the predicate


The most basic step of analyzing SQL statements is to find all Indexable predicates in all Boolean-terms of the WHERE clause, and design index keys according to the columns referenced therein. As the name suggests, Indexable predicate means "a predicate that can use an index". Logically speaking, according to the conditions given in this predicate, the DB2 database can use index access to quickly find one or more matching records in the index tree. It should be noted that the concept of Indexable predicates is concerned with the writing of the predicate itself that makes it possible to access data through the index, and it does not guarantee that the appropriate index exists in the database, nor does it guarantee that the DB2 database will definitely be available at runtime The data that meets the condition of this predicate is filtered through index access; but conversely, if a predicate is not in the form of Indexable, then the database must not be able to filter the data that meets the condition through the index. In other words, "predicate is in the form of Indexable " is a necessary but insufficient condition for "database can use index access to filter data". So what kind of predicate is "predicate that can use index"? Table 1 lists the form of each predicate and indicates whether it belongs to Indexable predicate.



Table 1. Indexable predicate table

Predicate type Indexable Predicate type Indexable Predicate type Indexable

COL = value Y COL IN (cor subq) Y COL on (noncor subq) Y

COL = noncol expr Y (COL1,...COLn) IN (cor subq) N COL op ANY (noncor subq) Y

COL IS NULL Y COL NOT IN (cor subq) N COL op ALL (noncor subq) Y

COL on value Y (COL1,...COLn) NOT IN (cor subq) N COL <> (noncor subq) N

COL on noncol expr Y COL IS DISTINCT FROM value N COL <> ANY (noncor subq) N

COL BETWEEN value1 AND value2 Y COL IS NOT DISTINCT FROM value Y COL <> ALL (noncor subq) N

COL BETWEEN noncol expr1 AND noncol expr2 Y COL IS DISTINCT FROM noncol expr N COL IN (noncor subq) Y

value BETWEEN COL1 AND COL2 N COL IS NOT DISTINCT FROM noncol expr Y (COL1,...COLn) IN (noncor subq) Y

COL BETWEEN COL1 AND COL2 N T1.COL1 IS DISTINCT FROM T2.COL2 N COL NOT IN (noncor subq) N

COL BETWEEN expression1 AND expression2 Y T1.COL1 IS NOT DISTINCT FROM T2.COL2 N (COL1,...COLn) NOT IN (noncor subq) N

COL LIKE 'pattern' Y T1.COL1 IS DISTINCT FROM T2 col expr N COL = (cor subq) N

COL IN (list) Y T1.COL1 IS NOT DISTINCT FROM T2 col expr Y COL = ANY (cor subq) Y

COL <> value N COL IS DISTINCT FROM (noncor subq) N COL = ALL (cor subq) N

COL <> noncol expr N COL IS NOT DISTINCT FROM (noncor subq) Y COL on (cor subq) N

COL IS NOT NULL Y COL IS DISTINCT FROM ANY (noncor subq) N COL op ANY (cor subq) N

COL NOT BETWEEN value1 AND value2 N COL IS NOT DISTINCT FROM ANY (noncor subq) N COL op ALL (cor subq) N

COL NOT BETWEEN noncol expr1 AND noncol expr2 N COL IS DISTINCT FROM ALL (noncor subq) N COL <> (cor subq) N

value NOT BETWEEN COL1 AND COL2 N COL IS NOT DISTINCT FROM ALL (noncor subq) N COL <> ANY (cor subq) N

COL NOT IN (list) N COL IS NOT DISTINCT FROM (cor subq) N COL <> ALL (cor subq) N

COL NOT LIKE ' char' N COL IS DISTINCT FROM ANY (cor subq) N XMLEXISTS Y

COL LIKE '%char' N COL IS DISTINCT FROM ANY (cor subq) N NOT XMLEXISTS N

COL LIKE '_char' N COL IS NOT DISTINCT FROM ANY (cor subq) N COL = ANY (noncor subq) Y

COL LIKE host variable Y COL IS DISTINCT FROM ALL (cor subq) N COL = ALL (noncor subq) N

T1.COL = T2 col expr Y COL IS NOT DISTINCT FROM ALL (cor subq) N expression op (subq) N

T1.COL on T2 col expr Y EXISTS (subq) N COL=(noncor subq) Y

T1.COL <> T2 col expr N NOT EXISTS (subq) N expression op value N

T1.COL1 = T1.COL2 N expression = value N T1.COL1 <> T1.COL2 N

T1.COL1 on T1.COL2 N expression <> value N  


After analyzing all the predicates that can be used in the Boolean-term in the SQL statement, the index can be designed according to the columns in these predicates. Take the following SQL statement as an example:


SELECT C_ COMMENT 

 FROM CUSTOMER 

 WHERE C_ ACCTBAL > 10000 

  AND UCASE(C_NAME)= ’ IBM ’

AND C_CUSTKEY < > C_NATIONKEY 

AND C_MKTSEGMENT = ‘ CHINA 

AND C_PHONE LIKE ‘ 135010% ’

AND C_ ADDRESS= ’ BEIJING ’


According to Table 1, C_ ACCTBAL, C_MKTSEGMENT, C_PHONE and C_ ADDRESS are all candidates for suitable index columns. If you want to design a single-key index (Single-key Index), any of them can form an index; if you want to design more Key index (Multiple-keys Index), the order between them is the next issue that needs to be considered. For detailed discussion, see "Selection of Index Key Order" below. Correspondingly, C_NAME, C_CUSTKEY and C_NATIONKEY are not qualified index column candidates.


Index key sequence selection


After determining the keys required for indexing, the next step is to consider how to determine the order of these keys. For this, you can refer to the following two rules:


1. Put the stop-matching predicate at the end of the index


If all the predicates are =, then access to this index can always be accessed by index matching; but when it contains range operation predicates such as <, >, LIKE, only the first range operation predicate can perform index matching access, and then All predicates, even the predicates of =, can only perform non-matching access operations. We call this kind of predicate the stop matching predicate. Obviously, we want to perform more index matching access operations, so we must put all the predicates that stop matching at the end of the index.


For example, for the following SQL statement:


Select C_ COMMENT 

 From CUSTOMER 

 Where C_ ACCTBAL > 10000 

AND UCASE(C_NAME)= ’ IBM ’

AND C_CUSTKEY < > C_NATIONKEY 

AND C_MKTSEGMENT = ‘ CHINA 

AND C_PHONE LIKE ‘ 135010% ’

AND C_ ADDRESS= ’ BEIJING ’


In this query, the three predicates on C_ ACCTBAL, C_CUSTKEY, and C_PHONE are all range operation predicates, which means that they are all stop matching predicates. We should put them at the end of the index when designing. For this query, a possible index design is (C_MKTSEGMENT, C_ ADDRESS, C_ ACCTBAL).


2. Sort keys in descending order of column cardinality


In DB2, there is a concept of filter factor, which is the ratio of output data to input data after DB2 applies a predicate. The filter factor is a decimal between 0 and 1. DB2 will try to select the predicate with a low filter factor first, so that the amount of data that needs to be processed in subsequent operations will be relatively small. When an index can provide a smaller filter factor, DB2 will give priority to it. So we also need to put the predicate with a small filter factor in front of the index to encourage DB2 to use this index in other queries.


For other queries, we generally assume that the predicate connection with the "=" sign is used. The filter factor of this predicate is inversely proportional to the cardinality of its column. In this way, when designing an index, we give priority to putting the largest cardinality in front of the index, and arrange all keys in descending order of cardinality.


For example, the following SQL statement:


Select C_ COMMENT 

 From CUSTOMER 

 Where C_NAME= ’ IBM ’

AND C_MKTSEGMENT = ‘ CHINA 

AND C_ ADDRESS= ’ BEIJING ’


Assume that the cardinals corresponding to these columns are CARD(C_NAME) = 1000, CARD(C_MKTSEGMENT) = 2000, and CARD(C3) = 1500. Then the order of the index keys we designed should be: (C_MKTSEGMENT, C_ ADDRESS, C_NAME).


Further design of the index


If the performance still cannot meet the requirements according to the index designed above, then the following three index design directions can be further considered:


1. Add all columns referenced in the WHERE clause to the index, so that non-matching access predicates can also benefit


Even if it is a non-matching access operation in the index, the performance is still higher than that of a direct full table scan. Establishing indexes for some or all of the indexable predicates in the WHERE clause can improve performance. However, it should be noted that the columns that can be accessed for matching must be placed in the front, and the columns for non-matching access are placed at the back to prevent premature stop matching (refer to the previous section on index order).


Example: For queries.


Select C_ COMMENT 

 From CUSTOMER 

 Where C_ ACCTBAL > 10000 

 AND C_PHONE LIKE ‘ 135010% ’

 AND C_ ADDRESS= ’ BEIJING ’


The index that can be considered is (C_ ADDRESS, C_ ACCTBAL, C_PHONE). Note that C_ ADDRESS must be placed in the first position of the index, so that DB2 can perform matching access operations on this key. C_ ACCTBAL is the predicate to stop matching, and C_PHONE after it can only use non-matching access.


2. On the basis of the first step, you can further add all the columns in the SELECT clause to make the query an index-only access method


For the above SQL statement, if we want to use index-only access, then the index we need to build is (C_ ADDRESS, C_ ACCTBAL, C_PHONE, C_ COMMENT)


3. Add all the columns in the GROUP BY and ORDER BY clauses to reduce the sorting operation in the access plan


For the GROUP BY and ORDER BY keywords in the query, DB2 needs to perform a sort operation to get the correct result, unless the data has been sorted. Using the knowledge about the index structure mentioned before, we know that the index is a set of sorted columns. If there happens to be an available index on the column to be sorted in the GROUP BY and ORDER BY clauses, then the database reads from the index The data does not need to be sorted anymore.


Example:


SELECT L_PARTKEY,L_ORDERKEY 

 FROM LINEITEM 

 WHERE L_TAX = 10 

 GROUP BY L_PARTKEY,L_ORDERKEY 

 ORDER BY L_PARTKEY,L_ORDERKEY


In order to reduce the sorting operation, the index that needs to be established is (L_TAX, L_PARTKEY, L_ORDERKEY). The first position of the index is for the predicate L_TAX = 10, and then all the columns in the GROUP BY and ORDER BY are added to avoid the sorting operation.


Create indexes required for table join


For SQL statements involving connections between multiple tables, when we design indexes to improve performance, we must consider the connection mode and connection order of the tables. Different databases may have many different table connection methods, and they also have many differences in the way they use indexes. For an explanation of a specific table connection method, please refer to the database manufacturer's manual. In this article, we take the most common table connection method in DB2-nested loop connection as an example.


Nested loop connection, as the most basic and common connection method, is widely used in all mainstream database products. The two tables it connects are divided into outer and inner tables. The outer table is scanned once, and the inner table is scanned multiple times to find the rows that satisfy the join predicate. The pseudo code of the nested loop connection process is as follows:


For each i in 外表 : 

For each j in 内表 : 

If (i,j) meets the constraints

  Put (i,j) into the result set


It can be seen that the outer table only needs to do a complete full table scan, and the index has no effect on this access; while the inner table needs to be scanned multiple times, and each scan uses the join predicate to perform a query operation. For this access method, it is quite necessary to build an index on the relevant column of the inner table.


We can imagine a simple example to understand the role of the index on the inner table: the outer table and the inner table each contain 1000 records, and the inner table has no index on the connected column. When the database connects these two tables in a nested loop, the outer table is scanned once and 1000 records need to be read. Since the inner table is not indexed, 1000 full table scans are required, and the database has to read 1000 from the disk. × 1000 is 1 million records. Then we look at the situation where there is an index on the inner table. Similarly, the outer table must read 1000 records from the disk, but the access to the inner table is by index access this time. If 10% of the records on the inner table match the join predicate, then it is necessary to read from the disk (1000 × 10%=) 100 records, performance increased by 10,000 times. Of course, in a real database, there will be various mechanisms such as caching, pre-reading, etc. to improve performance. The actual performance gap will not be so big, but through this example, I believe readers should be able to appreciate the improvement of index pairs on the internal table How important performance is. But if the index is built on the external table, access to the internal table will not benefit, and access to the external table still needs to read 1000 records (assuming the database chooses to use the index). It can be seen that it is not only important to build an index, but also to build an index on the correct table.


For example, the following SQL statement:


SELECT DISTINCT O_CUSTKEY 

 FROM LINEITEM, ORDER 

 WHERE L_ORDERKEY = O_ORDERKEY 

 AND O_ORDERDATE >= DATE('1993-07-01')


In order to benefit from nested loop joins to build indexes, we need to build indexes on all connected columns. Taking into account the different table connection sequences that DB2 may use, the designed index may be (L_ORDERKEY) or (O_ORDERKEY). We need to look at the actual access method to finally determine which index we need to build.


The other side of the index


Indexing will bring some performance benefits, but it is not a cure-all. Building too many indexes, or inappropriate indexes, will also bring some side effects.


1) Indexing will reduce the speed of update, insert, and delete data in the table. Because DB2 needs to update the indexes on the table at the same time, if there are multiple indexes on the same table, the situation will be worse. There is a set of experimental data, and the test environment is established as follows:


CREATE TABLE TEST_INDEX_TABLE (C1 INT, C2 INT , C3 INT, C4 INT); 

 CREATE INDEX IDX_01 ON TEST_INDEX_TABLE (C1);


The SQL statement tested is:


UPDATE TEST_INDEX_TABLE SET C1 = 1 WHERE C1 BETWEEN 1 AND 10000


The updated column C1 happens to be contained in IDX_01, and DB2 needs to update the index IDX_01 while updating the data of C1. If the data in the table are 10k, 100k, and 1000k respectively, the following test results compare the significant reduction in SQL execution speed before and after indexing.



Table 2. Time test results

Number of different records in the table No index on the table

(unit: milliseconds) An index is established on the table

(unit: milliseconds) Increase time

(unit: milliseconds)

10k 516 6937 6421

100k 1156 52750 51594

1000k 3234 451719 448485


2) The database needs disk space to save the index, so indexing will bring disk overhead. Take the IDX_01 index just now as an example to test the same database table TEST_INDEX_TABLE. When the number of data rows in the table is different, the test results of the disk space required to establish IDX_01 are as follows.



Table 3. Space test results

Number of different records in the table Index occupied space

(unit: KB)

10k 188

100k 1688

1000k 16692


Verify the index created


Based on the analysis of the previous steps, we can design some indexes for a query. But will these indexes be selected by the optimizer? After using these indexes, how much performance has been improved? After using the index, how will the access plan of this query be changed? The easiest way to get answers to these questions, of course, is to actually create the index that you want to be tested, and actually execute the query statement; but actually creating an index requires a certain amount of overhead, and it is often not allowed in the actual production system Create indexes at will for testing. In order to solve this problem, DB2 has provided us with a very good tool-virtual index evaluation.


Create a virtual index in DB2 LUW


In order to test a designed index, first define the corresponding virtual index in DB2. The way to define a virtual index is to insert the corresponding record in the ADVISE_INDEX table. The inserted content should at least include the name of the index, the name of the table, the definition of the index key, and whether the index is a unique index.


Here is a simple example to illustrate how to create a virtual index, the insert statement is:


INSERT INTO ADVISE_INDEX 

 (NAME, CREATOR, TBNAME, TBCREATOR, COLNAMES, UNIQUERULE, COLCOUNT, IID, 

 NLEAF, NLEVELS, FIRSTKEYCARD, FULLKEYCARD, CLUSTERRATIO, USERDEFINED, 

 SYSTEM_REQUIRED, SEQUENTIAL_PAGES, DENSITY, FIRST2KEYCARD, FIRST3KEYCARD,

  FIRST4KEYCARD, PCTFREE, UNIQUE_COLCOUNT, MINPCTUSED, REVERSE_SCANS,

   USE_INDEX, CREATION_TEXT, INDEXTYPE, EXISTS, RIDTOBLOCK) 

 values ( 'IDX1002','OEDEV','EXPLAIN_INSTANCE','OEDEV','+ISOLATION+BLOCK',

  'D', 2, 1, 1, 1, 25, 25, 100, 1, 0, -1, -1, 0, 0, 0, -1, -1, 0, 'Y', 'Y',

   'CREATE INDEX "OEDEV "."IDX1002" ON "OEDEV "."EXPLAIN_INSTANCE" 

   ("ISOLATION" ASC, "BLOCK", ASC) ALLOW REVERSE SCANS', 'REG','N','N')


This creates an index with the key ("ISOLATION" ASC, "BLOCK", ASC) on the table EXPLAIN_INSTANCE. For the specific meaning of each column on this ADVISE_INDEX, you can refer to the description of the ADVISE_INDEX table on DB2 SQL REFERENCE.


After inserting the virtual index into the ADVISE_INDEX table, the virtual index can be verified. The specific method is to first execute "SET CURRENT EXPLAIN MODE EVALUATE INDEXES" to set special registers, and then execute the query that needs to be verified. At this time all access plan information has been saved. After that, you can view the access plan that includes the virtual index like a normal access plan.


Analysis of the reasons why the index is not adopted


After reviewing the access plan, you may find that the index we designed was not selected by the optimizer. There may be many reasons for this result. According to the author's experience, the common situations may be as follows:


The most common situation is that there are some problems with the designed index, such as failure to consider the optimal table connection order, or the existence of stop-matching keys in the index. If it is determined that this is the reason, then you need to return to the previous steps to redesign the index.


Another possibility is that the statistical information in the database is incorrect or even non-existent. In this case, DB2 is often unable to select the optimal access plan, so it is possible that the designed index will not be used. This situation is generally resolved by re-executing the DB2 RUNSTATS command.


In addition, if DB2 determines that the proportion of data that needs to be read from the table is high (for example, more than 90% of the records in the table need to be returned), then DB2 is likely to choose a full table scan instead of using an index, because it can Reduce one reading of the index tree. If this is the case, indexing on the table does not actually improve performance.


This article introduces the performance optimization theory related to indexes in the DB2 database, and how to analyze SQL statements and establish appropriate indexes. At the same time, this article also introduces the mechanism of virtual index (Virtual index) used to verify index performance in DB2 database and how to use it. Index design is very important to database performance. If the database administrator wants to improve the performance of the SQL query statement by establishing an index, he should analyze the characteristics of the SQL statement, design the most suitable index, and submit it to DB2 for verification through a virtual index. If the index is really effective and achieves the desired goal, then the database administrator can finally build these indexes in the database to improve the performance of DB2 when executing queries, thereby reducing the total cost and increasing the return on investment of the business.

Comments

Popular posts from this blog

Defination of the essential properties of operating systems

Define the essential properties of the following types of operating sys-tems:  Batch  Interactive  Time sharing  Real time  Network  Parallel  Distributed  Clustered  Handheld ANSWERS: a. Batch processing:-   Jobs with similar needs are batched together and run through the computer as a group by an operator or automatic job sequencer. Performance is increased by attempting to keep CPU and I/O devices busy at all times through buffering, off-line operation, spooling, and multi-programming. Batch is good for executing large jobs that need little interaction; it can be submitted and picked up later. b. Interactive System:-   This system is composed of many short transactions where the results of the next transaction may be unpredictable. Response time needs to be short (seconds) since the user submits and waits for the result. c. Time sharing:-   This systems uses CPU scheduling and multipro-gramming to provide economical interactive use of a system. The CPU switches rapidl

What is a Fair lock in multithreading?

  Photo by  João Jesus  from  Pexels In Java, there is a class ReentrantLock that is used for implementing Fair lock. This class accepts optional parameter fairness.  When fairness is set to true, the RenentrantLock will give access to the longest waiting thread.  The most popular use of Fair lock is in avoiding thread starvation.  Since longest waiting threads are always given priority in case of contention, no thread can starve.  The downside of Fair lock is the low throughput of the program.  Since low priority or slow threads are getting locks multiple times, it leads to slower execution of a program. The only exception to a Fair lock is tryLock() method of ReentrantLock.  This method does not honor the value of the fairness parameter.

How do clustered systems differ from multiprocessor systems? What is required for two machines belonging to a cluster to cooperate to provide a highly available service?

 How do clustered systems differ from multiprocessor systems? What is required for two machines belonging to a cluster to cooperate to provide a highly available service? Answer: Clustered systems are typically constructed by combining multiple computers into a single system to perform a computational task distributed across the cluster. Multiprocessor systems on the other hand could be a single physical entity comprising of multiple CPUs. A clustered system is less tightly coupled than a multiprocessor system. Clustered systems communicate using messages, while processors in a multiprocessor system could communicate using shared memory. In order for two machines to provide a highly available service, the state on the two machines should be replicated and should be consistently updated. When one of the machines fails, the other could then take‐over the functionality of the failed machine. Some computer systems do not provide a privileged mode of operation in hardware. Is it possible t