Abstract
This short article shows by example how a bad index can affect Firebird database performance.
Test database
Imagine you have a simple Firebird database:
CREATE DATABASE 'E:\TESTFB25INDEX.FDB' USER 'SYSDBA' PASSWORD 'masterkey';
CREATE TABLE TABLEIND1 (
I1 INTEGER NOT NULL PRIMARY KEY,
NAME VARCHAR(250),
MALEORFEMALE SMALLINT
);
CREATE GENERATOR G1;
SET TERM ^ ;
create or alter procedure INS1MLN
returns (INSERTED_CNT integer)
as
BEGIN
  inserted_cnt = 0;
  WHILE (inserted_cnt < 1000000) DO
  BEGIN
    /* (:inserted_cnt - (:inserted_cnt/2)*2) is inserted_cnt modulo 2,
       so MALEORFEMALE alternates between 0 and 1 */
    INSERT INTO tableind1(i1, name, maleorfemale)
      VALUES (gen_id(g1, 1), 'TEST name', (:inserted_cnt - (:inserted_cnt/2)*2));
    inserted_cnt = inserted_cnt + 1;
  END
  suspend;
END^
SET TERM ; ^
GRANT INSERT ON TABLEIND1 TO PROCEDURE INS1MLN;
GRANT EXECUTE ON PROCEDURE INS1MLN TO SYSDBA;
COMMIT;
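If you save the statements above to a file, you can execute them with Firebird's isql command-line tool (the file name here is just an assumption):
isql -i e:\create_testdb.sql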
For some reason you need to:
· insert 1 million records,
· update all these records,
· delete all records,
· run SELECT count(*) on this table.
To do this, we can use the following script (run it immediately after the database creation script):
set stat on; /*show statistics*/
select * from ins1mln;
update tableind1 SET MALEORFEMALE = 3;
delete from tableind1;
select count(*) from tableind1;
and store the results for future analysis.
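One convenient way to run the script and capture its output is isql's -i (input) and -o (output) switches; the script and output file names below are assumptions:
isql e:\testfb25index.fdb -user SYSDBA -password masterkey -i e:\test_script.sql -o e:\result_noindex.txt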
Then we create a database with the same structure, but add an index on the MALEORFEMALE column:
CREATE INDEX TABLEIND1_IDX1 ON TABLEIND1 (MALEORFEMALE);
As you can see, we insert only the integer values 0 or 1 into this column.
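If you want to verify the distribution, a quick check (not part of the test script) is a GROUP BY query; with the alternating formula in INS1MLN it should return exactly 500,000 rows for each of the two values:
SELECT MALEORFEMALE, COUNT(*)
FROM TABLEIND1
GROUP BY MALEORFEMALE;
Two distinct values across a million rows means the index has extremely low selectivity.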
Then we repeat the script on the database with this index and compare the results.
The results of both runs are shown below:
Without index on MALEORFEMALE:

SQL> set stat on; /*show statistics*/
SQL> select * from ins1mln;

INSERTED_CNT
============
     1000000

Current memory = 10487216
Delta memory = 80560
Max memory = 12569996
Elapsed time= 13.33 sec
Buffers = 2048
Reads = 0
Writes 18756
Fetches = 7833503

SQL> update tableind1 SET MALEORFEMALE = 3;
Current memory = 76551788
Delta memory = 66064572
Max memory = 111442520
Elapsed time= 15.04 sec
Buffers = 2048
Reads = 16166
Writes 15852
Fetches = 6032307

SQL> delete from tableind1;
Current memory = 76550240
Delta memory = -1548
Max memory = 111442520
Elapsed time= 3.27 sec
Buffers = 2048
Reads = 16147
Writes 16006
Fetches = 5032277

SQL> select count(*) from tableind1;

       COUNT
============
           0

Current memory = 76552064
Delta memory = 1824
Max memory = 111442520
Elapsed time= 1.35 sec
Buffers = 2048
Reads = 16021
Writes 1791
Fetches = 2032278

With index on MALEORFEMALE:

SQL> set stat on; /*show statistics*/
SQL> select * from ins1mln;

INSERTED_CNT
============
     1000000

Current memory = 10484140
Delta memory = 75524
Max memory = 12569996
Elapsed time= 23.94 sec
Buffers = 2048
Reads = 1
Writes 23942
Fetches = 11459599

SQL> update tableind1 SET MALEORFEMALE = 3;
Current memory = 76548712
Delta memory = 66064572
Max memory = 111439444
Elapsed time= 29.30 sec
Buffers = 2048
Reads = 16167
Writes 19492
Fetches = 10035948

SQL> delete from tableind1;
Current memory = 76547164
Delta memory = -1548
Max memory = 111439444
Elapsed time= 3.41 sec
Buffers = 2048
Reads = 16147
Writes 15967
Fetches = 5032277

SQL> select count(*) from tableind1;

       COUNT
============
           0

Current memory = 76548988
Delta memory = 1824
Max memory = 111439444
Elapsed time= 0.69 sec
Buffers = 2048
Reads = 16021
Writes 1901
Fetches = 2032278
So, the bad index roughly doubles the time needed for inserts and updates. We can also see that the non-optimal index greatly increases the number of writes and record fetches.
Let's gather statistics for this sample database (with the bad index on MALEORFEMALE) and look at the details.
To gather statistics, we run the following command:
gstat -r e:\testfb25index.fdb > e:\teststat.txt
The statistics section for the TABLEIND1 table and its indices looks intriguing, but what useful information does it give us?
TABLEIND1 (128)
Primary pointer page: 166, Index root page: 167
Average record length: 0.00, total records: 1000000
Average version length: 27.00, total versions: 1000000, max versions: 1
Data pages: 16130, data page slots: 16130, average fill: 93%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 16129
Index RDB$PRIMARY1 (0)
Depth: 3, leaf buckets: 1463, nodes: 1000000
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 1462
Index TABLEIND1_IDX1 (1)
Depth: 3, leaf buckets: 2873, nodes: 2000000
Average data length: 0.00, total dup: 1999997, max dup: 999999
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 1056
60 - 79% = 0
80 - 99% = 1816
Note that TABLEIND1_IDX1 contains 2,000,000 nodes for a table of 1,000,000 records: after the UPDATE, the index holds entries for both the old key values (0 and 1) and the new value (3) until garbage collection removes the old record versions. To understand the meaning of these numbers and percentage values, we can use the IBAnalyst tool, which offers a visual interpretation of database statistics.
By clicking Reports/View recommendations, we can find the explanation for this index:
Bad indices count: 1.
By `bad` we name indices with many duplicate keys (90% of all keys)
and big groups of equal keys (30% of all keys). Big groups of equal
keys slowdown garbage collection - MaxEquals here is % of max groups
of keys having equal values. Index search for such an index is not
efficient. You can drop such indices (if they are not on FK
constraints).
Index ( Relation) Duplicates MaxEquals
TABLEIND1_IDX1 (TABLEIND1) : 100%, 50%
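The same duplicate problem is visible in the selectivity that Firebird itself stores for each index. The following query uses only the standard system tables; an RDB$STATISTICS value close to 1 means the index has very few distinct values (for our two-value index it would be about 0.5):
SELECT RDB$INDEX_NAME, RDB$STATISTICS
FROM RDB$INDICES
WHERE RDB$RELATION_NAME = 'TABLEIND1';
If the stored value looks stale, SET STATISTICS INDEX TABLEIND1_IDX1; recalculates it.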
In production databases we often see many bad indices, and they can greatly affect performance. For example, we have seen a table with 13 million records that had 7 bad indices, which were (most likely) useless and greatly decreased Firebird performance.
Usually the developer's first move is to drop these bad indices. On the other hand, it is possible that a bad index was used by some particular SQL query, which required a specific combination of indices (including the bad one) in order to run fast enough.
The Firebird SQL optimizer has complex rules, and it often happens that dropping a bad index leads to significant performance degradation in some queries.
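Before dropping a suspicious index, it is worth checking whether a given query actually relies on it. In isql, SET PLAN ON prints the chosen execution plan for each statement; a minimal check against our test table might look like this:
SET PLAN ON;
SELECT COUNT(*) FROM TABLEIND1 WHERE MALEORFEMALE = 1;
If the output contains a plan such as PLAN (TABLEIND1 INDEX (TABLEIND1_IDX1)), the optimizer is using the bad index for that query.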
Certainly, such SQL queries should be rewritten to use a better execution plan without the bad index, but there is still a problem: how do we find the SQL queries that use the bad index?
The answer is in the next episode of our Optimization series of articles.