Tips for writing more efficient SQL

  • Rewriting the SQL in PL/SQL - For certain queries this can result in more than a 20x performance improvement.
  • Use minus instead of EXISTS sub queries - Using the minus operator instead of NOT IN and NOT- EXISTS will result in a faster execution plan.
  • Use SQL analytic functions - The Oracle analytic functions can do multiple aggregations (e.g. rollup by cube) with a single pass through the tables, making them very fast for reporting SQL.
  • Re-write NOT EXISTS sub Queries as outer joins - In many cases of NOT queries (but ONLY where a column is defined as NULL), you can re-write the uncorrelated sub-Query into outer joins with IS NULL tests.  Note that this is a non-correlated sub-query, but it could be re-written as an outer join.
            select book_key from book
     where
     book_key NOT IN (select book_key from sales);


    • Below we combine the outer join with a NULL test in the WHERE clause without using a sub-query, giving a faster execution plan.
            select b.book_key from book b, sales s
       where
      b.book_key = s.book_key(+)
      and s.book_key IS NULL;


  • Index your NULL values - If you have SQL that frequently tests for NULL, consider creating an index on NULL values.  To get around the optimization of SQL queries that choose NULL column values (i.e. where emp_name IS NULL), we can create a function-based index using the null value built-in SQL function to index only on the NULL columns.
  • Leave column names alone - Never do a calculation on an indexed column unless you have a matching function-based index (a.k.a. FBI).  Better yet, re-design the schema so that common where clause predicates do not need transformation with a FBI:
                where salary*5            > :myvalue
       where substr(ssn,7,4)     = "1234"
       where to_char(mydate,mon) = "january"
  • Avoid the use of NOT IN or HAVING. Instead, a NOT EXISTS sub-Query may run faster (when appropriate).
  • Avoid the LIKE predicate = Always replace a "LIKE" with equality, when appropriate.
  • Never mix data types - If a WHERE clause column predicate is numeric, do not to use quotes. For char index columns, always use quotes. There are mixed data type predicates:
            where cust_nbr = "123"
     where substr(ssn,7,4) = 1234
  • Use decode and case - Performing complex aggregations with the “decode” or "case" functions can minimize the number of times a table has to be selected.
  • Don't fear full-table scans - Not all OLTP queries are optimal when they use indexes.  If your query will return a large percentage of the table rows, a full-table scan may be faster than an index scan.  This depends on many factors, including your configuration (values for db_file_multiblock_read_count, db_block_size), query parallelism and the number of table/index blocks in the buffer cache
  • Use table alias: Always use table alias and prefix all column names with the aliases when you are using more than one table.
  • Creating an index - To improve the performance of a query that selects rows of a table based on a specific column value, create an index on that column. For example, the following query performs better if the NAME column of the EMP table has an index.
           SELECT * FROM EMP WHERE NAME = 'Smith';
  • Order of the tables in Joins: If you specify 2 or more tables in the FROM clause of a SELECT statement, then Oracle parser will process the tables from right to left, so the table name you specify last will be processed first. In this case you have to choose one table as driving table. Always choose the table with less number of records as the driving table.
  • Use EXISTS instead of DISTINCT: Use EXISTS in place of DISTINCT if you want the result set to contain distinct values while joining tables.
    • For example:
          SELECT DISTINCT d.deptno, d.dname FROM dept d, emp e 
              WHERE d.deptno = e.deptno;

           The following SQL statement is a better alternative.

             SELECT d.deptno, d.dname
             FROM dept d
            WHERE EXISTS (SELECT e.deptno
             FROM emp e
             WHERE d.deptno = e.deptno);
             (48% Time Reduction)
  • Use of NOT operator on indexed columns: Never use NOT operator on an indexed column.Whenever Oracle encounters a NOT on an index column, it will perform full-table scan.
    • For Example:
              SELECT * FROM emp WHERE NOT deptno = 0;
               Instead use the following:

               SELECT * FROM emp WHERE deptno > 0;
  • Use Column Names Instead of * in a SELECT Statement:
  • If you are selecting only a few columns from a table there is no need to use SELECT *. Though this is easier to write, it will cost more time for the database to complete the query. By selecting
  • Only the columns you need, you are reducing the size of the result table and in turn increasing the speed of the query.
  • Use EXISTS instead of LEFT JOIN: 
    • The LEFT JOIN merges the outer query with the inner query and keeps the extra rows from the outer table. The same result can be obtained by using an EXISTS sub query. The will eliminate the need to compare two tables as the inner query acts as a filter when the outer query executes. Example: While creating a tool that modified the help pages dynamically at OUAC, I needed to find which Universities had help files associated with them. By using an EXISTS sub query instead of LEFT JOIN, I increased the efficiency of this query by avoiding a table comparison.

                   SELECT merfnbr, mestname FROM buma.merchant LEFT JOIN 
                    buma.helpfiles ON merfnbr=hemenbr

                   SELECT merfnbr, mestname FROM buma.merchant WHERE EXISTS 
         (SELECT * FROM buma.helpfiles where merfnbr = hemenbr)
       (23% Time Reduction)
  • Use BETWEEN instead of IN: The BETWEEN keyword is very useful for filtering out values in a specific range. It is much faster than typing each value in the range into an IN.

           SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr IN (508858, 508859, 508860,508861,508862, 508863, 508864)

     SELECT crpcgnbr FROM cgryrel WHERE crpcgnbr 
            BETWEEN 508858 and 508864
     (59% Time Reduction)
  • Use BETWEEN instead of Greater Than & less than Range: The BETWEEN keyword is very useful for filtering out values in a specific range. It is much faster than typing each value in the range into Greater Than & less than Range.
           (50% Time Reduction)

  • Use EXISTS instead of IN:A simple trick to increase the speed of an IN sub query is to replace it with EXISTS. The EXISTS method is faster than IN because it doesn’t check unnecessary rows in the comparison.
    SELECT cgrfnbr from category where EXISTS (select cpcgnbr from cgprrel where cpprnbr = 149)

    SELECT cgrfnbr from category where cgrfnbr IN (select cpcgnbr    from cgprrel where cpprnbr = 149)
    (36% Time Reduction)
  • Does order of tables matter in a JOIN clause? 
    • The order of tables referenced in the ON clause of a JOIN doesn't affect the performance of a query. 
Query #1
 
SELECT
     c.Customer_ID
    ,c.Last_Name
    ,c.First_Name
    ,s.Order_ID
    ,s.Order_Date
FROM
    dbo.Customers AS c JOIN
    dbo.Sales_Orders AS s
    ON s.Customer_ID = c.Customer_ID
Query #2
 
SELECT
     c.Customer_ID
    ,c.Last_Name
    ,c.First_Name
    ,s.Order_ID
    ,s.Order_Date
FROM
    dbo.Customers AS c JOIN
    dbo.Sales_Orders AS s
    ON c.Customer_ID = s.Customer_ID

The execution plan of both queries will be same, so order of tables referenced in the ON clause of a    JOIN doesn't affect the performance of a query.

No comments:

Post a Comment