January 1994 Newsletter
Volume 5, No. 1
Highlights of This Issue
Forum 94 - by Peter Wages
Informix Announces Site on World Wide Web - Press Release
Informix Query Optimization - by Kevin Fennimore
Informix DRDA- DB2 Gateway - by Israel Gotay (Note: this articule is not available
online)
WAIUG Forum 94
Last December, the Washington Area Informix Users Group held its second user group forum
in Tysons, Virginia. The forum was a one day event with 16 sessions on Informix database
development, and 12 exhibitors demonstrating exciting new database tools and products. Over
185 people attended the event. See the article by Peter Wages on page 3 for more details. On
behalf of the user group, I would like to thank all the people who volunteered to make it
happen, the speakers, the exhibitors, and the folks from Informix.
NEXT MEETING
1) Informix Query Optimization
2) New InformixLink WWW site on the internet
The next meeting will cover two topics. Kevin Fennimore will present and discuss Informix
Query Optimization. (See Kevin's article in this newsletter for a preview.) Then, Lester
Knutsen will present the new InformixLink World Wide Web site on the Internet. There will
be time before and after the meeting for you to try out InformixLink on the Internet. The
meeting will be at the new Informix Bethesda Office (Commercial Division).
Date: March 24, 1994 Time: 9:00 am to 12:00 noon
Location: Informix Software, Inc. Phone: 301-214-1300
Two Democracy Plaza, 6707 Democracy Blvd., Suite 500
Bethesda, MD 20817
1995 Informix Worldwide User Conference
The Informix Worldwide User Conference is a once-a-year opportunity for Informix end users,
partners, industry executives and interested customers to meet. This year's conference will
feature more than 60 conference sessions, tutorials and other breakouts designed to help
Informix's customers and partners effectively use Informix database technology to be more
competitive. The conference will also feature a 75,000 square foot exhibition hall where over
100 Informix customers, hardware and software partners, and VARs will showcase
Informix-based solutions. The conference and exhibition will be held July 17-20 in San Jose,
California. For more information call 1-800-926-EXPO.
Sponsors for the Newsletter
This newsletter is sponsored by American Computer Technology, Inc., Advanced DataTools
Corporation, and Business Systems Support Group. The sponsorship fee allows a company
to have a full page for advertizing or announcements. If you company would be interested in
sponsoring a future newsletter, please call Lester Knutsen, at 703-256-0267.
List of Employment and Consulting Opportunities
As editor and president of WAIUG, I get several calls each month from companies looking for
skilled Informix programmers and consultants. I have started a list of the companies that have
called and will fax or email this information to any members who wants it. I would also like
to start a directory of members who are Informix consultants. Several companies have asked
for a directory of members or consultants. We don't want to get into the job- or consultant-placement business, but will have information available for members who need it. We will be
discussing this a our next Board meeting and I would appreciate any feedback from members
on this topic.
Plans for 1994
We plan to have quarterly meetings and publish four newsletters this year. The meetings will
be in March, June, September, and November. The four topics that are of most interest to our
members based on responses from the forum registrations, are 1) client/server database
development, 2) database network and management, 3) Informix NewEra, and 4) database
performance tuning. Our goal is to focus our meetings on these topics.
June Meeting - Client/Server Development
Our June user group meeting will focus on client/server development and setting up a database
network. What our members are most interested in is other Informix users describing what
works and how they did it. If you have implemented client/server and network databases using
Informix and would be interested in discussing how you accomplished it, please give me a call
(Lester at 703-256-0267). We need to cover topics such as: what needs to installed on a PC
to connect it, how to install I-Net and I-Star, what tools work, what mistakes have been made,
how do you set it up for a small office, and large projects.
Elections to the Board of Directors
At our June meeting we will be holding elections to the WAIUG Board of Directors. The
Board is comprised of volunteers who plan our activities and work hard at putting them
together. If you are interested in volunteering to be on the Board, or would like to nominate
a member, please contact the Secretary of the WAIUG, Rick Montgomery, at 703-756-7273.
By Peter Wages
The Washington Area Informix Users' Group held their second annual forum on December
15, 1994 at the Tysons Corner Westpark Hotel. Forum 94 hosted 183 participants, eleven
speakers and ten exhibitors. Each participant received a diskette of public domain software
contributed by members of the Informix mailing list on the Internet.
Forum 94 opened with a rousing address by Informix Inc. Vice President Robert MacDonald.
Mr. MacDonald reviewed the accomplishments that Informix had in 1994, including the
introduction of New Era, anew graphical client/server application development tool, and the
addition of United Airlines as a major customer. With friendly enthusiasm, he interspersed
his remarks with several punches at Oracle Corporation. For example, Mr. MacDonald
showed a slide of an Informix billboard located at the entrance of Oracle Headquarters. He
reported that Oracle employees hate that billboard.
WAIUG members and Informix Inc. then presented talks about various subjects relating to
Informix. Mr. Wayne Beekman of Information Concepts Inc. spoke about "Visual Basic
Client/Server With Informix". Judging from the discussion about Open Database
Connectivity (ODBC) drivers on the internet mailing list, many are interested in this topic.
Information Concepts Inc. has had some success with Visual Basic and Q + E Inc. 's ODBC
driver.
Lester Knutsen talked about "The Internet, Public Domain Software, and Support". He
impressed Filomena Resurrecion of Netrix Inc. with the abundance of information available.
David Sears of Informix Inc. and John Woolsoncroft of Concepts Dynamic Inc. discussed New Era,
the new graphical client/server tool released by Informix this year. New Era is to compete with
Visual Basic and Powerbuilder, other client/server products, in the marketplace. Among many of
New Era's benefits, is that the programmer does not have to write C code for many routines.
Informix is developing a New Era function library.
Mayank Patel of Informix Inc. spoke on "Online Performance Tuning". Although his talk was very
knowledgeable, he geared the discussion to version 6 of Informix which caused some confusion
because many in the audience were still using version 5. The two versions are rather different.
It was great to see old friends and colleagues. One does enjoy the chance to what others are doing.
Lester Knutsen and the folks at Summit Data Group did a wonderful job organizing the event.
Bob Formica of the Aberdeen Proving Ground won the raffle. He received a copy of New Era.
Peter Wages
VSE Corporation 2760 Eisenhower Ave Alexandria, Virginia
pmwages@cais.com Work: (703) 329-2626 Home: (202) 244-3511
Company is First Independent Database Vendor to Offer Online Information to Customers Via World
Wide Web
MENLO PARK, Calif., (January 3, 1995) -- Informix Software, Inc. (NASDAQ:IFMX), the leader
in parallel processing database technology, announced today the expansion of its site on the World
Wide Web (WWW) to a fully interactive Web server, called InformixLink, which will provide users
with easy access to a variety of information on Informix products, services, partners and customers.
Informix was the first independent database vendor to offer a site on the Web in 1994, and is now the
first database company to offer a complete interactive Web site. Informix's WWW site will be
accessible at http://www.informix.com/ by January 4.
"This announcement is our response to the growing needs of our customers for more immediate,
readily accessible information about our company, products and services," said Jim Hendrickson, vice
president of Customer Services. "The Informix Web site will give customers a current resource of
information whenever and from wherever they need it." WWW Site an Extension of Original
InformixLink
The new Informix Web site is an extension of an existing service which provides interactive, online,
subscription information service for Informix customers, resellers, partners and developers. The
original InformixLink, now called InformixLink Classic is still available.
Although the new InformixLink web site will be available to the general public, a portion of the site
will be restricted to subscription customers. These subscribers will now be able to access a "members
only" portion of the web site with access to information such as product documentation, training,
software and technical notes, in addition to all of the publicly available information. Informix
employees will be able to access information on corporate policies, programs, benefits and internal
job opportunities.
InformixLink Provides Easy Access to Information
InformixLink will allow any Internet user to access information about Informix. Informix's Home
Page offers an easy, interactive interface to the company's web site and information on:
* Corporate background: Company information and facts; an events calendar including user
conferences, tradeshows and seminars; recent Informix news and press releases; investor relations
information; and a corporate calendar.
* Products: Product briefs and brochures on Informix's industry leading database servers, tools and
connectivity products.
* Customer/user services: General information regarding Informix customer service; an explanation
of the various levels of Informix customer support; training schedules, course descriptions and on-line
registration for classes; descriptions of Informix consulting services; listing of Informix user
groups; and information on ordering products.
* Industry solutions: Customer success stories organized by industry and business need (White
papers on key Informix technology initiatives and icon links to customer press releases will be added
later in the year.)
* Partner programs: An overview of Informix Value Added Partners (VAPs) with icon links to
related success stories.
* InSync program: an overview of the Informix InSync development partner program with icon links
to related successes.
* "E-Zines": Electronic copies of "Informix Times ", the Informix customer newsletter; and briefs
on Informix-related books available from Informix Press.
* Job opportunities: in Menlo Park, Calif., Lenexa, Kan., and for sales positions worldwide.
* Help, comments and suggestions: Instructions on how to use the server; a table of contents,
explanations of icons; and an interactive form for user comments; and suggestions.
About Informix
Informix Software is the leading supplier of high performance, parallel processing database
technology for open systems. The company's database servers are the number one choice of
computer hardware manufacturers for publishing Transaction Processing Council (TPC) benchmarks
for UNIX-based systems. Informix products also include application development tools for creating
client/server production applications, decision support systems, and ad-hoc query interfaces, and
connectivity software that allows information to be shared transparently from PCs to mainframes
within the corporate computing environment. The company's corporate headquarters is in Menlo
Park, California.
by Kevin Fennimore
An often overlooked step in developing a database application is that of optimizing the queries
performed against the database. Database queries include any statement that searches the database
for specific rows, such as select, update and delete( i.e. statements with a where clause ). These
statements are usually written without regard to how the database engine will access the data
requested. It is assumed that we, the programmers, tell the database server what data we want and
the server will find that information for us in the most efficient way possible. The database server
does its best but, unfortunately, it does not always find the most efficient way. Following are some
ideas and examples of how we can optimize our queries to ensure the database server gets the data
we request in the most efficient way possible.
The first thing to do when optimizing a query is to set up a good test environment. Some guidelines
to follow for the test environment are:
- run the queries several times ( keep track of the changes and what affect each one had on
performance )
- run queries on a machine with minimal system activity
- use data that represents the production data the query will eventually run against and use the
same data for each run of the queries
- run update statistics on the tables involved in the query
- use the output of the set explain command
- time each run of the queries
- check the output of the query each time to make sure that it is correct
Running the queries several times will help to determine which is the best query to use. It is also a
very good idea to keep track of each change; it is very frustrating to write a select that works well,
try to improve it and lose the one that worked well. Using a machine with minimal system activity
will help reduce differences in the performance of the changed queries due to other users.
Running the queries against "production" data will give a better simulation of how they will perform
in a production environment. Often, developers will run their programs against a test database that
does not have the volume of data or the disbursement of data that is typically found in a production
environment. This can hide how inefficient a query may actually be. There is a big difference
between sequentially reading from a table with a few hundred record and one with a few hundred
thousand records.
As mentioned in a previous article( "The Set Explain Command" ) the database engine has an
optimizer which determines how to access the requested data. It examines the information in the
system catalogs and determines the best query path to access the data. Since the information it uses
is stored in the system catalogs it is very important for the update statistics command to be run
frequently, especially on tables that are very dynamic. This ensures that the information being used
by the optimizer is accurate.
The output from the set explain command is the most useful tool for optimizing database queries.
It tells you exactly how the database engine will read from the tables to get the data. When changes
are made the effect of those changes can be seen in the set explain output. Sometimes it is a good
idea to run a problem program with set explain turned on so that all of the queries performed by the
program can be examined. This can be a hassle since the program would have to have the set explain
command added to it and then recompiled. A solution is to add a call to the following 4GL function
at the beginning of a program; after the database statement:
function set_explain()|
define set_explain char(10)
let set_explain = fgl_getenv( "SET_EXPLAIN" )
if( set_explain = "ON" )
then
set explain on
else
set explain off
end if
end function
This function gets the value of the environment variable SET_EXPLAIN. If the value is "ON" then
set explain is turned on, otherwise it is turned off. This function is useful for running a program in
a production environment without having to recompile the program to get set explain output.
Timing the queries is also useful and can be accomplished by using UNIX's timex command or other
similar utility. The query can be put into a .sql file and run from the command line as follows:
timex dbaccess dbname query.sql > try1.out 2>try1.err
The output from the query is stored in the file tyr1.out and the errors, including the output from
timex, is in the file try1.err. Look at the man pages for timex and look at the different options and
the output for each one. Some versions have an option that reports time for all of the child processes
associated with the process being timed. This is useful since it would include the time for the
database server(sqlexec or sqlturbo).
The next thing to do is to optimize the queries. Keep in mind that the main goal of optimization is
to reduce I/O performed by all application processes ( i.e. the front-end and back-end processes ).
Some of the best optimization techniques are to use indexes appropriately and to prepare statements
that are executed over and over again. These have been discussed in previous article( "Indexing
Strategies" and "The Prepare Statement" ). However, there are other ways to optimize queries that
go beyond indexes and preparing statements.
Using Temp Tables
The primary key to optimizing a query is to know what the query is doing and the type of data being
retrieved. For example, if we had a table of order items that contained 500,000 rows of data and, for
a variety of close-out numbers and purchase numbers, we wanted to know how many were of type
"Special" we might write the following selects:
select count(*) from order_items
where order_type = "Special" and closeout_num = ?
select count(*) from order_items
where order_type = "Special" and purchase_num = ?
Hopefully there would be an indexes on purchase_num and closeout_num that could be used to
access the data. If not, we would sequentially read 500,000 records from the order_items table each
time one of these queries was executed. Let's assume that only 20,000 records were of type
"Special". That would mean that we would sequentially read through 500,000 records even though
we are only interested in 20,000. If there were indexes on these columns but the indexes were not
very unique there could still be performance problems.
An alternative to the above selects is to create a temporary table that would contain only the records
from the order_items table that were of type special order. We could then build indexes on that
temporary table and repeatedly query from that table as follows:
select closeout_num, purchase_num
from order_items
where order_type = "Special"
into temp special_orders with no log
Note: The with no log is for OnLine users and it tells the server not to log modifications to the table.
create index i_spec_orders1 on special_orders(
closeout_num )
create index i_spec_orders2 on special_orders(
purchase_num )
select count(*) from special_orders
where closeout_num = ?
select count(*) from special_orders
where purchase_num = ?
The selects from this temporary table would very efficient and, with OnLine, the selects would be able
to perform key-only reads. Using temp tables in this manner is only good for selecting data that is
not going to change. Once the data is retrieved into the temporary table it does not change as the
data in the original table changes, it is just a copy.
UNIONs instead of ORs
Queries that use the or in the where clause present another opportunity for optimization. If, for
example, we were looking for customers that lived in California or New York we could write the
following select:
select * from customers
where state in ( "CA", "NY" )
Note: The in clause translate into and or clause(e.g. state = "CA" OR state = "NY" ).
If there was an index on state, the server could use that index to satisfy the query. If, however, we
were looking for customers that lived in California or had a status of inactive we might write the
following select:
select * from customers
where state = "CA"
or customer_status = "I"
This gets the rows we are interested in but causes the server to read through the table sequentially
because it cannot use an index. If indexes existed on state and customer_status, a more appropriate
way to write the above select is to use the UNION as follows:
select * from customers
where state = "CA"
UNION
select * from customers
where customer_status = "I"
This causes the server to execute each select individually and thereby allows it to use the indexes on
state and customer_status.
Another example of using the union instead of an or, which was not as obvious, was the following
select:
select count(*) from transaction_log
where trans_date >= ?
and trans_date < ?
and trans_id in ( 1, 5, ..., 100 )
and some other criteria
There was an index on trans_date and one on trans_id, trans_date and fields used in the "other
criteria". The optimizer decided to use the index on trans_date to read from the transaction_log table
which was extremely inefficient since the date range being used in the query included most of the
table. However, the condition on trans_id only included a small portion of the table. The only way
to make the optimizer use the index on trans_id was to remove the index on trans_date. This was
not a feasible solution since the affect on other queries could not be determined. The not so obvious
solution was to remove the or clause( or the in clause in this case ) and replace it with a union as
follows:
select count(*) from transaction_log
where trans_date >= ?
and trans_date < ?
and trans_id = 1
and some other criteria
UNION ALL
select count(*) from transaction_log
where trans_date >= ?
and trans_date < ?
and trans_id = 5
and some other criteria
...
UNION ALL
select count(*) from transaction_log
where trans_date >= ?
and trans_date < ?
and trans_id = 100
and some other criteria
Note: The UNION ALL in the example above will get all of the records from each select. The
UNION without the ALL excludes any duplicate records.
This caused the optimizer to execute each statement individually and use the appropriate index. It
did not do much for the readability of the code since it expanded one simple select into about ten or
so unioned selects. The code also had to be changed to loop through and sum all of the counts
returned by the union. It did have a dramatic affect on performance; the program went from running
an average of 45 to 60 minutes to less than 10 minutes.
Select Specific Columns vs. Select *
As mentioned earlier, the object of optimization is to reduce I/O. This applies to both front-end and
back-end or server processes. One commonly overlooked I/O is the communication, through pipes,
between the front-end and the back-end. When a select is executed the front-end writes the select
to the back-end, the back-end executes the select and then writes the results back to the front-end.
When selecting information from a table it is very common to do the following select:
select * from customers
If all of the columns from the customer table are not needed then this select is inefficient. If only a
few rows are returned it is not a problem, but when several thousand rows are returned and the size
of each row is several hundred bytes the writes across the pipe start to add up. It is therefore more
efficient to only select the columns that are necessary.
Inserts with Select Statements as Value Lists
It is also helpful to reduce the number of statements passed, or written, to the server. For example,
if we were to select data from a table and then insert it into another table, the code might look as
follows:
declare c1 cursor for
select * from customer where state = "CA"
foreach c1 into rec.*
insert into some_table values( rec.customer_id, ... )
end foreach
This would cause data to be passed to the front-end and the insert to be passed to the server for each
customer record. An alternative to this would be:
insert into some_table values
select customer_id, ...
from customer
where state = "CA"
This inserts all of the rows that would have been inserted in the previous section of code. In this case,
however, only one statement is passed to the server and no records are returned to the front-end. The
problem with this is that there is no control over errors in the insert. If one record fails to be inserted,
the rest of the records will not be inserted. In the first example, error handling could be added to
process errors in the insert and continue inserting subsequent rows.
Outer Joins
Another way to reduce the number of statements passed to the server is to use outer joins. If we
wanted to select all of the customers and we also wanted to see how many orders they had we might
write the following code:
declare c1 cursor for
select customers.cnum, customers.name
from customers
foreach c1 into p_cnum, p_name
select count(*) into p_count from orders where cnum =
p_cnum
output to report xyz( p_cnum, p_name, p_count )
end foreach
This would execute the select from orders for each customer record. Using the outer join the code
would look as follows:
declare c1 cursor for
select customers.cnum, customers.name, orders.cnum,
count(*)
from customers, outer orders
where customers.cnum = orders.cnum
group by customers.cnum, customers.name, orders.cnum
foreach c1 into p_cnum, p_name, p_orders_cnum, p_count
if ( p_orders_cnum is NULL )
then
let p_count = 0
end if
output to report xyz( p_cnum, p_name, p_count )
end foreach
The outer join tells the server to retrieve all of the rows from the customers table and then to retrieve
any corresponding rows from the orders table if they exist. If no corresponding rows exist on the
orders table a NULL value is returned for orders.cnum. So in the code we check the value of
p_orders_cnum to see if it is NULL. If it is then there were no orders for that customer. This
reduces the number of statements sent to the server and thus improves performance.
Indexes and Order Bys
The last optimization technique is using indexes to perform order bys. The optimizer gives priority
to indexes that will help the server reduce the number of rows fetched( i.e. the where clause ). It then
looks at the order by clause to see if the indexes being used for the where clause will produce the
correct order. Remember that doing an index read returns the rows ordered by the key of the index.
If the index will not produce the correct order, a temporary sort file is used. Sometimes it is more
efficient to do an indexed read on an index that is not used in the where clause but does produce the
correct ordering. For example, if we want to see all of the customers with a status of "A" and we
want it ordered by name, the select would be:
select * from customer
where customer_stat = "A"
order by customer_name
If there is an index on customer_stat and customer_name the optimizer will choose the index on
customer_stat to access the customer table and a temporary file will be used to sort the records ( this
can be seen by examining the output of set explain ). In this case let's assume that 90% of the
customers have a status of "A". This means that we will retrieve 90% of the records into a temporary
file. This is not very optimal if there are 500,000 records in the table( that would be 450,000 records
in the temporary file ).
It would be better if the optimizer used the index on customer_name since the index would read the
records ordered by customer_name and thus there would be no need for a temporary file. In order
to achieve this we must "force" the optimizer to use the index on customer_name is to include
customer_name in the where clause as follows:
select * from customer
where customer_name >= ""
and customer_stat = "A"
order by customer_name
This should cause the optimizer to use the index on customer_name to satisfy the where condition
and the order by. Since all strings are greater than or equal to the empty string, the additional
condition on customer_name does not affect the results of the query. If the optimizer still insists on
using the index on customer_stat, which in some cases it may, the check for customer_stat = "A" can
be removed from the select and put in the code. Any row returned that has a status not equal to "A"
can be skipped.
These are just some of the techniques that can be used to optimize database queries. There are many
other techniques that can be learned through practice( a.k.a. trial and error ). The best way to learn
new techniques is to try various options. Think of alternative ways to perform queries that might not
seem immediately obvious. The key is to remember that you are trying to improve performance by
reducing I/O. This means reduce the number of rows read, reduce the communication between front-end and back-end and reduce the need for temporary files.
The other thing to keep in mind is to always use the output of set explain. It tells you exactly how
the server will perform your query and whether it will use indexes, sequential scans, temporary files,
etc. It is the most useful tool available for optimizing queries.
Kevin Fennimore
UCI Consulting Inc.
(919) 644-1073
Informix DRDA- DB2 Gateway
by Israel Gotay
Reprinted from the April edition of Computing Solutions with permission
from Technical Enterprises, Inc. (NASPA) Copyright 1994
(Note - this articule is not available online)
This Newsletter is published by the Washington Area Informix Users Group.
Lester Knutsen, President/Editor
Washington Area Informix Users Group
4216 Evergreen Lane, Suite 136, Annandale, VA 22003
Phone: 703-256-0267
lester@access.digex.net
|