Failing that advice, try:
One of the most common problems experienced is with transaction logging. If you do have to use logging, HEED THESE WORDS:
Change:

    LTXHWM  80   # Long TX high-water mark (percent)
    LTXEHWM 90   # Long TX exclusive high-water mark (percent)

to:

    LTXHWM  70   # Long TX high-water mark (percent)
    LTXEHWM 80   # Long TX exclusive high-water mark (percent)
A small band of merry men ran forth and insisted that these LTX* figures need to go still lower:
But Jim Frison (firstname.lastname@example.org) clearly had other experiences:
Long transactions and completely locked up Informix systems have been the bane of my existence for the last two years. I strongly recommend that the defaults be revised to lower values. We gradually reduced them to their current levels (LTXHWM=65, LTXEHWM=75) and I am tempted to take them to 60,70 to feel safer. We have 8 logs and definitely ran into trouble at 70, 80. Change the defaults.
Note: Ken Miles (email@example.com) uses:
    LTXHWM  40   # Long TX high-water mark (percent)
    LTXEHWM 65   # Long TX exclusive high-water mark (percent)
They deadlocked with the 70/80 settings. Indeed, Clem W. Akins (firstname.lastname@example.org) claims that the high-water marks should be less than 50%, to allow a roll-back. Have enough logs that whatever number you choose is enough for your largest long transaction. 70/80 is not nearly low enough.
Johnathan Leffler (email@example.com) suggests that:
With a reasonable log configuration (say more than 4 logs), the 70/80 limits are very unlikely to run you into trouble, unless you are mass inserting or deleting extremely small records (say 1-8 bytes, though I'm tempted to suggest 1-4 bytes). I do not recall seeing systems run into trouble with the 80/90 figures which are the default in tbconfig.std.
Reducing these parameters to less than 50% is not something I'd recommend; there is definitely an upper limit on how much log space you can require to do a rollback, because the CLM records (which are the cause of the trouble) are of finite size (16 bytes, if memory serves me right), and there is at most one CLM record for each operation performed by the user. Each operation has a 12-byte header plus some data, such as the whole record which is inserted or deleted, or both the before and after images for an updated record. Unless these records are tiny, the amount of log space used to generate the long transaction dwarfs the amount needed to roll it back.
I suppose that if you have multiple users who regularly and simultaneously start operations which become LTX transactions, then you may have problems. But that is at least partly a question of education, and at least partly a question of operational controls on the OnLine system.
I consider that you have to be trying quite hard to run into trouble with LTXHWM = 70, LTXEHWM = 80. I also admit that the problems which occur if this is wrong are so severe that erring on the side of caution is advisable.
I (firstname.lastname@example.org) cheekily ask: Definitive enough for you? ;-)
(email@example.com (Clem W. Akins)):
Re-write deletes/purges to commit every row
If the logs get filled up, manually run the auto-backup from tbmonitor to free them. This works when the tbtape dies, and the logs fill from regular use (maybe not from a single huge event).
Be careful using temporary tables. Use the WITH NO LOG clause when selecting into a temporary table.
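For instance (the table and column names here are invented for illustration), the clause goes on the end of the SELECT:

```sql
SELECT cust_no, balance
    FROM accounts
    INTO TEMP t_balances WITH NO LOG;
```

Rows written to a non-logged temporary table generate no log records, so a big SELECT INTO TEMP cannot by itself become a long transaction.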
Depending on your application and the number of users you will have concurrently, increase this. A lot. The max is one quarter million, default is 2,000. I always set mine to be >100,000 and <200,000. Remember, a massive delete (such as can be easily run from SQL) can generate tens of thousands of locks in a hurry. This will cost you some memory but not much.
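In tbconfig terms, this appears to be the LOCKS parameter; the value below is just an example inside the range suggested above:

```
LOCKS 150000   # Maximum number of locks
```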
If you run out of locks, and the rollback of the transaction cannot complete because it runs into the Long Transaction problem, the entire Online session may get corrupted and could require a complete reload.
If you use the SQL user menu to launch another informix application (one you wrote in 4gl, say) the user will have consumed 2 licenses! One for the menu, and one for the application. You can quickly run out of licenses this way. We use a shell script for the front-end user menu and have it run one informix application at a time.
Store different parts of your database on different spindles on different controllers. For example, store the root db, the log space, the data tables all on *different* disks, and if your bottleneck then becomes controller I/O, run these different disk drives on different controllers. Ensure your tables are in contiguous disk space by
Doug Mason explains:
There are several parts necessary to get an ODBC-compliant program talking to your Informix Database:
Li Chung Yuen (firstname.lastname@example.org) helpfully lists this database configuration :
    The Engine        The Client
    Informix 5.01     MS Access 2.0
    I-Star 5.0        Q+E ODBC 1.2
    I-Net 5.0         PC/TCP 2.2
... but regretfully informs us: It works, but the performance is not good.
On 24th June 2002 email@example.com (Fernando Ortiz) wrote:-
After several calls to Informix Tech Support, the problem I found was the existence of several OLD semaphore files in /INFORMIXTMP.
The solution is to remove all the files in /INFORMIXTMP before starting the engine.
I added 'rm -rf /INFORMIXTMP/VP*' to the end of my /etc/rc.d/rc.local.
The following SQL error codes can occur because of locking problems:
    -233: Cannot read record that is locked by another user.
    -240: Could not delete a row.
    -244: Could not do a physical-order read to fetch next row.
    -245: Could not position within a file via an index.
    -246: Could not do an indexed read to get the next row.
    -250: Cannot read record from file for update.
    -263: Could not lock row for update.
    -271: Could not insert new row into the table.
    -346: Could not update a row in the table.
    -348: Could not read a row from the table.
    -378: Record currently locked by another user.
The following ISAM error codes can occur because of locking problems:
    -78:  Deadlock detected.
    -79:  No record locks available.
    -107: ISAM error: record is locked.
    -113: ISAM error: the file is locked.
    -134: ISAM error: no more locks.
    -143: ISAM error: deadlock detected.
    -144: ISAM error: key value locked.
    -154: ISAM error: Deadlock Timeout Expired - Possible Deadlock.
Errors -78 through -154 can also be shown as positive values under some circumstances. The error message files record the same message for both -78 and +78. This is not true of the SQL error messages.
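Since the ISAM codes can appear with either sign, a client that wants to retry on locking errors can normalise on the absolute value. This is only a sketch (the function name is mine; the code list is copied from the table above):

```c
#include <stdlib.h>

/* ISAM locking-related error codes, from the list above.  These may
 * surface as positive or negative values, so compare on abs().      */
static const int isam_lock_errors[] = { 78, 79, 107, 113, 134, 143, 144, 154 };

/* Return 1 if code is one of the ISAM locking errors, else 0. */
int is_isam_lock_error(int code)
{
    size_t i;
    int a = abs(code);
    for (i = 0; i < sizeof(isam_lock_errors) / sizeof(isam_lock_errors[0]); i++)
        if (isam_lock_errors[i] == a)
            return 1;
    return 0;
}
```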
If you are working on System V based machines (rather than BSD), the system error number leads to certain errors being mis-reported. The locking errors which may cause problems are:
    -45: SysV -- Deadlock detected
    -45: BSD  -- Operation not supported on socket
    -46: SysV -- No record locks available
    -46: BSD  -- Protocol family not supported
    -56: SysV -- File locking deadlock error
    -56: BSD  -- Socket is already connected
In each case, the error would be reported using the BSD message, but would actually have the System V meaning.
This list was generated from the document "Informix Error Messages" (December 1991) Part No 000-7121. Neither of these lists is guaranteed to be definitive, nor are they guaranteed to remain unchanged with new versions of the software. However, the document used includes Version 5.00 error messages as well as 4.10 error messages.
The controversy over the kills has to do with what Informix is trying to execute at the time. A badly timed kill of a process can damage the updating and logging of transactions in one or more databases under the engine's control.
This can leave things in an unrecoverable state.
(firstname.lastname@example.org (Dave Kosenko))
Of course, killing a front-end process is different from killing an engine (generally, it is safer). Your BEST bet is, of course, to code signal handling into your application. This means catching signals in your esql/c programs and using the DEFER command in 4gl (or not using DEFER and just using the interrupt to terminate it). In esql/c, catch your signal, then issue a sqlint() followed by an sqlexit(), then exit the application.
Now it's time for a bit of truth regarding engines and signals. The following signals are IGNORED by sqlturbo (as of 5.0):
SIGHUP SIGINT SIGQUIT SIGTSTP
You may send any of these signals to your heart's content; however, it will do nothing to stop a sqlturbo.
Three signals are caught:
SIGTERM SIGUSR1 SIGPIPE
Each of these causes a different action on the part of the sqlturbo. SIGUSR1 is what is used by tbmode -z. However, tbmode also diddles with some shared memory flags before sending this signal. If you send it yourself, it will generally not have any effect, since the appropriate shared memory flags have not been set. Anyway, there is no advantage to sending SIGUSR1 yourself over letting tbmode -z do it for you.
To understand the remaining two, you need a basic understanding of how sqlturbo works. Basically, once it gets started up, it will sit in an infinite loop reading messages from the pipe (from the client process) and taking actions based on that message. When the action is complete, and you come back to the top of the loop, a return message will have been placed on the pipe (which the client reads and processes). Normally, this loop will only be terminated when you exit your client application (or use sqlexit(), which has the same effect).
A SIGPIPE is generated whenever a process attempts to write to a pipe that has no reader. We catch SIGPIPE and set a flag. At the top of the loop mentioned above, this flag is checked. So, if the client goes away (so there is no reader on that end of the pipe), when sqlturbo tries to write its return message to the pipe, a SIGPIPE is generated, we set the flag and continue. Since the very next action after writing anything to the pipe is to return to the main loop, the flag is checked and the loop is exited. sqlexit() accomplishes this by simply closing the client end of the pipe.
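That behaviour is easy to demonstrate outside the engine. This small standalone sketch (nothing Informix-specific about it; the names are mine) writes to a pipe whose read end has been closed, which fires SIGPIPE and makes the write fail, just as sqlturbo's main loop sees it:

```c
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpipe = 0;

static void on_sigpipe(int sig)
{
    (void)sig;
    got_sigpipe = 1;            /* flag checked at the top of the loop */
}

/* Write one byte to a pipe whose read end is closed.
 * Returns the result of write(): -1, with SIGPIPE having fired. */
int write_to_dead_pipe(void)
{
    int fds[2];
    int n;

    signal(SIGPIPE, on_sigpipe);
    if (pipe(fds) != 0)
        return -2;              /* pipe creation failed (unexpected) */
    close(fds[0]);              /* the "client" end goes away        */
    n = (int)write(fds[1], "x", 1);
    close(fds[1]);
    return n;
}
```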
Generally, whenever a client goes away, the engine should follow suit. If it is blocked on a read from the pipe (waiting for the next request) it will get an EOF and terminate. If it attempts to write a return message to the pipe, it will get a SIGPIPE. When the engine is busy doing work when the client goes away, though, the situation changes. Until it attempts a write on the pipe, or tries to read from the pipe, it will not know that the client has gone. Eventually, it will get around to its write and terminate, but that could take a while if it is doing something that takes a lot of time.
Finally, we have our SIGTERM. This signal is caught and another global flag is set. This flag is checked all over the place in the sqlturbo code, and control is passed back to the main loop (with a message indicating that the statement was interrupted passed back: -213); the engine goes back to blocking on the read from the pipe. Now the flag is not checked after every statement; more like after every "unit of work" where the definition depends on the type of activity. So it may not have an immediate effect, but it will have an effect pretty quickly. If the client has gone away, the write of the -213 to the pipe will generate a SIGPIPE and the loop will be terminated, or the read will get EOF and the loop will be terminated.
So, if tbmode -z does not work, and you cannot shut your system down (really the best alternative), the next best thing to do is send a SIGTERM to the sqlturbo (causing it to interrupt its current work) followed by a SIGPIPE (if the SIGTERM alone doesn't do it). If this combination has no effect after a reasonable amount of time, you really should do a shutdown. Any other signal sending is likely to bring the system down anyway (abort mode).
(email@example.com (Jonathan Leffler)):
As a matter of idle fact, killing an sqlexec (SE) is even worse than killing an sqlturbo (OnLine). With OnLine, there is a supervisory process (tbinit) which will detect the problem and either clean up after the dead process or bring the system to a halt before any damage is done. With SE, there is no such supervisory process, and if an sqlexec dies part way through a transaction, the data will appear committed for ever more. Until you try to do a rollforward database, that is. But by then, who knows how many decisions have been made on the faulty data which wasn't rolled back? This too is a controversial point of view. Note that this applies strictly to the SE engine, not to an application running it. That can die with impunity as long as the Engine

firstname.lastname@example.org (Brent Jackson) writes:
With OnLine 4.00 we had difficulty with tbmode -z. We found that kill -1 and -15 were ignored by the process, and kill -9 could bring the engine down if it hit at the wrong time. However, we found that a kill -8 was a reliable and safe way of killing the process.
Note however, since OnLine 4.10 (through to 5.06) we have had no problems with tbmode -z, though (as already mentioned) it can sometimes take a while (~ 30 minutes) to work.
The files suffixed with "-cr" in $INFORMIXDIR/etc purport to give the versions of your current software. eg:
cat $INFORMIXDIR/etc/OnLine-cr (Version of Online)
    $INFORMIXDIR/lib/sqlturbo -V     (Version of Online)
    $INFORMIXDIR/lib/sqlexec[d] -V   (Version of SE)
    fglpc -V                         (Version of 4GL-RDS)
    ... you get the idea ...
A handy little shell from Paul T. Pryor (email@example.com) is included in Appendix H which will list all versions of any Informix products you have installed.
SELECT * FROM systables WHERE tabname = " VERSION"
This has the number of the last installed version of the engine. If that version has been de-installed then this data will be incorrect.
Prior to Informix 6.0, the engine used a locking mechanism which would involve locking rows adjacent to the row actually being locked.
Attached is the edited version of the Tech Info describing the Informix Twin Problem for pre-6.0 engines.
Albert E. Whale firstname.lastname@example.org
The "Twin Problem" is a situation that can arise when inserting and deleting rows in databases with logging. Suppose that two processes, Px and Py, are updating a table that has a unique index. Px deletes a row with key value 8; then Py inserts a row with key value 8. Later, Px decides to roll back its transaction. If we allow Px to roll back, there will now be two rows with key value 8, violating the uniqueness constraint on the index.
Py therefore must not be allowed to insert its row until Px completes its transaction. But if only row locks are used, there is no way to prevent Py from inserting the row. Once Px deletes the row, the lock is gone; a row that is not there cannot be locked.
A similar problem can occur when Isolation Mode is set to Repeatable Read. Suppose Px, using Repeatable Read, locks the set of rows with key value 5. These rows now cannot be deleted or updated. However, new rows can still be added. Py can insert a row with key value 5, or update a row with key value 2 to key value 5. This would violate the Repeatable Read.
If the Repeatable Read selection criteria did not include a filter on an indexed column, the only solution is to place a shared lock on the entire table. This prevents any other process from altering the table while the repeatable read is in progress. However, when the selection makes use of an indexed column, OnLine makes use of adjacent key locking to prevent both this and the "Twin Problem".
What is the effect on the "Twin Problem"? When Px deletes the row with key value 8, it first tests value 8 to make sure it is not already locked, then it places a lock on key value 9, and deletes value 8. When Py attempts to insert a row with key value 8, it first has to test value 9 to see if it is locked. Since Px has locked value 9, Py cannot insert value 8. Note that if there is no value 9, Px will lock value 10, which not only prevents Py from inserting a value 8, but also prevents Py from inserting a value 9. If 8 was the highest value, the "infinity" value is locked, preventing Py from inserting any value higher than 8.
This also protects reads with Isolation Mode of Committed Read or greater, by alerting the reading process to uncommitted insertions or deletions from the selected set. If Py selects all rows "WHERE key_val < 10", and Px has deleted key value 8, Py will encounter the lock on key value 9, alerting it to the fact that there is an uncommitted insertion or deletion of a row. If there is no value 9, 10, or 11, Px will lock key value 12; however, Py will still encounter the lock, because the way in which OnLine determines that it has found all required rows is to read the index values until it reaches the first one which does NOT fit the criteria, which in this case would be 12. Note that Py would also receive a locking error if Py selected all rows "WHERE key_val > 10", even though the row actually deleted was 8.
These examples have dealt with issues on unique indexes. However, adjacent key locking also applies to non-unique indexes, as in the Repeatable Read problem. When Px, in Repeatable Read, selects all rows "WHERE value = 5" on a non-unique indexed value, it locks each row with value 5 as it is read. When it passes the last rowid with indexed value 5, it reads the first indexed value 6, which alerts it that it has completed its search, and locks it as well. If Py now attempts to insert a row with value 5, it will first test the adjacent key. If the rowid of the new row is less than that of any one of the existing rows with value 5, it will test the key of the first existing row with rowid greater than itself, and find it locked, as the Repeatable Read locked all the keys for value 5. If the new rowid is greater than any of the existing rows, it will test the key value 6, and find it locked as well. Thus Py is prevented from inserting any new rows with value 5 until the Repeatable Read is completed.
Another side-effect of adjacent key locking is reduced concurrency of insertions. If Px inserts the key value 9 into a unique index, Py will now receive a locking error if it attempts to insert the value 8.
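The mechanics above can be illustrated with a toy model (entirely my own sketch: a flat key list and a lock list, nothing like Informix's real index structures). Deleting a key first locks the next key above it, and an insert must find its own adjacent key unlocked:

```c
#define MAXKEYS 64
#define INFINITY_KEY 0x7fffffff     /* stand-in for the "infinity" slot */

static int keys[MAXKEYS];           /* key values present in the index  */
static int nkeys = 0;
static int locks[MAXKEYS];          /* key values currently locked      */
static int nlocks = 0;

/* Smallest key strictly greater than v, or INFINITY_KEY if none. */
static int next_key_above(int v)
{
    int i, best = INFINITY_KEY;
    for (i = 0; i < nkeys; i++)
        if (keys[i] > v && keys[i] < best)
            best = keys[i];
    return best;
}

static int is_locked(int v)
{
    int i;
    for (i = 0; i < nlocks; i++)
        if (locks[i] == v)
            return 1;
    return 0;
}

/* Px deletes v: first lock the adjacent (next higher) key, then
 * remove v from the index. */
void delete_key(int v)
{
    int i, j;
    locks[nlocks++] = next_key_above(v);
    for (i = j = 0; i < nkeys; i++)
        if (keys[i] != v)
            keys[j++] = keys[i];
    nkeys = j;
}

/* Py tries to insert v: blocked (returns 0) if the adjacent key
 * is locked by another transaction. */
int try_insert(int v)
{
    if (is_locked(next_key_above(v)))
        return 0;
    keys[nkeys++] = v;
    return 1;
}
```

With keys 7, 8, 9 present, deleting 8 locks 9, so an insert of 8 is refused until the deleting transaction completes, which is exactly the Twin Problem scenario described above.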
email@example.com (Colin McGrath) writes:-
The ONLINE_7.2 OS Release notes only mention:

    6. OS Patch information: OS version is Digital UNIX V3.2D.
       The KAIO patch from DEC is required. The patch numbers for the
       various OS versions are:

           OS v3.2de1   OSF360-070
           OS v3.2de2   OSF365-360070
           OS v3.2f     OSF370-360070
           OS v3.2g     OSF375-360070

       For OS v4.0 and up, this patch comes with the OS.
Also in Digital UNIX V4.0:-
A data corruption problem can occur when the parameter new-wire-method is turned on. The new-wire-method parameter is only available in V4.0 and later releases, and all versions V4.0 and later ship with new-wire-method enabled by default. Digital UNIX Engineering is investigating the exact cause of the problem; however, until we fully understand the circumstances surrounding this, you should follow the recommended workaround detailed below. All I know right now is that under very heavy loads (with OPS and DRD) this data corruption problem can occur. Even if you are not running Oracle Parallel Server, we recommend you implement the workaround because we don't fully know the cause of the problem. Systems running Digital UNIX V3.2x are not affected by this problem.

BLITZ TITLE: DIGITAL UNIX DATA CORRUPTION WITH SHARED MEMORY

It is the Strong Recommendation of Digital UNIX Engineering that this workaround be implemented on all systems running Digital UNIX V4.0 and above. Failure to do so can result in undetected data corruption.
firstname.lastname@example.org (David Williams) writes:-
Recently in comp.databases.informix there have been several reports of performance problems with HP-UX version 10.x. These are problems related to shared memory and have been fixed in 10.20. I would recommend that people upgrade to 10.20.
email@example.com (Ivica Smolic) writes:-
From the release notes of Informix OnLine 7.23 for DEC Alpha: "It is not recommended to set AFF_SPROC to 0. CPU 0 is a master CPU and DIGITAL does not recommend to bind process to this processor 0."
On 8th Dec 1997 firstname.lastname@example.org (Dean Robinson) writes:-
[Editors note: This is about upgrading from Unixware 2.1.1 to version 2.1.2 and problems starting Online 7.23.UC1]
I had the same problem, until I rebuilt the kernel as follows:
    cd /etc/conf/bin
    ./idtune -f HCPULIM 0x7FFFFFFF
    ./idtune -f SCPULIM 0x7FFFFFFF
    ./idtune -f HFSZLIM 0x7FFFFFFF
    ./idtune -f SFSZLIM 0x7FFFFFFF
    ./idtune -f HDATLIM 0x7FFFFFFF
    ./idtune -f SDATLIM 0x7FFFFFFF
    ./idtune -f HSTKLIM 0x7FFFFFFF
    ./idtune -f SSTKLIM 0x7FFFFFFF
    ./idtune -f HCORLIM 0x7FFFFFFF
    ./idtune -f SCORLIM 0x7FFFFFFF
    ./idtune -f HFNOLIM 0x00000800
    ./idtune -f SFNOLIM 0x00000800
    ./idtune -f HVMMLIM 0x7FFFFFFF
    ./idtune -f SVMMLIM 0x7FFFFFFF
    ./idbuild -B
Reboot, and your Informix engine should now work OK.
On 13th Jan 1999 email@example.com (Steven L Cooper) wrote:-
oninit: Fatal error in initializing ASF with 'ASF_INIT_DATA' flags; asfcode='-25580'. [Editor's note on SCO Openserver 5.0.4]
Kernel parameters are normally tuned with configure(ADM) on OpenServer, but some of the changes you want to make aren't supported by that, so use idtune(ADM). Something like:
    cd /etc/conf/cf.d
    ../bin/idtune SEMMNI 2048 -max 8192
    ../bin/idtune SEMMNS 2048 -max 8192
    ../bin/idtune SEMMAP 2048 -max 8192
    ../bin/idtune SHMMAX 671088640
    ./link_unix -y
On 25th Mar 1999 firstname.lastname@example.org (Heiko Giesselmann) writes:-
To use more than 1.75GB of shared memory on HP-UX 10.20 some HP patch has to be installed. HP should be able to help out with the current id of the patch required.
To use more than 1.75GB shared memory with Informix you have to 'chatr' all executables that attach to shared memory (onstat, ontape, ...). See the man pages for chatr. Please note that onbar cannot be chatr'ed with IDS versions earlier than 7.24.UC5 and therefore onbar won't be able to attach to shared memory if shared memory is created by a chatr'ed oninit.
To improve performance you might also try 'chatr +pd64M $INFORMIXDIR/bin/oninit' to increase the shared memory page size. That should make a noticeable performance difference for instances with larger shared memory segments.
In the good old days of SE, a table with a high level of activity (i.e. deletions/inserts) became riddled with old data row slots which were available for re-write, and the index began to look like swiss cheese. To correct this, periodic unload/loads, cluster manipulation, or ALTER TABLE (the same as an unload/load but more user-friendly) were needed to keep index and data table efficiency at peak. [See section 8.1, part 3]
Is this also true of Online? Or does Online do this automagically?
OnLine has assorted "reclamation" stuff that it does, so that you should not wind up with a "swiss cheese" situation (reuse of freed pages, page compression, etc.).
What OnLine will NOT do is return pages allocated to an extent (or an empty extent, for that matter) back to the dbspace. Once an extent has been allocated to a table, it will always be allocated to that table (unless, of course, you drop or alter the table).
(email@example.com (David Kosenko))
Raw Online space does get very fragmented, especially if you are short of room in the partition. Table extents become very small and numerous and it causes performance problems. Tables can generally have about 240 extents, and it is possible to reach that value and come to a grinding halt.
We regularly monitor all our production systems' extents and partition fragmentation. Tbunloading and tbloading the database back typically puts all the data for each table in the first extent (if there is enough contiguous room). We then carefully set all the next extents on the tables, to accommodate future growth. If tbunloading is unreasonable because of the database size, we will create an empty new table, with the first extent set, and copy the data from the old table to the new table.
(firstname.lastname@example.org (Naomi Walker))
The optimizer, which determines what read strategy to use when searching for data, uses the statistics as a means of determining the most efficient way to read the data. These statistics can quickly fall out of date after a large number of insertions or deletions. Running UPDATE STATISTICS will give the optimizer a better shot at the data.
UPDATE STATISTICS FOR TABLE tablename can be run if only one or a few tables have changed - this is particularly useful for large databases. Be warned: the tables are locked while the statistics are gathered.
Information on how the optimizer has decided to process a query is available in a local file called sqexplain.out after a SET EXPLAIN ON command has been issued. This can be invaluable in improving query efficiency.
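For example (the query itself is only illustrative):

```sql
SET EXPLAIN ON;
SELECT * FROM customer WHERE lname = "Smith";
SET EXPLAIN OFF;
```

The chosen plan for each statement run while EXPLAIN is on is appended to sqexplain.out in the current directory.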
You can't. The starting value for a SERIAL column is not preserved anywhere. Once any data has been inserted, there is no way to tell what value was set.
No. Nor with SELECT COUNT(*) either, because by the time you get to execute the SELECT *, someone may have changed the tables.
Stefan Gutsche, email@example.com adds: Another option, faster than SELECT COUNT(*) is to:
1. UPDATE STATISTICS FOR TABLE [Table]
2. SELECT nrows FROM systables WHERE tabname = [Table]
It really is faster than SELECT COUNT(*), but to be safe you should lock the table in exclusive mode while you do it.
... or is it just an estimate?
The info in the SQLERRD item is only filled in after you've processed the last row. It therefore isn't much help in the context of SELECT.
It is very useful in the context of:
In all cases, it is an exact number.
With the more recent Engines, the estimates of the cost and number of rows to be returned are also available. For more information look in the Informix Guide to SQL Reference Manuals under 'Error checking with SQLCA' (or under 6.0, the individual ESQL manuals).
The system catalogue table SysColumns documents the type of the column in encoded form using values from 2 columns, ColType and ColLength.
A detailed explanation of how to interpret these fields can be found in Appendix E, while a 4GL function which will interpret the values for you can be found in Appendix F.
If you do not have a copy of the Appendixes, and are a 4GL programmer, you could hack the sample 4GL source in $INFORMIXDIR/demo/fglbe/ex30.4gl