Other Issues

IIUG Developer's Desktop - The Newsletter For IDS Developers
Volume I, Issue 3

In This Issue

From the Editor - News and Notes
Enhanced Error Message Lookup Using the Basic Text Search Extension
Extended Data Type Storage


From the Editor - News and Notes

It's time to talk about our current notes and News:

Those are the news and notes for now. I hope you enjoy this issue.

Mark Jamison
Editor IIUG Developer's Dekstop


Enhanced Error Message Lookup Using the Basic Text Search Extension. By: Guy Bowerman

The usual method for looking up an Informix error message is to use the finderr utility. On UNIX this is a command line utility that takes an error number as an argument, and on Windows it is a GUI Winhelp based utility. Suppose you want to do more than just find an SQL error by number - perhaps you are troubleshooting a problem and want to which errors are related to referential key contraints, or VII indexes.

A simple way to provide a more sophisticated search for Informix error messages is to load the error numbers and their definitions into a database and create a BTS or Basic Text Search index on the text. The Basic Text Search extension is an IDS 11.10 feature that utilizes the power of the cLucene open source text search engine to support searching for words and prhases in unstructure text.

What follows is a step by step guide to creating a database to index the Informix error messages and search them using the BTS extension.

1. Create an External space

The Basic Text Search extension makes use of an external dbspace to store index information. An external space is a special type of dbspace which uses the filesystem for storage. To create an external space, create a file system directory and use the onspaces "-x" parameter:
2. Create a BTS Virtual Processor

The Basic Text Search engine runs in its own Virtual Processor class, so to use BTS add a VPCLASS entry to the onconfig file: Note that only a single BTS VP is supported in IDS 11.10. For the onconfig parameter to take effect you will need to restart IDS.

3. Create the Error Message database

The finder database will use a single LVARCHAR column table with one row representing one error message (including number and text). In the example below no dbspace is specified in the CREATE DATABASE statement, which will mean the database is created in the root dbspace. Typically I'd put "IN dbspace_name" after the database name.
4. Load the error message data

The list of Informix error messages is supplied with the IDS media in a file called errmsg.txt. In distributions using the standard US locales the file is found here: $INFORMIXDIR/msg/en_us/0333/errmsg.txt or ( %INFORMIXDIR%\msg\en_us\04e4\errmsg.txt on Windows).

In order to convert the errmsg.txt file into a format that can be loaded into the errtable table add a pipe "|" character between each message, and escape every end-of-line with a backslash "\" character. Unfortunately this conversion involves more than a simple sed script because the number of blank lines between messages is not always consistent. For your convenience a converted file has been placed here: errmsg.unl.
5. Register the BTS Extension

In order to create a BTS index it is necessary to register the BTS DataBlade module with the finderr database. This is done by running the blademgr command line utility as user informix: Note: On Windows the blademgr command line utility is not provided with IDS so the recommended way to register a DataBlade module for IDS running on Windows is to use the GUI blademgr utility provided with the DBDK (DataBlade Deverloper's Kit).

5. Create the BTS Index

With the BTS extension registered and an external dbspace in place, a BTS index can be created on the errcol column as follows: In this example bts_lvarchar_ops is an operator class which corresponds to the LVARCHAR data type. An operator class represents a set of functions for IDS to associate with the BTS access method. A similar operator class exists for every data type that can be represented as text (BLOB, CHAR, CLOB, LVARCHAR, NCHAR, NVARCHAR, VARCHAR).

The (delete='immediate') clause is an optional clause that tells IDS to remove index information automatically after every delete operation. Without this clause, indexes on dynamic data will need to be manually optimized using the bts_index_compact() function. For a static database such as the one in this example this clause effectively makes no difference.

The BTS index can take a few minutes to create, and once created the database is ready for queries.

6. Running Queries

With the index in place, queries on the errcol column would take the form: where "xxx" represents a search string that conforms to the Lucene search syntax. For example, suppose you wish to search for error messages which contain the words "unlock" and "table" but not "transaction", the query would be: A way to replace the command line finderr utility would be to write a shell script which constructs a similar query based on the command line arguments: Here is an example of the script in action: Note: To use this script in production it would also need to escape characters which are treated as special characters by the BTS engine. For example with the script in its current form, to look up an error number like -291, it would be necessary to escape the negative sign twice, once for the command shell and once for the BTS parser: A good way to use an index like this would be as part of a PHP or Java web application that could parse any special characters before constructing the query, and be available to all users. Another method I like to use is to make the error message database available via an instant message chat bot.

Summary

The finderr database was created and loaded using the following SQL commands: The Blade Manager was then used to register the BTS module to the finderr database, and the index created with: The BTS extension is a fast and flexible way to make freeform text searches available in applications. It is also undergoing continued development, look out for enhancements in future releases of BTS such as user configurable stop words.

References

Apache Lucene - Query Parser Syntax
Introduction to Basic Text Search Datablade - IDS Experts article
BTS documentation in the IBM Informix Database Extensions User's Guide.



Extended Data Type Storage  by: Paul Watson

This article describes how opaque, row and collection types are stored on disk

Storing Opaque Types

What is Byte Alignment on Opaque Types?  The first thing to consider when trying to understand byte alignment is how the data is parsed, so consider the following c structures:

struct {

       integer i1;

       short s1

       char c3;

       char c4;

} mystruct1_t

struct {

       integer i1;

       short s1;

       char c1;

} mystruct2_t

struct {

       integer i1;

       char c1;

       short c1;

} mystruct3_t

The size requirement  for each of  the structure elements are strictly defined, which means the size of the example structures can easily be calculated. The size of mystruct1_t is 8 bytes and the size of mystruct2_t  and mystruct3_t are 7 bytes each.  When these structures are in memory it is inefficient to parse the structure on a byte-by-byte basis, it is much better to parse by the word. A word is the term the operating system uses for a group of bytes.  Usually a word is the same size as an integer but it can and does vary depending on the operating system. In a 32-bit operating system, each word and the integer are 32 bits, or 4 bytes, in length.  In a 64-bit OS, words and integers (corresponding to the C 'long' type)  are 8 bytes long

With a size of 8 bytes mystruct_1 is perfect; the structure fits into one or two words.  However, mystruct2_t and mystruct3_t are both 1 byte short of a word and so these structures have a 1 byte pad when stored in memory ensuring the structure begins on a word boundary.

So that should be easy, but life is never easy and things are a little more complicated. On 32 bit systems the internal alignments are 1, 2 or bytes, for 64 bit systems the values are 1, 2,  or 4 and depending the system  8 bytes.   Because the alignment value depends on the data type the padding can be differ between structure of the same size but different elements. The ‘alignment rules’ mean a structure component always has to fit in an alignment that is appropriate for its data type.

Type

Size

Alignment

Char

2

2

Short

2

2

Integer

4

4

Float

4

4

Double [1]

8

8 (Windows)

4 Linux

 

1: Assuming a x86 processor                                                                                                                                     

So how can the actual physical structures be viewed?  The unix command od   can be used to see the structure on disk and for the structures defined earlier they would look like the following on a 32 bit system.

 

word

Word

 

byte

byte

byte

byte

byte

byte

byte

Byte

mystruct1_t

integer

short

char

char

mystruct1_t

integer

short

char

pad

mystruct1_t

integer

char

pad

short

 

Prior to IDS 9.x, Informix stored all information in a data stream. That is, no padding was done so that tuples could be stored as compactly as possible. When the data was read into memory, the SQL parser read the data and padded any extra bytes required for word alignment in memory.

In IDS version 9.x, data is not compacted for opaque data types; the opaque type is regarded as a black box with no compaction or alignment.  This makes it necessary for the designer of the opaque type to handle alignment issues

Storing Row Types

Firstly, there is no difference in between the way named row type columns and unnamed row type columns are stored.

Since row type data is nothing more than a structure consisting of other data types, it is simply stored in the same way that it's element data is stored.  But, there is always a but, but row types always begin on a 4-byte boundary with at least one extra and unused byte at the start of the row type.  This means that if previous column ends on a 4-byte boundary then the column has 4 unused bytes added and the next column starts on the next boundary.

When tables are created using ‘OF TYPE’ syntax then columns will have the same name as the specified row type.  Once the table is created the row type is no longer associated with the table.

Storing Collection Types

As long as the collection fits on a single page then it is stored like any other non-blob data type, i.e. in a tblspace data page. If the collection size is greater than a page then it will be written to a blobpage in the tblspace and a descriptor maintained in the data row to indicate the collection column is in a blob.

Although not always used, the space required for the descriptor is allocated and is written to the row.  So when a collection is small, the collection data is stored as a part of a descriptor.  For large collections the descriptor points to the location of the blobpage that holds the descriptor data.  A flag in the descriptor maintains the how the data is stored, either in blobpage or tblspace.

The oncheck output below shows a simple collection of three items (“nn”,”ii”,tt”) that fits easily in a single page

oncheck -pD mytest:mycollection

TBLspace data check for mytest:informix.mycollection

page_type  rowid    length fwd_ptr

HOME       101      81     0

   0:61 61 61  0 11  0  0  0  0  0  0  1  0  0  0  9 aaa.............

  16:6e 6e 69 69 74 74  0  0  0  0  0  0  0  0  0  0 nniitt..........

  32: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 ................

  48: 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0 ................

  64: 0  0  0  0  0  0  0  0  0  0  0  0  0  0 62 62 ..............aa

  80:61                                              a...............

TBLOB:  tblspace bstamp  flags

        0        0       1     BLOBISNULL

The underlined area show the mostly unused descriptor area and the flag set indicating the collection values are in row i.e. the tblspace.

In the next example the collection data will not fit in row and is stored in a blobpage (0x1201ef) and the row within the page (0x101)

 

oncheck -pD mytest:mycollection

TBLspace data check for mytest:informix.mycollection

page_type  rowid    length fwd_ptr

HOME       301      83     0

   0:63 63 63 63 63 63 63 63 63  0  d  0  0  0  0  0 ccccccccc.......

  16: 0  0  0  2  0  0  0  0 ff ff  7 a7  0 12  1 ef ...........'...m

  32: 0  0  0  0  0  0  9 a8  0  0  9 a8  0  0  1  1 .......(...(....

  48: 0  0  0  0  0  0  0  0  0  0 fe a2 ff ff  0  8 ..........~"....

  64: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 ................

  80:63 63 63                                        ccc.............

TBLOB:  tblspace bstamp  flags

        1201ef   -350    8

       addr   family vol    size   bstamp coloff flags  type   medium

       101    0      0      2472   -350   1959   8      PNBLOB FIX_MAG

BLOBPAGE: addr     size     bstamp  nbpage   nbstamp

          101      2008     -350    201      -346

          201      464      -346    ffffffff 0

 

Further Reading

http://www-128.ibm.com/developerworks/library/pa-dalign/