Future Lab - datagen GPL Application

The datagen application generates random data files. This application is used mostly to generate test data for other applications.

Datagen generates delimited records. Each record is a series of delimited fields, and each field is a number. By default, records are delimited by CR/LF and fields are delimited by a space. The program syntax is:

   datagen -n -snnn -fnnn -rnnn -l -v maxrec maxkey_char char_range outfile

Here, maxrec is the number of records to generate, maxkey_char is the number of fields to generate in each record, char_range is the highest random number to generate for each field, and outfile is the name of the output file.

The switches allow you to modify the default behavior of the program. Each switch is prefixed with the switch character, which depends on the platform in use: '-' on Linux/Unix/QNX and '/' on DOS/Windows. The switch letters are not case sensitive.
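To make the switch rules concrete, here is a minimal Python sketch of how a command line of this shape could be split into switches and positional arguments. It is not datagen's own parser; the function names switch_char and split_args are hypothetical, and the platform test is an assumption.

```python
import sys

def switch_char(platform: str = sys.platform) -> str:
    """Return '/' for Windows-like platform names, '-' otherwise (assumed rule)."""
    return "/" if platform.startswith("win") else "-"

def split_args(argv, prefix):
    """Separate switches from positional arguments.

    Switch letters are lower-cased because the switches are not case
    sensitive; anything after the letter is kept as the switch value.
    """
    switches, positionals = {}, []
    for arg in argv:
        if arg.startswith(prefix) and len(arg) > 1:
            switches[arg[1].lower()] = arg[2:]
        else:
            positionals.append(arg)
    return switches, positionals

if __name__ == "__main__":
    args = ["-F254", "-r255", "-L", "-v", "100000", "25", "800", "test.dat"]
    print(split_args(args, switch_char("linux")))
```

The four positional arguments that remain correspond to maxrec, maxkey_char, char_range and outfile, in that order.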

The 'n' switch tells datagen not to generate a sequence number. Normally datagen generates an extra field right at the beginning of each record consisting of a sequence number which, by default, starts from one (see the 's' switch to start the sequence number somewhere else).

The 's' switch changes the starting sequence number to the value specified.

The 'f' switch changes the field delimiter character (default is a space [f32]). The field delimiter is expected as a decimal number representing an ASCII character.

The 'r' switch changes the record delimiter character (default is CR/LF). The record delimiter is expected as a decimal number representing an ASCII character.
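Since both delimiter switches take a decimal ASCII code rather than the character itself, a short Python sketch may help show the conversion. The helper name delimiter_from_code and the 0 to 255 range check are assumptions made for illustration, not documented datagen behavior.

```python
def delimiter_from_code(code: str) -> bytes:
    """Convert a decimal character code such as '32' or '254' to a single byte."""
    value = int(code, 10)
    if not 0 <= value <= 255:
        # Assumed limit: one byte per delimiter.
        raise ValueError(f"delimiter code out of range: {value}")
    return bytes([value])

if __name__ == "__main__":
    print(delimiter_from_code("32"))   # the default field delimiter, a space
    print(delimiter_from_code("254"))  # a field mark outside printable ASCII
```

Note that the default record delimiter, CR/LF, is two characters (codes 13 and 10), whereas the switch as described takes a single code.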

The 'l' (el) switch causes datagen to generate an extra field at the beginning of each record (before the sequence number [if any]) which contains the length of the record.

The 'v' switch causes datagen to generate variable length fields. By default, datagen generates each field as a five digit number, padded with zeros on the left if necessary. With the 'v' switch, each field is only as long as its significant digits and leading zeros are not generated.
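The difference between the two field formats can be shown in a couple of lines of Python. This is only an illustration of the formatting rule described above; format_field is a hypothetical name, and how datagen itself writes a value of zero in variable length mode is not stated, so the sketch simply assumes a single "0".

```python
def format_field(value: int, variable: bool = False) -> str:
    """Format one field: five digits zero-padded by default, bare digits if variable."""
    return str(value) if variable else f"{value:05d}"

if __name__ == "__main__":
    print(format_field(14))                 # fixed width: 00014
    print(format_field(14, variable=True))  # variable length: 14
```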

Given the following datagen syntax:

   datagen 10000 5 300 test.dat

Datagen will generate the output file test.dat consisting of 10,000 records. Each record consists of five random fields (six fields in all, since the sequence number field is on by default), and each random field is a fixed five digit number from zero to 300. The file will look something like:

   00001 00155 00193 00240 00295 00014
   00002 00099 00178 00204 00002 00280
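For readers who want to see how records of this shape come about, the following Python sketch produces lines in the same layout. It is not datagen's implementation: generate_records and its parameters are hypothetical names, Python's random module stands in for whatever generator datagen uses, and the range is assumed to include both zero and char_range.

```python
import random

def generate_records(maxrec, fields, char_range, start=1, seed=None):
    """Yield space-delimited records: a sequence number, then `fields` random numbers."""
    rng = random.Random(seed)
    for seq in range(start, start + maxrec):
        # randint is inclusive at both ends (assumed to match "zero to char_range").
        values = [seq] + [rng.randint(0, char_range) for _ in range(fields)]
        yield " ".join(f"{v:05d}" for v in values)

if __name__ == "__main__":
    # Rough equivalent of: datagen 3 5 300 outfile (printed instead of written).
    for line in generate_records(3, 5, 300):
        print(line)
```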

An excellent use of datagen is to generate test data for the Bbuuzzb database engine. To do this, datagen should be run in the following manner:

   datagen -f254 -r255 -l -v

These switches cause datagen to use a record delimiter of ASCII 255 (which is Bbuuzzb's record delimiter), a field mark of ASCII 254, a field at the beginning of each record containing the record length, and variable length fields. Here is a sample command line that will generate a data file for the Bbuuzzb database engine:

   datagen -f254 -r255 -l -v 100000 25 800 test.dat

This generates a data file of 100,000 records, with each record having 26 variable length fields: the sequence number plus 25 random fields that range from zero to 800. Each record actually contains 27 fields including the length field, but Bbuuzzb does not count this field because it is used as a record header.
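As a way to check the field counts on a generated file, a small Python sketch can split the data on the two delimiter bytes. The name split_records is hypothetical. The sketch assumes every record, including the last, ends with the record mark, and that the length field is separated from the sequence number by a field mark. The sample bytes are made up, and the length values in them are placeholders, since how the length is computed is not described here.

```python
FIELD_MARK = b"\xfe"   # ASCII 254
RECORD_MARK = b"\xff"  # ASCII 255

def split_records(data: bytes):
    """Split raw file contents into records, and each record into fields."""
    records = data.split(RECORD_MARK)
    if records and records[-1] == b"":
        records.pop()  # drop the empty piece after the final record mark
    return [rec.split(FIELD_MARK) for rec in records]

if __name__ == "__main__":
    # Two made-up records: placeholder length, sequence number, two data fields.
    sample = b"8\xfe1\xfe12\xfe7\xff8\xfe2\xfe3\xfe45\xff"
    for fields in split_records(sample):
        print(len(fields), fields)
```

For a file produced by the sample command line, each record would be expected to split into 27 pieces under these assumptions: the length field, the sequence number and the 25 random fields.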

The datagen application can be compiled for all stated platforms. Under the Linux/Unix/QNX operating systems, datagen can be compiled using the cl script.
