Wednesday, January 13, 2016

Performance: Importing text or CSV files in AX.

Importing text files. Is there anything left to say about that? Well, obviously there is since I am about to do it.

I was asked to look at an import procedure once created by a long-gone developer who (surprise, surprise) had not left any documentation. The AX code was confined to a single class that runs nightly as a batch job and took about four hours, if (and only if) it ran to completion without errors.

My mission: Make it better than it was: Better, stronger, faster!

(and if you recall that line, you too were born in a time when dinosaurs roamed the great plains)

No, youngsters, this unfortunately has nothing to do with my salary. Anyway, back to the nuts and bolts and time for some reverse engineering.

The original code was a series of import methods like this:

void readxwnaaa()
{
    asciiIO         io;
    container       rowValues;
    Filename        filename = @'\\someserver\ccu\XWNAAA.txt';
    XWNAAA          xwnaaa;
   
    VehicleType     vehicleType;
    RecordIdNumber  recordIdNumber;
    SeqNumber       seqNumber;
    Kritnr          kritnr;
    KritWert        kritwert;
    boolean         isNew = false;
    ;

    io = new asciiIO(filename, 'R');
    io.inFieldDelimiter(';');

    while(io.status() == IO_Status::Ok)
    {
        rowValues = io.read();

        if (rowValues)
        {
            vehicleType     = strRTrim(strLTrim(conpeek(rowValues, 2)));
            recordIdNumber  = conpeek(rowValues, 1);
            seqNumber       = conpeek(rowValues, 3);
            kritnr          = conpeek(rowValues, 4);
            kritWert        = strRTrim(strLTrim(conpeek(rowValues, 5)));

            ttsBegin;
            // One select-for-update per row to decide between insert and update
            xwnaaa = XWNAAA::find(vehicleType, recordIdNumber, seqNumber, kritnr, kritWert, true);

            isNew = ! xwnaaa.RecId;
            if (isNew )
                xwnaaa.clear();

            if (conLen(rowValues) >= 1)
                xwnaaa.RecordIdNumber     = conpeek(rowValues, 1);
            if (conLen(rowValues) >= 2)
                xwnaaa.VehicleType        = strRTrim(strLTrim(conpeek(rowValues, 2)));
            if (conLen(rowValues) >= 3)
                xwnaaa.SequenceNumber     = conpeek(rowValues, 3);
            if (conLen(rowValues) >= 4)
                xwnaaa.Kritnr             = conpeek(rowValues, 4);
            if (conLen(rowValues) >= 5)
                xwnaaa.KritWert           = strRTrim(strLTrim(conpeek(rowValues, 5)));

            if (!isNew)
            {
                // Only update when something actually changed
                if (xwnaaa.Kritnr   != xwnaaa.orig().Kritnr  ||
                    xwnaaa.KritWert != xwnaaa.orig().KritWert)
                {
                    xwnaaa.update();
                }
            }

            else
            {
                xwnaaa.insert();
            }
            ttsCommit;
        }
    }
}

There are a few questionable things about this code, but from a performance perspective the problem is the database round trips.

What happens is the following sequence:

Read record from file > process record > insert/update record in database. [repeat]

That’s all fine and dandy for a manageable collection of records, but in this particular case we’re processing close to one million records, which takes nearly four hours on the batch server. Four hours in which lots of things can go wrong.

Faster is not just better because it takes less time. Faster also means that there are fewer interactions with other processes fighting over limited resources and therefore faster means more reliable.

To speed things up a notch, I want to do as much processing in memory as possible. Wouldn’t it be good if we could build a table in memory, do what we have to do, and then move the whole thing to our database in one mighty blow?

Bet you know where this is going. In comes the Record Sorted List and it does just that.

The RecordSortedList class inserts multiple records in a single database trip.
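For the impatient, the pattern in isolation looks roughly like this. This is a minimal sketch against the XWNAAA table from this post; the literal value is invented, and I am assuming RecordIdNumber is a string field:

static void recordSortedListSketch(Args _args)
{
    RecordSortedList rsl = new RecordSortedList(tableNum(XWNAAA));
    XWNAAA           buffer;

    // The sort order should match the target table's unique (clustered) index
    rsl.sortOrder(fieldNum(XWNAAA, RecordIdNumber));

    buffer.clear();
    buffer.RecordIdNumber = '4711';  // build the record in memory... (made-up value)
    rsl.ins(buffer);                 // ...still no database round trip

    ttsBegin;
    rsl.insertDatabase();            // one trip moves the whole list to SQL
    ttsCommit;
}

If memory serves, ins() returns false instead of adding a record whose sort key is already in the list, so this assumes the file is unique on the sort fields.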

Now if only there was a way to delete all records that are currently in the table without having to go through one million delete roundtrips…

Tell me, DAX Whisperer, does such a thing exist?

“You can delete multiple records from a database table by using a delete_from statement. This can be more efficient and faster than deleting one record at a time by using the xRecord.delete method in a loop.” (straight from the documentation)

What’s more, you can use delete_from to delete ALL the records in a table, simply by omitting the where clause. I have heard unconfirmed rumors that the downside of this command is that it doesn’t report back when it is finished, which could lead to the false assumption that a table is empty when in fact it is not (yet). Something to bear in mind, but I have not witnessed any unexpected results myself.
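For the record, here is what both flavors look like in X++ (a quick sketch against the same XWNAAA table; the filter value is made up). One caveat I am fairly confident about: if the table overrides its delete() method, the kernel falls back to deleting record by record, unless you call skipDeleteMethod(true) on the buffer first.

static void deleteFromSketch(Args _args)
{
    XWNAAA xwnaaa;

    ttsBegin;

    // Set-based delete of a subset: one statement, one round trip
    delete_from xwnaaa
        where xwnaaa.VehicleType == 'PKW';   // example value, made up

    // Omit the selection and the whole table is emptied in one statement
    delete_from xwnaaa;

    ttsCommit;
}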

So armed with my new class and a full clip of DeleteFroms, I reconstructed the code to:

void readxwnaaa()
{
    asciiIO             io;
    container           rowValues;
    Filename            filename = @'\\someserver\ccu\XWNAAA.txt';
    XWNAAA              xwnaaa;
    RecordSortedList    rsl;

    io = new asciiIO(filename, 'R');
    io.inFieldDelimiter(';');

    // The sort order matches the clustered index of XWNAAA
    rsl = new RecordSortedList(tableNum(XWNAAA));
    rsl.sortOrder(fieldNum(XWNAAA, VehicleType)
                , fieldNum(XWNAAA, RecordIdNumber)
                , fieldNum(XWNAAA, SequenceNumber)
                , fieldNum(XWNAAA, Kritnr)
                , fieldNum(XWNAAA, KritWert));

    while(io.status() == IO_Status::Ok)
    {
        rowValues = io.read();

        if (rowValues)
        {
            xwnaaa.clear();
            if (conLen(rowValues) >= 1)
                xwnaaa.RecordIdNumber     = conpeek(rowValues, 1);
            if (conLen(rowValues) >= 2)
                xwnaaa.VehicleType        = strRTrim(strLTrim(conpeek(rowValues, 2)));
            if (conLen(rowValues) >= 3)
                xwnaaa.SequenceNumber     = conpeek(rowValues, 3);
            if (conLen(rowValues) >= 4)
                xwnaaa.Kritnr             = conpeek(rowValues, 4);
            if (conLen(rowValues) >= 5)
                xwnaaa.KritWert           = strRTrim(strLTrim(conpeek(rowValues, 5)));
            rsl.ins(xwnaaa);
        }
    }
    try
    {
        // ttsBegin lives inside the try block, so the catch below can
        // actually run if something goes wrong (a catch inside an open
        // transaction is skipped for all but a few special exception types)
        ttsBegin;
        delete_from xwnaaa;    // empty the table with one set-based statement
        rsl.insertDatabase();  // refill it in a single database trip
        ttsCommit;
    }
    catch
    {
        exceptionTextFallThrough();
    }
}

Now doesn’t that look a lot better?

What happens now is that the contents of our .csv file are read into a record sorted list which, quite coincidentally, is sorted exactly like the clustered index of the XWNAAA table. When, and only when, we are done with that, the record sorted list is moved over to the database with one mighty insert, but not before we have wiped the contents of the table completely.

Okay, I changed a few more things, but let’s stay focused on the subject. For our daily load of 900,000 records, the old version took just under four hours to process.

The new version does the job in… (drum roll)… just over four minutes.

I say that’s an AXcellent result. Thank you, record sorted list.
