Back Home Up Next
| |
JLabs.Docs.Interbase
Interbase and how we started to use it
Contents
Introduction
Why and how we decided to go the Interbase way
What our first application was about
Reducing database size and other non standard tricks
Balancing the load of the system and using the computational power of client PC's
Advantages and limits of agnosticism
Examples of our typical applications running with Interbase
Final remarks, bugs found and some tricks
Introduction
This page was made just to list some experience with Interbase we gained thru
about three years of intensive use in rather critical applications. We had no time to tune
the English ... This paper is far from complete and even consistent. Anyway it could be
of interest for those that are making their decision about what their future database engine
will be. As you will find in the sequel we will use often the terms "agnostic", "agnosticism"
and the like. Apologies to those for which these terms are having some spiritual value,
we simply took the terms as suitable for our discussion.
We want to stress that we have no general recipe and our experience is only a limited one. Thus
our conclusions need to be judged with this fact on mind. On the other hand we describe
real projects, real failures and real successes. We really struggled with Interbase, ORACLE, WNT
and Linux as described later on. Even when the corresponding companies are selling technical
support the local representants in our country were not able to solve any non trivial problem
we met. This also influenced our decision to use open source products wherever possible.
Even with Linux not everything was nice but this system evolved that quickly so that several
of our problems disappeared spontaneously as we installed newer versions.
At the very beginning we were not sure which way we had to go. We hesitated between proprietary
products with declared technical support and open source products with no formal support
at all. Also our customers were and still are rather conservative and often our products
are the only exception to otherwise totally Microsoft Windows oriented broader context in the
companies we are working for.
Any decision about the right database engine is influenced unfortunately enough by
marketing as well as by not IT people. It is important
to know that a database engine is only a tool and that the ideas and theory around which these
engines are constructed are mostly rather trivial and well known. Moreover the notion of
a relational database is an old one (about 25 years or so). Thus most of the engines are offering
basically the same functionality. The bigger, broader or older the product is the more problems
with backwards compatibility and the bigger is the petrifaction of old-fashioned ideas. That
indicates that sometimes switching to a new leaner product could be a good idea.
We found Interbase to be a rather good tool with extremely small footprint and good functionality.
Now a few words about our company. We are a rather small group creating original software in
various branches - our profile can be found on our home pages.
Our aim is to implement even huge and critical projects using minimum number of people. We do not
believe that good project can be done reasonably when tens or hundreds of programmers
are working together. Their productivity decreases rather quickly to zero as their number surpasses
some constant varying from 4 to 10. That is why we concentrate ourselves to maximize productivity
and re-usability of whatever we ever did. Well we are not alone thinking about what big armies of
programmers are really doing in big companies. As far as we know, many important projects originated
as one-man or few-man shows. Have a look at how Unix, C, troff or other important things were build-up.
Our method can be described as "distilled" abstraction and interpretation. The applications we create are strictly
database engine agnostic and even the application in question agnostic. We always want that
the application in question could do easily more than requested using always more abstraction that
actually needed. It seems strange or even
contradictory while it really is not as we explain in the sequel. Agnostic means here "knowing
nothing about" or "unaware of".
First we explain the agnosticism with respect to a database engine.
We try to write applications so that no particular
database engine is assumed. It can be even changed transparently and the application in question will
not notice. Actually we are using simultaneously Interbase, ORACLE and Paradox and except for slight
differences (ORACLE does not support multiple BLOBs within one record while Interbase does) the
same application can be run with any of these databases without any change. Specifically we are using
a kind of abstract interface that is translating every database oriented code using sql constructs only.
We will extend the list of databases supported in the future but once Interbase was released as
a free product, our motivation to cover other engines decreased significantly. Thus most of our applications
can be run without any relation to particular database engine. We even support the fact that one part
of the system is using Interbase while another one is pumping data from ORACLE. All looks like if a unified
and always the same database engine is running.
Let us turn to the more intrinsic agnosticism with respect to the application in question. The majority of
jobs we are doing is using Windows based PC's as workstations while the database and other parts of the
application are running under Unix or Linux on a dedicated server. Typically these are hospital laboratories,
small or even huge companies - briefly the kind of their business is very variable. Having limited
number of employees we decided to unify all of the applications to one "super-application" that is interpreting
a particular application's specification. Thus we are using ALWAYS the same front-end binary that is fed during
the start-up by the detailed specifications of what and how it will be servicing.
Well that means the words like "account", "patient", "microb",
"surname", "product" are NEVER mentioned in our binary. It is agnostic of any particular data model,
it knows no particular working scenario beforehand. Everything including the type of database that is used
is written in simple text files as a specification and the binary is interpreting or just-in-time compiling
these specifications when needed. It is understood that such a program is able to do majority of
the most ordinary working scenarios, it has a sophisticated statistical subsystem as well as
graphics. That is browsing thru data, creating and arranging different views, scanning
forms,
chaining elementary actions into more sophisticated ones etc. More complicated than trivial scenarios
are constructed thru a powerful scripting language. The system handles calmly any inconsistency between
the current data model and the existing data and data dictionary within the application. Thus you can
use not yet stable data model having different tables and items than you suppose to use in your forms.
The system handles such situations doing its best. E.g. you can perform operations as "Here is
a set of named values and let every table in the system takes its values as it wants and performs an
insert ...". This is practical namely thru the development phase of the project in question.
It is important to stress that the binary in question is generally the only (except for web presentation)
front-end visible to our customers. Generally we supply our products as rather autonomous sub-systems aimed to some
specialized job. Typically we supply:
Totally database engine and application agnostic front-end written in Delphi.
Totally particular hence totally gnostic application specification.
Adequate database model and configured database server - the database in question is generated thru
the above front-end as it translates specifications into the particular back-end database in question
even for create clauses.
General and again rather agnostic web based presentation interface.
Two-way Java and Interclient based interface to and from our system / database.
Adequately configured hardware and operating system with automated administering cron jobs.
Our approach is always a compromise. It replaces coding in a programming language by
another kind of coding in a specification language. The difference is that the specification
language is very powerful provided the requested application is coherent with the
general model we accepted. In such a case the "code" of the application is extremely
compact and short. Some of the strong points of this method are:
The binary in question is a very stable one generating nearly never run-time errors. We change it
rather seldom only when new inner functions are needed or some hard bugs are discovered. To give
you an idea: one of our customers is running 3 years 24h/24h the program in question on many PC's
across whole of the country. There are various versions of the binary running including the oldest ones.
During this time the data model was totally changed, many new features were implemented and even
the database changed from ORACLE to Interbase. The users did not notice. Only some of those that
were using the oldest version of the binary were complaining about the fact that they could not invoke
some newly implemented features...
We are having equally in our hands "hardware" (the binary) and the "software" (the application in
question). Thus potential bugs in the binary can be worked around in the scripting language and
conversely a repeatedly requested and complicated function can be "hardwired" within the binary.
This is to some extent similar approach Transmeta is using in its new Crusoe chips. Continuing
in this analogy our binary is a kind of morphing software that is translating the application
into a native code and executing it then.
Because of the fact that the basic style of the work with the system is always the same the expenses
for documentation and training are reduced. In hospitals where we are implementing various particular
applications the personnel can easily migrate from one department to another. Basically they are all
using the same software that is tuned for even extremely different purposes (micro biology thru neural
surgery).
Even rather complicated applications can be build quickly. Construction of a fully functional first
iteration of a non trivial system can be question of one or two weeks. The system is improved then
step by step but is ROCK-SOLID FROM THE VERY BEGINNING. This is important for those customers that
are experiencing frequent changes of their needs or for those that are simply unable to give
their needs specification. With technology evolving this is very often the case as you will see
later on when some of our applications will be described in details. Actually there are many
applications that cannot be defined - they simply change faster than anyone can compile a good
definition. Thus they can be defined only by themselves using self-reference. This way is our
approach most suitable for.
Technical support of such software is particularly easy and can be done by persons having no
training in programming or computer science. This is a very important point because of the fact
that many customers need to do slight changes of their software without hiring programmers.
Let us mention also some weak points:
The binary in question is always the same. Even if it is highly configurable it cannot cover
all types of applications. Our experience is that about 90 % of the cases can be handled
by our approach with good results while the other 10 % of the cases cannot be handled regardless
to effort we apply. These 10 % are simply "hardwired" that is to say coded separately
in C, Java, Delphi or
whatever appropriate environment. It can result in an a totally independent application or in
an external plug-in to our binary.
Some customers are having very strict ideas of how their future software will behave, what kind of
colored buttons they will press ... Many of them want to reinvent wheel and it is a very difficult job
to smash their prejudices out. Building an application always from scratch is
wasting of time but the
customers sometimes are requesting exactly this approach. What is worse, they are able to accept some
limitations provided they are caused by say ORACLE or other company's development tools but they do not
accept that easily that we, too, are having some limitations from the very beginning.
On some old machines the interpretation approach can be slow resulting in bad performance. This argument
is now very marginal - most of even mediocre PC's are INCREDIBLY fast and the size of the binary in question
is only about 2MB.
Practically all customers are not very happy having a similar system to their
neighbor's ... They are
unable to understand that a system is important because it offers some functions and not because of
how exactly the functions are offered. Understood that the implicit way of offering of the functionality
is generally acceptable ... here paradoxically it helps the fact that the neighbor
also was able to accept
a similar system.
Why and how we decided to go the Interbase way
Now we turn to our first experience with Interbase, the description of the applications and the methodology
will continue in the sequel. We started to use Interbase 5.1 under Linux having some ideas about
Interbase version 4 under Linux as well as under WNT. The job in question was a kind of complete rewriting old
fashioned 24h/24h running DOS applications in a Novell based network. The new target were Windows.
The customer insisted
that also the database must run under Windows. Thus ORACLE was selected first and the rather huge
system was rewritten step by step using the above approach (we call it VaxNt for some already
unclear reasons ...). Very soon we discovered that the first version of the new system will be used
by a limited number of users - simply because of the fact that the old software worked well, was
rock-stable and covered critical business. Thus the new system could be applied only when
100 % approved. We switched between Paradox and ORACLE experiencing difficulties with both.
While the WNT server was a really good machine (in 1997) having 256MB RAM, 350 MHZ Intel PII and
plenty of disks it was never able to run longer than one month servicing a limited number of users
generating only a fragment of the target load.
Paradox was unstable because of network environments, ORACLE suffered from bad stability of
WNT.
The server run nothing else than ORACLE and IIS and was down too frequently. The customer insisted
on WNT and it turned to be a bad decision. For testing purposes we tuned Paradox inventing rather
sophisticated methods for recovering corrupted database files. The system was fully developed
and tested for Paradox and the question of final database engine was left open.
WNT and ORACLE suffered and killed each other on the server and were down sometimes even if they were
out of usage ... Thus we were looking for another system. We knew that ORACLE on unix systems worked okay.
But we preferred Linux that we already used for file-sharing and web pages servicing. As the very
first step we replaced with good results Novell servers by Linux ones (using the same hardware)
and we wanted to run the database engine under Linux logically. Interbase v4 was installed but
the results were not satisfactory. We felt that both Redhat 4.2 and Interbase 4 were
guilty and
not ripe enough at those times. The situation improved with Interbase 5.1 released. Some fixes
on Redhat were needed to limit problems with IP - these problems disappeared with new kernel versions.
The weak point was the rather slow communication between satellite PC and server even thru 100Mbit LAN.
With new drivers for Windows and new versions of Linux kernel the situation improved step by step.
Note: when we started no ORACLE version for Linux was available.
Thus we installed on similar hardware WNT+ORACLE and Linux+Interbase. The nature of the application
was such that we were able to run simultaneously both of the configurations with identical functionality.
Both of the machines were Intel PII with 128MB RAM and similar disks. The size of database was
a limited one - about 100MB only. The performance was roughly the same. The most common action
was browsing thru a huge table having about 50,000 of records and resolving joins of several
tables each one having about 10,000 of records. We also reloaded the databases using the
application itself converting the data to text and loading it back into the tables. Again the speed
was similar and the load of the machines too.
ORACLE when installed performed EXTREMELY poorly.
We knew that there was a tuning utility available (but not cheap) and thus we experienced blindly with
various constants in the configuration files guessing what the meaning could be.
With some effort we arrived to run ORACLE about 10 times faster than
it run with its defaults. As far as we remember the biggest magic was tuning of memory sizes
used for sorting, caching and the like. Really a nasty job! Moreover, ORACLE did not survive
some of our experiments and we needed to reinstall it over and over and, worse, WNT too. Anyway we obtained a
running configuration and the battle continued with the sizes of roll-back segments and the like ...
Here we deduced that ORACLE is probably using still some very old ideas and is excessively
complicated to maintain.
On the side of satellite PC's we used W95 and the ORACLE driver was good. Also BDE interface
behaved well until we switched to version 8 of ORACLE - it was a real catastrophe. Until now we
have no fool-proof BDE interface for ORACLE 8 ... The problem is really in BDE and not in ORACLE.
Honestly speaking, we were never sure whether the latest patches (service-packs) were applied
as necessary during iterative reinstalls of WNT. Very frustrating fact was that after application of some
service pack we need to
reinstall ORACLE - the service pack simply killed ORACLE ... But the same was true
when an imaging software was installed. Thus who knows what the problem really was ?
The whole procedure of installing WNT and ORACLE
was that tiring that we were not able, at least some of us, to speak about WNT calmly.
To the contrary, Interbase behaved well and the only tuning we did was increasing the number of
buffers - we used about 1024 buffers and 8KB pages. Adding even more buffers turned out to be
non productive. The only problem was with our language (Czech with really devilish
accents ...)
and it was not that easy to find out a suitable combination of CHARACTER SET (WIN1250), COLLATE
(PXW_CSY)
and local language driver for satellite PC's (PdoxAnsiCzech). There were really too
many combinations and nearly no documentation. Who on evil will ever guess what PXW_CSY could mean ...
Having these two configurations we martyred the two machines pressing the speeder down. The
behavior was
similar in all cases except for the fact that WNT went down from time to time. Also the management
of ORACLE under WNT was a difficult job. WNT did nothing else while Linux was our normal
server doing all of everyday jobs including compilation of huge sources, file sharing and the like.
Thus we decided to go the Interbase way and stopped the ORACLE one. The major inhibitor on
WNT+ORACLE was the instability and difficult management. The instability had some common points with
memory leaking. When used intensively, both ORACLE and IIS blew-up eating by small chunks
and never giving their memory
back to WNT. WNT begun to swap and that was the beginning of the end. We need to reinstall WNT
several times after such hang-up and we never discovered why.
One thing must by clear. We have no general hint or comparison between ORACLE and Interbase.
We simply did our best during a difficult decision making procedure and Interbase proved
to be a good solution, at least for our purposes.
Our decision was made using experiments of our own. We also tuned the testing system to our application
only.
In this case and also in other cases we programmed later on we discovered that the difference
between Interbase and ORACLE is nearly negligible in performance and Interbase is winner as far
as simplicity of maintenance is concerned. The best machine we were having was multi PA-RISC CPU HP Box
with 2GB of RAM and ORACLE running on this box was not much better than Interbase we were running
on a dual Intel PII/350 box with 256 MB of RAM.
Finally, there remains the question about Interbase under WNT. We really had this configuration, too.
We only measured performance and discovered bugs in drivers when the only Linux Interbase version
was 4. The performance was about the same as under Linux. But we were tired because of frequent
WNT crashes so that we never used Interbase under WNT. Here we can say nothing of any value.
Maybe W2000 will perform better but for us it is too late. The combination Linux+Interbase
proved to be a good solution and actually we are having no motivation to find out some
better one. Not until we will see a heavily loaded W2000 box with uptime more than 300 days.
What our first application was about
The customer we did the above job for is the far biggest data operator in our country - Ceske
Radiokomunikace. It is also
responsible for all TV and high frequency signals distribution and about 50% of GSM traffic
is also passing thru this customer's network. Our software is responsible for overall management of all
equipment. These are about 20,000 items covering TV emitters, satellite antennas, Radio-Relay networks,
ATM networks, enormous routers and the like. The traffic is managed thru our software, repairs are planned
and contributions or mobile transmissions are organized as well. The system covers also contracts, billing
and many of other functions. The system is run on two Compaq machines, each one having one processor
PII/450, 256 MB of RAM and plenty of SCSI disks. Both machines are identically configured Linux Redhat 6
and the database engine used to be Interbase 5.1 and now it is Interbase 6 - in both cases classical
architecture. All of the traffic is run on one server while the other is used as a hot swap reserve where
the data are stored. The low-level control of the devices is certainly not performed by our system -
this is case by case solved by specialized hardware we obtain filtered outputs only from. Some of the equipment has no
on-line control. There is another Linux server installed and used to store critical data and logs in the
form of flat files. This server is a kind of buffer and it is an ordinary PC with plenty of disks and a DAT tape.
This third server is a Redhat 4.2 running non-stop since installed three years ago.
The system is outsourced to our company that is responsible for all repairs, system recovery and namely
development of new functions. With new technology and new needs of customers of our customer we are
charged to implement minor changes daily and major ones about every two months. The upgrades must be
run without system restart or even any forced restart of software run on satellite PC's. That is
why our VaxNt approach and the interpretation is important - we can do even deep changes by tuning specifications
without changing the binaries. Thus no restart is needed.
Our software is run in a large WAN across the whole country and the network is not always 100% stable.
The number of client PC's running the software is about 400 but only a fragment of these are really
connecting to database for a non trivial amount of time. The majority is acessing the data thru an inner Web server
with no transient connections. Here good point for Interbase - connecting to it is really fast (unlike ORACLE).
The Linux server is running typically about 30 simultaneous gds_inet_server processes - the other connections
are polled by the application itself that is disconnecting when not used for a defined
amount of time
and connecting again when necessary. It is also running about 100 smbd processes servicing shared
disks. With this there are several httpd and sshd daemons and that is all.
The whole system is administered automatically using cron jobs and saves usable images of the database
every hour, every day of week and every month in year. The recovery is backed by a proprietary
solution - mirroring and other methods proved to be unusable. Shortly spoken, the system is making
extensive logging on the application level and in case of destruction it can replay the logs on the application
level again. But this is out of the scope of this paper.
The average long-term load of the system is about 20% and drops do 6% when the cron jobs are blocked out.
Every hour the load amounts to about 99% for several minutes during administering jobs but the final
users do not notice. These administering cron jobs are nice'd so that they run in the background.
The amount of data stored is rather big. All of about 20,000 devices are generating some data daily,
all break-downs are monitored fully detailed and all customer related or billing related data are collected ...
The average inflow is about 50,000 records daily - most of them are inserted in the background on the Linux side
without any manual intervention of the operators. There are about 1,000 manual interventions a day (manual
input of data) and more than 100,000 SQL queries are resolved daily - most of them being automatic
refreshes of screens at dispatch centers (7 units equipped by 3-6 PC's each). To sum-up, the system
is every second answering nontrivial queries, making inserts, deletes or updates.
Note: speaking about GSM it must be clear that we do not track the traffic up to
final users, from our point
of view the GSM provider is only one big customer using some of our equipment to distribute its signal.
The same is true for data distribution using high speed networks.
The peek predictable insert/update rate is about 2,000 per minute during the daily cleanup and
the peek rate detected so far was about 60,000 during several minutes long technology breakdown causing an
unexpected flooding of messages.
The stability seems to be good. Linux itself never came down except when hardware was upgraded / maintained. The
typical uptime is 300 days and more. Interbase was a bit less stable having from time to time problems with
corrupt indices. We feel that the problem is in unstable network and every time the problem arrived there
were also heavy difficulties on routers or so. Also W95 IP implementation is guilty - all of the PC's are running
W95 and are going down rather frequently because of WinWord, Excel or IE run-time errors. Thus the problems
with Interbase did not influence the customer simply because of the fact he was influenced much more
by Windows related problems. As the PC's were not able to run at all, no problem they potentially could not connect
to Interbase.
The repair time needed to recover from corrupt indices was typically several minutes and the response was
always qualified as good by the customer. The problem was infrequent with period of about 2-3 months.
There are some problems with Windows worth of mentioning. In large installations there you rather often find
a PC with Windows system files corrupt. Such a PC can generate false locks a often it finds the
error "Another user changed record" ... Reinstalling Windows resumes normal function. Windows 98
proved to be more stable, WNT were used seldom to report some significant data.
With rather big inflow of data - millions of records - we needed to develop some methodology of how to
store these data while enabling low recovery times for potentially corrupt database. It is not
an easy task to backup databases having many GB of data. It is a time and space consuming procedure.
If you want to create consistent system snapshots every hour, you must complete total backup every hour ...
How we did it is described in the next section ...
Reducing database size and other non standard tricks
The data collected in technological databases are seldom updated. Most of them are stored once for
ever. Only a kernel, generally configuration data, is heavily updated. With a Radio-Relay data channel
you have some configuration data (configuration, technical parameters, capacity, location, traffic attached ...)
that are changing
rather seldom. Then you are having actual status data, maybe breakdown parameters. And the device can
generate a stream of signals. This stream of signals and events represents the total majority of
data related to one device. These streams are normally filtered so that only an interesting part
remains and infinite sequences ot type "YYYYMMDD HHMMSS: operation continues OK" are thrown away.
Even frequently changed data (today's traffic and events) become less and less frequently changed
as the time passes. The data originating last month will nearly never be changed once computed, analyzed
and stored.
Thus one can break the database into frequently changing or frequently accessed kernel part and
a huge archive. Here one can use with good results the fact that Interbase is supporting external files
as if they were database tables. Thus the streams of signals can be stored in database tables first and
when old enough they can flow out to external files. This "purge" can be done say monthly. The GREAT
advantage is that you need to backup such data only once a month ...
The external files can be used with extremely good results as buffers catching unexpected flooding
of records. Appending lines to a text file is by an order of magnitude faster than any insert into
a database table could be. Thus you can calmly catch the data and let the database chew them
at its own speed. The trick is that the flat file appears exactly as a table. Only deleting from
such buffer tables is a bit tricky - must be done on files system level by an UDF or so. The application
pumps the data from the buffer external text file table into a real one as if both were regular tables and
is unaware of the peek speed of incoming data.
To our knowledge one can access such external tables with good response times even when having
millions of rows. It is important that external tables are having no indices ... It is essential to find out
a good compromise - normally nearly no one is accessing one year old data concerning signals of one
particular device. And it if is the case, the person in question could simply wait one minute or so.
There is another trick for saving database space. In medical applications there are very often results
generated - text files describing in some readable format the exam results or summary. A typical micro
biology laboratory can generate about 100,000 of such results a year generating a few GB of disk space.
You can zip such data before storing them into BLOB items. However, the typical gain is not that
good. Using similarity between such results you can obtain at least 10-times better compression
when compressing say 1,000 of such results being stored in one BLOB. Better, you can store it again as
an external file and manipulate such zipped archives using UDF functions. Using this method we
reduced about 12-times a large medical database. Again we are using the fact that once a result is
generated, it is never changed and can be stored in a different way. And again, the current results
can be stored in database tables while the old ones (less frequently or nearly never accessed)
can be purged into zipped archives. From the application point of view nothing is changed - the UDF
transparently searches for the data if they were already purged out of the current horizon.
Balancing the load of the system and using the computational power of client PC's
Here we have only a few comments. Generally there are two extreme approaches. Using totally
client-server model everything is done by the client software and consequently even incredible
amounts of data are being transferred thru the local or even broader computer network. The other
way is to program the majority of computations on the server side using stored procedures or
other programs that are run on the same computer as the database engine in question. Here the
server can be heavily loaded due to the fact that the majority of computation is done by
only one machine.
It is not very easy to decide which way is better. On one side you can buy cheap but powerfull PC's - no problem
having a 1GHz PC under $1,000. On the other side it is not that easy to buy for reasonable price a
server which is about 10 to 20 times that fast.
The client-server model is more suitable for "agnostic" style of programming and to some extent
the load is better distributed within the network - every machine computes a bit.
We decided again to use a compromise. Our clients generally do not like to spend huge
amounts of money for servers. Thus the majority of the load is distributed to cheaper client PC's but really computationally
intensive tasks are scheduled to the server. The agnosticism imposes that the stored procedures need to
be used seldom - the language of these as well as their functionality extremely differ from one engine
to another. Wherever possible we try to generate even huge SQL-script files and these are scheduled
and run on the server. It is in some sense a kind of batch processing.
Advantages and limits of agnosticism
Generally speaking when the application in question becomes stable and well petrified
within the customer's
company the needs for database-engine agnosticism do drop down. It is not probable that a well functioning
system will be moved just for fun to another database engine. In such a case the usage of database-engine
specific code (stored procedures,...) need no longer be limited that strictly. As every dogmatic idea the
hard agnosticism becomes a problem in such a case.
We try anyway to limit all decisions that suppose a particular engine to an absolute and well based
minimum.
As every restriction accepted the agnosticism limits what you can use from the potentially broader
functionality your engine is offering. However it can turn to be an advantage. E.g. the migration to newer
engine versions is much cheaper - you need to verify only a limited part of your code.
Using only or nearly only the standard and common part of the functionality every engine has
is a good general practice. Using more exotic features is always risky. You never know what will
be your next customer's beloved engine.
Writing applications with more abstract level that really necessary is also a good practice.
Surprisingly you obtain often much shorter and much more compact code.
Our approach is in some way extremely nearly absurd abstraction and interpretation.
There are for sure many other groups that are using a similar and maybe more ripe in house methodology of their own.
Such an approach requires always a rather homogenous group of
people working together. This is a weak point of the method. The higher the abstraction
the more different and well based opinions you will have. With decreasing affinity
to abstraction
you can more easily create coherent and larger groups.
One of our objectives was that
the customers will be given binaries and simple abstract tools using which they will tune their applications
to perfection while paying us some money for it. This goal was achieved only partially, however.
Examples of our typical applications runnig with Interbase
Microbiology application - including sophisticated system for acquired (nosocomial) infections surveillance
and other hi-tech components including automated hypothesis formation. Generates several GB of raw data
a year.
Data operator dispatching system - very specialized application covering hundreds
of database tables and
about one thousand of different working scenarios. Really a complicated system, generated tens of GB of
raw data and really intensive usage.
Neural surgery - complete system including operation preparation, statistical sub-system, billing ...
Positron Emission Tomography - again a totally original system of overall planning, results distribution ...
Total of 12 other specialized medical applications ranging from scheduling of rehabilitation procedures
up to non invasive surgery in gynecology.
Actually our company is running about 15 different applications with Interbase as database engine, in
all cases the resulting information system is fully outsourced to our company. For all of the customers
the system in question is a critical one and 24h/24h running is required. All of the systems are using
Interbase 6 Classical Architecture under Linux Redhat 6. All of the systems are supported remotely
thru modems or thru secure channels over internet and most of then are changing functionality
rather often.
Final remarks, bugs found and some tricks
As is always the case in the community of programmers, the documentation was consulted only as the
last possibility. We mostly jumped into the problem being armed only by some experience. We would like to
stress that even writing UDF function under Linux for Interbase turned out to be a simple task and
we started simply using the examples in the distribution.
Whenever using another language that English you are a bit disqualified. The tools like isql do not expect
the letters with accents even within literal constants. Thus writing INSERT INTO
X(NAME) VALUES('Řehoříček Blažej')
is tiresome if impossible - speaking about Linux and Unix generally. You will even find difficult to
write this statement on an ordinary keyboard at the console. If you have a huge amount
of
text data that is to be thru a cron job inserted into database tables, then the most ordinary way
is to generate a huge sql script and pass it to isql to do the job. With such devil letters inside the
data you cannot do it this way. The problem
can be solved using again external files. You declare an appropriate external table (suitable
character set and collate) and
the data insert simply
by writing them as a fixed column formatted text. Then using a one-line sql command you pump the
data from one table to the other. Here you must imagine that a hard core Unix programmer or administrator is
writing specialized C-programs only if the problem cannot be solved at script level within shell by
commands as sort, cut, sed, awk and similar ones chained in a pipe.
There was one hopeless moment when one of really important databases was corrupt in some deterministic way
so that even older backups tend to core dump all standard tools (isql, gbak and
gfix). The database was
in a 24h/24h usage. Even the documentation was not clear but finally we saved ourselves using -G,-I and
-IG
switches for gbak. The repair tool gfix is not documented well - now with sources available we will
write our own version of gfix and gbak so that even largely corrupt databases will be
partially recovered.
It is a very frustrating moment when a recovery tool core dumps ...
Creating backups of large databases consumes large amount of disk space and it will be
helpful if gbak
could read / write to pipes instead of to files only. Then compressed streams could be used and this will
save normally about 80% of the disk space. Actually you need gzip the backup files only when they are
fully created ...
This item relates to the classical architecture. We found a configuration running two gds_lock_mgr processes.
Along the isc_lock1* file blown to an incredible size. Killing both gds_lock_mgr a removing the isc_lock1* file
helped but we do not now what was the real cause of the problem. At the same time one of the PC's was
corrupt and needed total reinstall of Windows. We think that the handshake between the driver on Windows side
and the server is not absolutely perfect. Maybe Linux itself is guilty. The problem was found in one installation
once thru 2 years of non-stop run.
Wild gds_inet_server. Due to errors of any kind you can from time to time find gds_inet_server consuming 99%
of the CPU time while doing nothing of importance. You can simply kill it in the quasi totality of cases. We think
that the process is a kind of zombie left by abruptly destroyed IP connection. We were using a cron job that was analyzing
via "ps ax" the list of all gds_inet_servers and killing the wild ones if they look strange. With the version 5
and 6 this problem disappeared.
Allowing users to write sql queries of their own can result in totally blocking the server. The users will for
sure join multiple tables without imposing some conditions on keys ... In such a situation Interbase is creating
enormous work files (to be found in /tmp). We solved this by resizing /tmp to at least 1GB and invoking a cron job
that is purging old temporary files in /tmp. Old means about one hour. But that depends on the nature of the
application in question.
The docs are often inexact. E.g. we were not able to redirect temporary sort files. They were always in
/tmp as
if hardwired within the binary. The same was true with first version of distribution of version 6 - it installed
in one directory while requesting some files (isc4*) in another one. It was only a matter of some guessing to fix
this problem.
Writing external programs or UDFs interacting with Interbase was rather simple comparing it to what you need
when compiling a C-written program for interaction with ORACLE. Normally the licenses you obtain from ORACLE
do not contain the development option, you must request them (and pay) explicitly. Interbase comes as a whole
with all features.
There are some absurd bugs in the parser of SQL queries. We were having examples where bad
implicit casting (varchar to integer) generated incomplete answers - some evident rows were missing. The problem was that the parser
did not report syntax errors while producing absurd responses. The problem was fixed using
explicit casting.
Construction of additional indices helped much in improving the performance. Because of the nature of our applications
we were not able to guess off-hand what the frequented queries will be like. Thus we constructed
indices only
after having accumulated some information about the real traffic in the system.
This paper was written by Ladislav Janiga
thru one night and upon request made by Claudio Valderrama. The author believes that the paper will more
help than hurt anyone that was courageous enough to read it up here ...
Updated 20000930. Comments: Lada.Janiga@jlabs.cz
|