Biblio Tech Review
Information Technology for Libraries

Databases

Updated: 12 April, 2001



Database Management Systems (DBMS)

Contents

Types of Database, Hierarchical, Network, Relational, Nested, PICK, Object Oriented, Mumps, AS/400, Hybrids, Summary, Future, Product+DBMS list   UPDATE - March 98

Introduction

Database Management Systems (DBMS) are important.  They underpin all the activities of a library management system by providing the basic storage and retrieval technology. The library application software sends data to and receives data from the DBMS which, if it is working properly, is hardly noticed at all.  Yet great claims are made for different types of database and their particular offerings. You should understand at least the basics so that you know what you might be getting, or missing, when you choose a Library Management System (LMS).

Some perspective

Early library systems developed their own systems for storing and retrieving records.  Geac's GLIS, IME's TinLib and BLCMP's Circo all developed methods of storing bibliographic records that were peculiar to themselves.  By the early eighties, commercial products began to appear that offered advantages to application developers. Library software was becoming more than just a fetch and display operation - the complexity of the business rules in all modules of library management software meant that off-loading the efficient storage and retrieval of records to a specialist piece of software became very cost effective.

Suppliers began to use the new relational database management systems (RDBMS) such as Oracle and Ingres (used by CLSI).  Other suppliers used PICK for products such as Dynix, ADVANCE and Bookshelf. Still others, for example Adlib, retained their own proprietary databases. It is important to note that a system can be relational in concept without necessarily employing a commercial Relational DBMS - both Libertas and TinLib prove this (amongst others). By the mid-nineties virtually all systems used commercially maintained database software of some sort. The questions are: does it matter to the library which system is being used?  What constraints and advantages are offered by the different systems? Where is the future taking database technology?  And how should you satisfy yourselves that the database technology will meet your needs?

Types of database

Database management systems have evolved fairly slowly over the last 30 years. The first commercial programming languages had file handling systems that took the responsibility from the programmer for maintaining file structures and indexes - e.g. ISAM (Indexed Sequential Access Method). A DBMS can be viewed as a more sophisticated and flexible form of file management together with a flexible tool for data extraction and often other "high level" tools.  Without being too fussy about the niceties of DBMS, and ignoring those that do not figure much in library systems, these are the main types - in a rough chronological/sophistication order of their development. Note that these divisions are not completely mutually exclusive - some DBMS can be considered to fall into more than one camp - others into none.
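
The indexed sequential idea mentioned above can be sketched in a few lines. This is only an illustration of the concept, not how any real ISAM implementation works: the class name `IndexedFile`, its methods and the sample titles are all hypothetical, and the "file" is simply held in memory. It shows the two kinds of access such a file handler gave the programmer: direct lookup by key, and sequential reading in key order from a chosen starting point.

```python
import bisect


class IndexedFile:
    """Toy indexed-sequential file: records held in key order plus a sorted key index."""

    def __init__(self):
        self._keys = []      # sorted list of keys (the "index")
        self._records = []   # records stored in the same order as the keys

    def insert(self, key, record):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            self._records[pos] = record          # replace an existing record
        else:
            self._keys.insert(pos, key)
            self._records.insert(pos, record)

    def get(self, key):
        """Indexed access: binary search on the key index."""
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._records[pos]
        return None

    def scan_from(self, key, count):
        """Sequential access: up to `count` (key, record) pairs from the first key >= `key`."""
        pos = bisect.bisect_left(self._keys, key)
        return list(zip(self._keys[pos:pos + count], self._records[pos:pos + count]))


# Hypothetical sample data, keyed on the lower-cased title.
catalogue = IndexedFile()
for title in ["Emma", "Dracula", "Middlemarch", "Bleak House", "Ulysses"]:
    catalogue.insert(title.lower(), {"title": title})
```

Calling `catalogue.get("emma")` fetches one record directly, while `catalogue.scan_from("d", 3)` reads the next three records in key order, which is the kind of alphabetical browse a library catalogue needs.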

Hierarchical

Now obsolete, a hierarchical DBMS assumed hierarchical relationships between data i.e. parent - child.  Data structures were often forced to conform to the hierarchical model in order to take advantage of the management and programming aspects of the products.

Network

Network DBMS allowed complex data structures to be built but were inflexible and required careful design. They were, however, fast and very efficient in storage - the best examples are airline booking systems. They generally conform to the CODASYL standards.  Example: IDMS from Cullinet. Note: "Network" here describes the connections between data elements - not the ability to operate over a network. Network DBMS were a precursor to, and have largely been superseded by, Relational DBMS.

Advantages

  • Fast
  • Efficient
Disadvantages
  • Inflexible
  • Technically obsolete (although many in commercial use).

Relational

The relational model arose from theoretical work on data structures by Dr Codd at IBM.  True Relational DBMS use the Structured Query Language (SQL) to extract and update data and conform as closely as possible to the theoretical rules of the relational model. Oracle, Sybase, Informix etc are examples. They work best when the data structures have been "normalised" to eliminate data and field duplication.  Data is organised within "Tables" (files) and relationships are expressed between tables and data elements. Note that just because a system uses a Relational DBMS, it does not mean that the data structures have been properly defined in the first place.  You can build rotten data structures with a good tool. See Data Structures. SQL is now the industry standard for querying and updating databases. Relational DBMS lend themselves very well to the library concept of authority files.
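
As a rough illustration of normalisation and the authority file idea, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a Relational DBMS. The table names, column names and sample headings are assumptions made for the example and do not describe any particular library system. The author heading is stored once in its own table, and each title row refers to it by number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        heading   TEXT NOT NULL UNIQUE   -- the authorised form, stored once
    );
    CREATE TABLE title (
        title_id  INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
""")
conn.execute("INSERT INTO author (author_id, heading) VALUES (1, 'Clemens, Samuel')")
conn.executemany(
    "INSERT INTO title (title, author_id) VALUES (?, ?)",
    [("The Adventures of Tom Sawyer", 1), ("Adventures of Huckleberry Finn", 1)],
)


def titles_with_headings(connection):
    """Join each title to its authorised author heading."""
    rows = connection.execute("""
        SELECT title.title, author.heading
        FROM title JOIN author ON author.author_id = title.author_id
        ORDER BY title.title
    """)
    return rows.fetchall()


# A "global edit": the heading is corrected in one place only,
# and every title that refers to it shows the new form.
conn.execute("UPDATE author SET heading = 'Twain, Mark' WHERE author_id = 1")
conn.commit()
```

After the single `UPDATE`, `titles_with_headings(conn)` returns both titles under the new heading, because neither title row ever held its own copy of the author's name.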

Advantages
  • Overwhelmingly, the most popular type of DBMS in use and as a result technical development effort ensures that advances e.g. object orientation, web serving etc appear quickly and reliably.
  • There are many, many third party tools such as report writers that are tuned to work with the popular Relational DBMS via standards such as Open Database Connectivity (ODBC).
  • Offer distributed database and distributed processing options which might be advantageous for some large consortium libraries.
  • Extremely well developed management tools and security with automatic data logging and recovery.
  • Have referential integrity controls to ensure data consistency.
  • Have Transactional integrity features to ensure that incomplete transactions do not occur.
Disadvantages
  • In the early days they were slow - Relational DBMS have to employ many tables to conform absolutely to the various normalisation rules. This can make them slow and resource hungry compared to more flexible (less rigorous?) systems.  Most Relational DBMS do not now have performance problems.
  • Some restrictions in field lengths.  Field lengths are usually defined with a maximum. This can lead to occasional practical problems e.g. a publisher with a 300 character name - they are rare but it can happen!
  • SQL does not provide an efficient way to browse alphabetically through an index. Thus some systems cannot provide a simple title A-Z browse.
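
The referential and transactional integrity features listed among the advantages above can be sketched as follows, again using Python's built-in `sqlite3` module purely as a convenient stand-in for a Relational DBMS. The tables, the `issue` function and the sample borrower are hypothetical. Issuing a loan involves two updates; if the second is rejected because the borrower does not exist, the first is undone as well, so no half-finished loan is left behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE borrower (borrower_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE loan (
        item_id     INTEGER NOT NULL REFERENCES item(item_id),
        borrower_id INTEGER NOT NULL REFERENCES borrower(borrower_id)
    )
""")
conn.execute("INSERT INTO borrower VALUES (1, 'A. Reader')")
conn.execute("INSERT INTO item VALUES (100, 'available')")
conn.commit()


def issue(connection, item_id, borrower_id):
    """Mark the item as on loan and record the loan as one all-or-nothing transaction."""
    try:
        with connection:   # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE item SET status = 'on loan' WHERE item_id = ?", (item_id,))
            connection.execute(
                "INSERT INTO loan VALUES (?, ?)", (item_id, borrower_id))
        return True
    except sqlite3.IntegrityError:
        return False       # referential integrity check failed; nothing was changed


def status(connection, item_id):
    row = connection.execute(
        "SELECT status FROM item WHERE item_id = ?", (item_id,)).fetchone()
    return row[0]


failed = issue(conn, 100, 999)            # borrower 999 does not exist
status_after_failure = status(conn, 100)
succeeded = issue(conn, 100, 1)           # a valid borrower
status_after_success = status(conn, 100)
```

In this sketch the first call is refused and the item stays available; only the second call, with a valid borrower, changes the item status and creates a loan record.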

Nested

These systems are all derived from the innovative PICK system developed by Dick Pick, originally with Don Nelson at TRW on IBM hardware, in the late 60s/early 70s.  The original PICK system was designed as a database cum operating system with the tools for data retrieval built in - the same concept as with a modern relational DBMS like Oracle.  The main differences between a Relational DBMS and the PICK-like "nested" or "post relational" systems are:

  • They allow related multiple values and sub values within a field - groups of related data.  For example multiple authors, series, publishers for a work can be easily handled.
  • They easily support variable length, non-limited fields.  Early versions of relational DBMS had maximum field lengths of about 255 characters. Library data includes several fields that might contain very long values.
  • Note that Oracle8 now allows multiple values - so PICK was about 30 years ahead of Oracle in this area!
Advantages
  • Fast and flexible development - no problems handling complex text oriented data structures as found in library data.
  • Low administrative costs - they are simpler to administer than a Relational DBMS.
  • More efficient - more users on less power and memory.
Disadvantages
  • A minor market segment when compared to the likes of Oracle and Sybase - can be thought non-standard by some corporate IT departments - virtually unknown in some industries.
  • In the early days, PICK was prone to data corruption.  Reliability now as good as Relational DBMS with transaction logging and similar features built in.
  • The data query language, although easier to use than SQL, was "not SQL", so the associated tools for data querying could not be used against a Nested DBMS.  Now the products of both main Nested Relational DBMS suppliers can be queried using SQL.
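
The difference between multi-valued fields and a normalised relational layout, described in the list of differences above, can be illustrated with a small sketch. It does not use PICK or any real nested DBMS: a Python dictionary stands in for the nested record, and the field names, table names and the function `to_relational_rows` are hypothetical.

```python
# One "nested" record: multi-valued fields are held inside the record itself.
nested_record = {
    "title": "Good Omens",
    "authors": ["Pratchett, Terry", "Gaiman, Neil"],      # several values in one field
    "subjects": ["Fantasy fiction", "Humorous fiction"],
}


def to_relational_rows(record_id, record):
    """Flatten a nested record into the separate tables a normalised design would use."""
    title_row = (record_id, record["title"])
    author_rows = [(record_id, position, name)
                   for position, name in enumerate(record["authors"], start=1)]
    subject_rows = [(record_id, position, term)
                    for position, term in enumerate(record["subjects"], start=1)]
    return {
        "title": [title_row],
        "title_author": author_rows,
        "title_subject": subject_rows,
    }


tables = to_relational_rows(1, nested_record)
```

The single nested record becomes five rows spread over three tables, which gives a feel for why a strictly normalised design needs more tables and joins for the same bibliographic data.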

Proprietary

Some library system suppliers are continuing to maintain and develop their own DBMS.  This may seem anachronistic but there are some advantages alongside the obvious danger of losing support should the company fail.  However, if the company should fail, knowledge of the DBMS is less critical in support terms than knowledge of the application - i.e. whichever DBMS a company is using, if it fails commercially there will be more problems with supporting the application than the DBMS.  Of course a proprietary DBMS may well be Relational, Network or Hierarchical.  If it uses SQL then it usually conforms to the relational model.

Advantages
  • The DBMS can be totally designed around the problems and idiosyncrasies of the library application.  Inverted indexes can be built in (most relational DBMS have no support for inverted indexes - so library system developers using them have to build these components themselves). A good example of this is the Adlib system that is currently being enhanced with native (built-in) Z39.50 support.
  • Speed of response to problems - should a problem arise with a proprietary database, the company can usually fix it more quickly than a bigger company, where a small "libraries only" problem may not get priority.
Disadvantages
  • A small company maintaining its own database is probably not a big risk at the basic technical level provided it can demonstrate the capabilities for creating the right data structures reliably and flexibly.  The main disadvantage is probably in the area of keeping up with new developments like web serving extensions and object storage. Lack of compliance with standards (e.g. ODBC, SQL) may mean that standard tools like MS Access and Excel cannot be used for reporting and analysis.
  • System migration - hardware options may be limited.

Text Retrieval Systems or "Free Form" Databases.

Text retrieval is one of the weasel words in the library automation industry.  It describes a process in terms of what it retrieves but it tends to be used to define the data structures, what is indexed and the tools available to retrieve the text. In some ways they are at the opposite end of the spectrum to a relational database. A "Free Form" database is the better term since there is little structure imposed on the designer of these systems. A library using a system such as BRS Search or Status can define their own record structures. The search "engine" then indexes every field in an inverted index structure so that any string of data may be used to retrieve a record. The search sophistication possible with such indexes is enormous - Boolean combinations between terms both within and across fields are simply done and the typical text retrieval functions of adjacency, truncation, set manipulation etc are all built in.
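
A much simplified sketch of the inverted index behind such a search engine is given below. It is not how BRS Search or Status work internally; the functions `build_index` and `search`, the record layout and the sample records are assumptions made for illustration. Every word in every field is indexed, Boolean combinations are done with set operations, and a trailing `*` gives right-hand truncation.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map (field, word) and (None, word) to the set of record ids containing that word."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, text in fields.items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[(field, word)].add(record_id)
                index[(None, word)].add(record_id)   # entry for "any field" searches
    return index


def search(index, term, field=None):
    """Return ids of records matching a term; a trailing '*' requests truncation."""
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        hits = set()
        for (indexed_field, word), ids in index.items():
            if indexed_field == field and word.startswith(stem):
                hits |= ids
        return hits
    return set(index.get((field, term), set()))


# Hypothetical free form records: the library decides which fields exist.
records = {
    1: {"title": "Corrosion in steel pipelines", "abstract": "Field tests of coated pipes."},
    2: {"title": "Pipeline welding handbook", "abstract": "Covers steel and alloy welds."},
    3: {"title": "Library automation", "abstract": "Barcoding and circulation."},
}
index = build_index(records)

steel_and_pipes = search(index, "steel") & search(index, "pipe*")   # Boolean AND across fields
title_only = search(index, "steel", field="title")                 # search within one field
either = search(index, "barcoding") | search(index, "welding")     # Boolean OR
```

Adjacency searching would also need word positions to be stored in the index, which this sketch leaves out.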

Disadvantages
  • Free Form databases have no structure, so the advantages of a relational structure - authority files where data is kept just once, and functions such as global edit and heading merge - are not easy to achieve.
  • Functions other than retrieving records from the information file, e.g. circulation and acquisitions, are difficult to achieve without database structures.
Advantages
  • The main advantage is the power of search and retrieve functions.  Where there are large numbers of technical reports to be searched for information in an unpredictable way then a text retrieval engine is required since its efficiency and flexibility of indexing mean that you can index the "full text" of the report.  A text retrieval engine verges on textual analysis.
  • A text retrieval engine can be combined with a structured DBMS to produce the best of both worlds. "Hybrid systems" of this kind, such as Unicorn from Sirsi, are beginning to have an impact on the academic as well as the special library market since disc space is no longer a system limitation.

Object Oriented DBMS (OODBMS)

Object orientation for a database means the capability of storing and retrieving objects in addition to mere data. As their name suggests, Database Management Systems were designed to look after data - numbers, words etc. Objects are complex and not well handled by standard Relational DBMS.  Object Oriented DBMS have been emerging over the last few years and established suppliers such as Oracle, with Oracle8, have announced object capability.  The implications for library systems are unclear. Most systems can handle images, video and other objects but in many cases do so in a non-standard way. The first system to announce the use of an Object Oriented DBMS is Taos from Data Research Associates.

Others

There are some other commercially available databases that do not fit into the technical categories above.

Mumps

Mumps was developed in the 1970s for use in the Health sector for very large on-line databases with an emphasis on transaction processing.  It can handle relational, network or hierarchical data models. The database management functions are closely coupled with the application language and as a result Mumps is a very fast and efficient database management environment.  The original specification for Mumps was in the public domain; it has now been commercialised by InterSystems as OpenM.

OpenM can be used to manage a variable length, textual, relational data structure - as is required for library applications - whilst still retaining its speed and efficiency. It has a wide user base and is mature, yet it is keeping pace with new Web oriented tools and add-ons.  The OpenM version can provide SQL and ODBC connectivity.

AS/400

The AS/400 series of minicomputers from IBM comes bundled with a DBMS as part of the operating environment, OS/400. The system is very powerful and flexible. It has similarities to the Nested Relational DBMS model, allows relational database structures and has simple-to-use database enquiry facilities for ad-hoc reporting.

Advantages
  • Fast
  • Efficient - can run very large databases and support many thousands of users.
  • Flexible
Disadvantages
  • Only runs on one hardware platform - although NT can run on an AS/400 machine, so if you were running a library system on AS/400 hardware you could possibly change the software by changing the operating system.

Hybrids

There are hybrid systems and hybrid databases. Basically, a hybrid system is one that uses two DBMS to handle the complex requirements of a library system - a free form DBMS for complex searching, where it excels, and a relational DBMS for structured data transaction processing.

A hybrid database combines the features of two of the main types.  The current debate (mid 1997) is whether the hybrid Object-relational DBMS like Oracle8 or Informix/Illustra will prove more effective at handling future requirements than a "pure" OODBMS like ObjectStore.

Summary

The database that is proffered with your library application is important. It is important that it works well (reliably, efficiently and flexibly), can respond to the up-coming changes in the computer and information handling world and is commercially viable.  Adherence to industry standards like SQL, ODBC etc will ensure that you can use third party tools like Access against the database to get data to the desktop.

So when looking at a library system, consider the advantages and disadvantages of the various types of DBMS that underlie the system and assess whether your needs are likely to be limited by the technologies offered.

If you need lower hardware and administration costs then the greater efficiency of Nested Relational DBMS or OpenM could be useful.  If you must retain corporate standards and these are Relational DBMS like Oracle, then you can take advantage of the many tools and skills that may be available from the corporate IT department. If you need in-depth retrieval based on the text of the documents rather than assigned terms then look for a product with a free form DBMS or text retrieval engine incorporated. See the Product+Database list for a list of Library Management Systems and the databases they use.

The Future

The development pace of computing appears to accelerate year on year.  DBMS have been maturing slowly over the last twenty years and have reached a high level of reliability.  The future will call for efficient handling of objects and sophisticated Web serving.  Simple Web serving is a feature of most systems and should not pose problems - Object storage will be interesting to watch - especially as the demand ramps up for libraries to store and deliver more and more images, digitised texts, and video etc.
