
4. Comparative analysis of the semantic and relational data models


In this part we attempt a comparative estimate of the efficiency of the semantic and relational data models. To do so we apply criteria commonly used for such a task, namely data retrieval time, redundancy, sensitivity to errors, etc.
Note that the estimates obtained with these criteria may vary greatly for the same model depending on the type and amount of data. For this reason we pay particular attention to modeling the most complex data types, those distinguished by large volume, intricate interrelations and a high degree of dynamics.

A formal description of the data models is a necessary condition for comparing them. Such a description has already been given above for the semantic data model, so here we confine ourselves to a brief description of the relational model.
The formal description of the relational model is founded on the mathematical concept of a relation, which may be defined as follows. Let sets D1, D2, ..., DN be given. Then P is a relation defined on these N sets if P is a set of N-tuples whose first element belongs to D1, whose second element belongs to D2, and so on; in other words, P is a subset of the Cartesian product D1 x D2 x ... x DN. The sets Di are called domains, and N is called the degree of the relation. A relational data model consists of a number of such relation schemes.
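The definition can be checked mechanically. The Python sketch below tests whether a set of tuples is a relation on given domains, i.e. a subset of their Cartesian product. The domains D1 (well names) and D2 (repair types) and the function name `is_relation` are hypothetical, chosen only for illustration.

```python
from itertools import product

def is_relation(p, domains):
    """True if every tuple in p has one element per domain and its i-th
    element lies in the i-th domain, i.e. p is a subset of D1 x ... x DN."""
    n = len(domains)
    return all(
        len(t) == n and all(t[i] in domains[i] for i in range(n))
        for t in p
    )

# Hypothetical domains: well names (D1) and repair types (D2).
D1 = {"W1", "W2"}
D2 = {"cementing", "cleaning"}
P = {("W1", "cementing"), ("W2", "cementing"), ("W2", "cleaning")}

print(is_relation(P, [D1, D2]))                     # True
print(is_relation({("W1", "drilling")}, [D1, D2]))  # False: "drilling" is not in D2
# Size of P compared with the size of D1 x D2.
print(len(P), "of", len(set(product(D1, D2))))      # 3 of 4
```

Here P has degree 2, and it contains only 3 of the 4 tuples that the Cartesian product allows.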
Formally, an abstract data aggregate in the relational model is defined as follows:

$$D = \{d_1, d_2, \ldots, d_I\}, \quad d_j \in D_j, \ j = 1, \ldots, I,$$
$$D' = \{d_K, d_{K+1}, \ldots, d_N\}, \quad d_j \in D_j, \ j = K, \ldots, N.$$

Each relation is represented by a separate table. The columns of the table correspond to the domains, while its rows (also called records) correspond to specific instances of the relation. Note that tables which are to be related must share common columns, since otherwise the information in different tables cannot be matched.
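A minimal sketch of why a shared column matters, assuming tables are represented as Python lists of dicts. The hypothetical `join_on` function pairs rows only where the common column holds equal values, which is the basic nested-loop form of a join; the table contents are illustrative.

```python
def join_on(left, right, column):
    """Naive nested-loop join of two tables (lists of dicts) on a shared column.
    Without a common column there would be nothing to match rows on."""
    result = []
    for l in left:
        for r in right:
            if l[column] == r[column]:
                result.append({**l, **r})  # merge the two matching rows
    return result

# Hypothetical tables sharing the "repair" column.
well_repairs = [
    {"well": "W1", "repair": "cementing"},
    {"well": "W2", "repair": "cleaning"},
]
repair_costs = [
    {"repair": "cementing", "cost": 500},
    {"repair": "cleaning", "cost": 200},
]

for row in join_on(well_repairs, repair_costs, "repair"):
    print(row)
# {'well': 'W1', 'repair': 'cementing', 'cost': 500}
# {'well': 'W2', 'repair': 'cleaning', 'cost': 200}
```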

Having formal descriptions of both the relational and the semantic models, we can estimate how efficiently each of them handles data of varying complexity. The degree of data complexity is an inexact term. Informally, it may be defined by the volume of data and by which type of relation prevails among the modelled data: 1-to-1, 1-to-many or many-to-many.
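Which of these relation types prevails can be read off the data itself. The hypothetical `relation_type` function below classifies a set of (a, b) pairs by checking whether some a is linked to several b values and whether some b is linked to several a values. It reflects only the pairs actually present, not any constraints a schema may impose.

```python
def relation_type(pairs):
    """Classify a binary relation, given as (a, b) pairs, as
    '1-to-1', '1-to-many', 'many-to-1' or 'many-to-many'."""
    a_to_b, b_to_a = {}, {}
    for a, b in pairs:
        a_to_b.setdefault(a, set()).add(b)
        b_to_a.setdefault(b, set()).add(a)
    many_b = any(len(s) > 1 for s in a_to_b.values())  # some a has several b
    many_a = any(len(s) > 1 for s in b_to_a.values())  # some b has several a
    if many_a and many_b:
        return "many-to-many"
    if many_b:
        return "1-to-many"
    if many_a:
        return "many-to-1"
    return "1-to-1"

print(relation_type([("cementing", 500), ("cleaning", 200)]))  # 1-to-1
print(relation_type([("W1", "cementing"), ("W1", "cleaning"),
                     ("W2", "cementing")]))                    # many-to-many
```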

Let us consider an example. Given a database containing information about various types of repairs and about the repairs carried out on wells, we must determine the cost of repair work for every well.

Fig.4a

Let us assume that the information about repairs is limited to their cost, while the well repair data consists only of the repair name and the completion date. Thus in the first case we deal with the simplest kind of data relation, 1-to-1, and in the second case with many-to-many. The way this information is represented in each of the two data models is sketched in the figures. Note that the relational model of these data contains two tables, because in the relational model a many-to-many relation is expressed by a separate relation.

Fig.4b
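Since the figures are not reproduced here, the following Python sketch shows one possible rendering of the two representations; all well names, repair names, costs and dates are hypothetical. In the relational version the link between the tables is the repair name stored as text; in the semantic version each well holds direct references to repair objects.

```python
from dataclasses import dataclass, field

# Relational representation: two tables linked by the repair name (a text key).
table_4a_1 = [  # Wells X Repairs: (well, repair, date), hypothetical rows
    ("W1", "cementing", "2001-03-01"),
    ("W1", "cleaning", "2001-05-12"),
    ("W2", "cementing", "2001-04-20"),
]
table_4a_2 = [  # Repairs: (repair, cost), hypothetical rows
    ("cementing", 500),
    ("cleaning", 200),
]

# Semantic representation: objects holding direct references to each other.
@dataclass
class Repair:
    name: str
    cost: int

@dataclass
class Well:
    name: str
    repairs: list = field(default_factory=list)  # (Repair, date) pairs

cementing = Repair("cementing", 500)
cleaning = Repair("cleaning", 200)
w1 = Well("W1", [(cementing, "2001-03-01"), (cleaning, "2001-05-12")])
w2 = Well("W2", [(cementing, "2001-04-20")])

# Both wells refer to the same Repair object, so its name is stored once.
print(w1.repairs[0][0] is w2.repairs[0][0])  # True
```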

Let us determine how many search and compare operations are needed to answer the query for the cost of repair work. Let the number of wells be Nw, the number of repair types (rows of table 4a-2) Nr, and the average number of repairs per well Nra. In the relational model, obtaining the repairs for each well requires an explicit or implicit transformation of the many-to-many Wells X Repairs relation (table 4a-1) into Nw relations of the form Single_well X Repairs. Implemented by searching table 4a-1 for identical well names, this procedure requires at least Nw x (Nw-1)/2 search and compare operations on text strings. Further, once the Single_well X Repairs relations have been obtained and table 4a-2 is available, computing the total repair cost for every well requires Nw x Nra passes through table 4a-2, each searching for the corresponding repair name. Every such pass takes about Nr/2 search and compare operations, the exact number depending on the order of rows in table 4a-2.
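The relational procedure just described can be sketched as follows, assuming no indexes are available so that every lookup is a linear scan with string comparison. The hypothetical `relational_costs` function counts every comparison it makes. Its first step uses a simple grouping loop, so its count will not reproduce the Nw x (Nw-1)/2 estimate exactly, but it shows where the comparisons come from.

```python
def relational_costs(well_repairs, repair_costs):
    """Total repair cost per well using only linear scans and string
    comparisons, counting each comparison (a sketch of the naive
    relational procedure, not an optimized DBMS query plan)."""
    compares = 0
    # Step 1: split Wells X Repairs into per-well groups by comparing well names.
    groups = []  # list of (well, [repair names])
    for well, repair in well_repairs:
        for g_well, g_repairs in groups:
            compares += 1
            if g_well == well:
                g_repairs.append(repair)
                break
        else:  # no existing group matched: start a new one
            groups.append((well, [repair]))
    # Step 2: for every repair of every well, scan the cost table for its name.
    totals = {}
    for well, repairs in groups:
        total = 0
        for repair in repairs:
            for name, cost in repair_costs:
                compares += 1
                if name == repair:
                    total += cost
                    break
        totals[well] = total
    return totals, compares

# Hypothetical data: table 4a-1 as (well, repair), table 4a-2 as (repair, cost).
well_repairs = [("W1", "cementing"), ("W1", "cleaning"), ("W2", "cementing")]
repair_costs = [("cementing", 500), ("cleaning", 200)]
print(relational_costs(well_repairs, repair_costs))  # ({'W1': 700, 'W2': 500}, 6)
```

Reversing the rows of `repair_costs` raises the count to 7 while leaving the totals unchanged, which is the dependence on row order mentioned above.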
Now consider the semantic model. Here the task is solved much more easily, since the necessary logical links already exist in explicit form. Only Nra data search operations (one per repair) are needed for each well, so the total number of operations is Nra x Nw.
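For comparison, here is a sketch of the same query in the semantic model, assuming each well object holds direct references to its repair objects; the class and function names are hypothetical. The counter grows by one per reference followed, so the total equals the number of well-repair links, i.e. Nra x Nw.

```python
from dataclasses import dataclass, field

@dataclass
class Repair:
    name: str
    cost: int

@dataclass
class Well:
    name: str
    repairs: list = field(default_factory=list)  # direct references to Repair objects

def semantic_costs(wells):
    """Total repair cost per well by following references; one access per repair."""
    accesses = 0
    totals = {}
    for well in wells:
        total = 0
        for repair in well.repairs:
            accesses += 1  # follow one reference, no name matching needed
            total += repair.cost
        totals[well.name] = total
    return totals, accesses

cementing, cleaning = Repair("cementing", 500), Repair("cleaning", 200)
wells = [Well("W1", [cementing, cleaning]), Well("W2", [cementing])]
print(semantic_costs(wells))  # ({'W1': 700, 'W2': 500}, 3)
```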

Let us discuss the result. The relational scheme is certainly simpler to implement than the semantic one: the latter entails some overhead for establishing links between the corresponding database objects, and the relational model also has a more understandable structure. However, the semantic model has clear advantages in how data storage and retrieval are organized. In the example above, the number of search and compare operations needed to solve a rather simple task differs significantly between the two models. For instance, with Nw = 100 wells, an average of Nra = 3 repairs per well and Nr = 6 repair types, the total number of operations is about 5850 (4950 + 900) in the relational model and 300 in the semantic one.
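Substituting the example values into the two estimates gives the following, where N_rel and N_sem are introduced here only as labels for the operation counts of the relational and semantic models:

$$N_{rel} \approx \frac{N_w (N_w - 1)}{2} + N_w N_{ra} \frac{N_r}{2} = \frac{100 \cdot 99}{2} + 100 \cdot 3 \cdot \frac{6}{2} = 4950 + 900 = 5850,$$
$$N_{sem} = N_{ra} N_w = 3 \cdot 100 = 300.$$

So even in this small example the relational procedure needs roughly 19.5 times as many operations.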
This example involves one 1-to-many relation (Repair X Cost) and one many-to-many relation (Wells X Repairs). If a relational query involved more tables, the ratio between the operation counts of the two models would grow in proportion to the number of tables, because each additional table in a query sharply increases the number of operations needed to answer it. When complex data are handled, the number of tables involved inevitably grows, since every many-to-many relation is expressed here by a separate table. Consequently, the number of superfluous operations in the relational model may grow drastically and exceed the number required to solve the same task in the semantic model by a factor of tens of thousands.
The same holds for data redundancy. In the semantic model every data element is stored in a single location, namely in the corresponding domain, and is accessed through references. In the relational model, by contrast, data are duplicated repeatedly. In the example above, with the same values of Nw, Nra and Nr, well names are duplicated Nw x (Nra-1) = 200 times in the corresponding column of table 4a-1, and repair names are duplicated at least Nw x (Nra/Nr) x (Nra-1) = 100 times. This redundancy comes from a single table; naturally, it grows in proportion to the number of tables containing the column in question.
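The duplication count can be reproduced on a synthetic version of table 4a-1. In the sketch below the well names, the repair-type names and the way repairs are assigned to wells are all hypothetical; `duplicates` counts the stored values beyond the first occurrence of each distinct value.

```python
def duplicates(column):
    """Number of stored values beyond the first occurrence of each distinct value."""
    return len(column) - len(set(column))

# Hypothetical table 4a-1: Nw = 100 wells, Nra = 3 repairs per well,
# drawn from Nr = 6 repair types (assignment pattern chosen arbitrarily).
NW, NRA, NR = 100, 3, 6
repair_types = [f"repair_{k}" for k in range(NR)]
table_4a_1 = [
    (f"well_{w}", repair_types[(w + j) % NR])
    for w in range(NW)
    for j in range(NRA)
]

wells_col = [well for well, _ in table_4a_1]
repairs_col = [repair for _, repair in table_4a_1]
print(duplicates(wells_col))    # 200, i.e. NW * (NRA - 1)
print(duplicates(repairs_col))  # 294
```

The well-name figure matches Nw x (Nra-1) exactly; the repair-name figure depends on how repairs are distributed, and in this arrangement it is well above the stated lower bound of 100.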
High redundancy makes data checking and integrity maintenance harder. Suppose that in the example above some repair name in table 4a-2 is misspelled. Such a repair will then never participate in the corresponding query, and with a large volume of data this is hard to detect. In the semantic model, by contrast, a repair name, like any other name, is stored at a single address; therefore a misspelling, first, does not affect query execution at all and, second, is easily detected and corrected.
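The effect of a misspelling can be sketched as follows, with hypothetical names and costs. In the relational version the cost table contains the typo "cementng", so the exact string match never finds it. In the semantic version the same misspelled name lives in one `Repair` object that the well references directly, so the cost is still found, and a single assignment corrects the name for every reference.

```python
from dataclasses import dataclass

# Relational: table 4a-2 holds a misspelled name, so the name-based lookup misses it.
well_repairs = [("W1", "cementing"), ("W1", "cleaning")]
repair_costs = [("cementng", 500), ("cleaning", 200)]  # typo in the first row

def relational_total(well, rows, costs):
    total = 0
    for w, repair in rows:
        if w != well:
            continue
        for name, cost in costs:
            if name == repair:  # exact string match: the typo never matches
                total += cost
                break
    return total

print(relational_total("W1", well_repairs, repair_costs))  # 200: cementing is silently lost

# Semantic: the name is stored once; the well refers to the Repair object itself.
@dataclass
class Repair:
    name: str
    cost: int

cementing = Repair("cementng", 500)  # same typo, stored at a single location
cleaning = Repair("cleaning", 200)
w1_repairs = [cementing, cleaning]
print(sum(r.cost for r in w1_repairs))  # 700: the typo does not affect the query
cementing.name = "cementing"            # one correction fixes every reference
```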
Because data are highly volatile, the ability of a model to accommodate new types of information is of great importance. In the semantic model this adaptability problem is solved by introducing pseudo-objects. A pseudo-object can at any time be extended into a full-fledged object without affecting the rest of the database. In the relational model, extending the set of parameters is relatively painless only if the newly added parameters relate to the existing ones as one-to-one. Every new one-to-many relation in the database means further growth of redundancy, while introducing a many-to-many relation requires building yet another table. The latter means not only more redundancy but also a sharp increase in the number of operations needed to retrieve data.
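One way to picture a pseudo-object, assuming it is a named placeholder that other objects can already reference before its attributes are known. This is an illustrative sketch, not necessarily how the semantic model described above implements pseudo-objects; the class `PseudoObject`, its `extend` method and the attribute names are hypothetical.

```python
class PseudoObject:
    """Hypothetical pseudo-object: a named placeholder that can be referenced
    before any of its attributes are known."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def extend(self, **attrs):
        """Add attributes in place, turning the placeholder into a fuller object."""
        self.attributes.update(attrs)
        return self

# A new repair type is introduced by name only and is already referenced by wells.
acidizing = PseudoObject("acidizing")
wells = {"W1": [acidizing], "W2": [acidizing]}

# Later it is extended; the wells' references need no change.
acidizing.extend(cost=800, crew_size=4)  # hypothetical attributes

print(wells["W1"][0].attributes)         # {'cost': 800, 'crew_size': 4}
print(wells["W1"][0] is wells["W2"][0])  # True
```

Because the wells hold references rather than copies, the extension is visible everywhere at once and no other data has to be rewritten.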

From the above one may conclude that the relational model is preferable only when a small amount of data with predominantly simple (one-to-one) interrelations is involved. In that case the simplicity of implementing the relational scheme is a clear advantage, and the redundancy of data and operations is compensated by the capacity of the computer.
But when the data are voluminous, volatile and intricately interrelated, the relational model cannot ensure the required level of efficiency. Using the relational approach leads to constantly growing demands on computer resources and forces the use of specialized, expensive hardware. That is why moving to the more advanced, though somewhat more complex, semantic data model seems beyond question here.

