
1. Primary data and techniques for structuring them


The first stage of any applied computer modeling is the structuring of the relevant data. The way the data is structured determines how it is stored and accessed, how it is presented to users, and so on. In other words, the data model is a formal logical structure for representing data, together with the tools for manipulating it. The oil and gas industry has some distinctive features where information technology is concerned. First, the primary data and the results of their processing reach enormous volumes. Second, the data is heterogeneous and fragmentary because of inherent disintegrating factors: it originates and is processed in different places across various industry sectors, it is obtained by diverse means and techniques, and so on. For these reasons the choice of data model becomes very important.
Another feature typical of large spheres of human activity is that the complete set of data required for high-quality modeling of processes in the field cannot be defined explicitly in advance. New kinds of parameters often have to be taken into account, and the specialists who run the model may change their view of the subject they deal with. This places a flexibility requirement on the data model: it must be able to respond adequately to changes in the composition and character of the data.

First of all, the data model turns fragmentary and chaotic data sets into an orderly representation of the application field data. The simplest technique is to gather similar data into groups. This is the method used to create relational tables (see fig. 1), which form the basis of the relational data model.

Fig.1

Information is retrieved from a table by the specified value of a parameter, which is then treated as the key. For example, to determine which branch a well j belongs to, one searches the appropriate table for the row whose well column holds the value j and fetches the value from the branch column of that row.
The tables of the relational model are not connected to one another in any way. To collect information distributed over several tables, one must compare the values of the key parameters in columns held in different tables. For example, obtaining the date on which well i was put into operation together with the type of pump it uses involves a search by the key parameter well in both tables.
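
The two retrievals just described can be sketched as follows. This is only an illustration under stated assumptions: the table names (affiliation, commissioning, pumps), the column names other than well and branch, and all the values are invented here, since the contents of figure 1 are not reproduced in the text.

```python
# Hypothetical relational tables; every row and value is invented for illustration.
affiliation = [
    {"well": 101, "branch": "North", "oilfield": "A"},
    {"well": 102, "branch": "North", "oilfield": "A"},
    {"well": 203, "branch": "South", "oilfield": "B"},
]
commissioning = [
    {"well": 101, "start_date": "1995-03-01"},
    {"well": 203, "start_date": "1998-07-15"},
]
pumps = [
    {"well": 101, "pump_type": "ESP"},
    {"well": 203, "pump_type": "rod"},
]


def lookup(table, key_column, key_value, target_column):
    """Scan the table for the row whose key column equals key_value."""
    for row in table:
        if row[key_column] == key_value:
            return row[target_column]
    return None  # no row with that key


def start_date_and_pump(well):
    """Combine two independent tables by searching each for the same key."""
    return (
        lookup(commissioning, "well", well, "start_date"),
        lookup(pumps, "well", well, "pump_type"),
    )


print(lookup(affiliation, "well", 102, "branch"))  # North
print(start_date_and_pump(203))  # ('1998-07-15', 'rod')
```

Nothing in the data itself ties the commissioning and pumps tables together; the connection exists only inside start_date_and_pump, that is, at the moment of retrieval.
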
The relational scheme has obvious advantages, such as the simplicity of its data representation and its accessibility to an ordinary user. But its faults are apparent too. First of all, the logical integrity of the information in a relational database is supported only at the level of database queries. Indeed, the relational tables exist quite independently of one another and can be connected only at the moment of information retrieval (see above). Besides noticeably increasing the number of comparison and search operations needed to get the required data, such a scheme leads to significant redundancy of the information in the database. The same value may be duplicated many times in different tables, which impedes data verification and gives rise to many errors. Another substantial fault of the relational model is its extremely inefficient handling of data in one-to-many and many-to-many relationships. Modeling the first type of relationship causes large redundancy of information, while the second type is always expressed by a separate table. When the modeled application field is described by intricately interconnected data, adding a new parameter to the relational database causes a sharp growth in the computing resources required.
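
The effect of duplicated values can be shown with a small sketch. It assumes two hypothetical tables (affiliation and production, with invented columns and values) that each store the branch name once per well, and it shows how a correction applied to only one of them leaves the copies in disagreement.

```python
# Hypothetical tables: the branch name is repeated for every well in each table.
affiliation = [
    {"well": 101, "branch": "North"},
    {"well": 102, "branch": "North"},
]
production = [
    {"well": 101, "branch": "North", "oil_t_per_day": 12.5},
    {"well": 102, "branch": "North", "oil_t_per_day": 8.0},
]


def rename_branch(table, old, new):
    """Rewrite every copy of the branch name in one table; return the count."""
    changed = 0
    for row in table:
        if row["branch"] == old:
            row["branch"] = new
            changed += 1
    return changed


def inconsistent_wells(left, right):
    """Return the wells whose branch value differs between the two tables."""
    right_by_well = {row["well"]: row["branch"] for row in right}
    return [
        row["well"]
        for row in left
        if row["well"] in right_by_well and right_by_well[row["well"]] != row["branch"]
    ]


# The rename reaches one table only, because nothing below the query
# level ties the two tables together.
rename_branch(affiliation, "North", "Northern")
print(inconsistent_wells(affiliation, production))  # [101, 102]
```

Every additional table that repeats the value adds one more place where such a correction can be missed.
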

These faults of the relational model can become a considerable disadvantage, especially when there are huge data sets with complicated interrelations. They can be overcome by a model that provides logical integrity of the information at the level of data organization rather than at the level of data retrieval. Among the data models that offer this possibility, the network model is the most common. Like the relational model, the network model has tables as an important element. The records of these tables describe objects or events (entity tables). Every record (entity) can have subordinate records stored in other tables. The subordination relation is expressed by means of links. Figure 2 illustrates how the primary data of the application field can be structured with the tools of the network data model.

Fig.2

An obvious merit of this model compared with the relational one is much lower information redundancy. Because one-to-many relations (oilfield-wells, for example) are expressed here by means of links, the number of operations needed to retrieve the required data decreases noticeably. For example, to find out which wells belong to a selected oilfield (see figure 2), in the network model it is enough to scan the includes wells links in the relevant record of the oilfield table. By comparison, solving the same task in the relational model (see figure 1) would require scanning all the rows of the affiliation table and selecting those that relate to the oilfield in question (according to the appropriate column). That would take many times more operations. The shortcomings of the network model become salient when the stored data set has to be extended, or more exactly, when such an extension requires the creation of a new entity (entity table). Figure 2 illustrates the situation in which parameters describing the branches have to be added to the database. Adding several new columns to the wells entity table is not advisable, because one branch has, as a rule, tens or hundreds of wells. Such a decision significantly increases the information redundancy and produces a great number of hidden errors. The alternative is to reorganize the database structure, that is, to introduce a new type of entity (a new entity table). However, this variant also brings evident problems, since it requires many corrections to be made to the database.
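
The difference in retrieval cost between following links and scanning a table can be sketched as follows. The class names, field names and data are assumptions made for this illustration only; the sketch counts the records visited in each case and does not reproduce any particular network database system.

```python
from dataclasses import dataclass, field


@dataclass
class Well:
    name: str
    branch: str


@dataclass
class Oilfield:
    name: str
    # "includes wells" links: direct references to the subordinate records
    wells: list = field(default_factory=list)


# Hypothetical network-model records.
w1, w2, w3 = Well("101", "North"), Well("102", "North"), Well("203", "South")
field_a = Oilfield("A", [w1, w2])
field_b = Oilfield("B", [w3])

# Relational-style counterpart: one (well, oilfield) row per well.
affiliation = [("101", "A"), ("102", "A"), ("203", "B")]


def wells_via_links(oilfield):
    """Follow the links of one record; only that oilfield's wells are visited."""
    return [w.name for w in oilfield.wells], len(oilfield.wells)


def wells_via_scan(table, oilfield_name):
    """Scan every row of the affiliation table and keep the matching ones."""
    found = [well for well, fld in table if fld == oilfield_name]
    return found, len(table)


print(wells_via_links(field_b))            # (['203'], 1)
print(wells_via_scan(affiliation, "B"))    # (['203'], 3)
```

The second element of each result is the number of records visited: the link traversal touches only the wells of the chosen oilfield, while the scan touches every row of the table regardless of how many match.
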

As an interim conclusion, besides some particular problems one can single out a problem common to the models described: adding new parameters to an existing data structure often causes great difficulties. The network model resolves the problem partly, for example when the required corrections concern the attributes of only one entity. But when a parameter from a table column has to be transformed into a separate entity (see above), the network data model becomes very vulnerable.

We conclude that the solution to the stated problem lies in a data model that from the very beginning treats any information element as an entity (object) or a potential entity. This is the idea realized by the semantic object-oriented data model. The basic notion of that model is the object, a structure that reflects a real object or event. An object is defined by a set of parameters whose values are given by references to other objects or to pseudo-objects (elementary entities) (see figure 3).

Fig.3

Such a structure has the following advantage. However our conception of the data set and its interrelations changes, the modification of the database structure is limited to adding new objects (entities) and correcting references to existing ones. Suppose, for example, that while such a database is in use it becomes necessary to extend the set of parameters describing a branch (see figure 3). In this case the pseudo-object branch (currently containing only its own name) is converted into a full object described by the relevant set of parameters. These corrections do not affect the rest of the database, except for a change in the type of the reference to the branch data (before the correction it pointed to a pseudo-object, and now it points to a full object). Another important advantage of the semantic model is that the reference mechanism ensures a high degree of logical integrity of the stored data. It makes it possible to model data interrelations of any desired complexity and therefore reflects the application field most adequately. The semantic data model also provides efficient means of data access. The entire parameter specification held in one object becomes visible to another object as soon as a reference from the latter to the former is constructed.
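
The conversion of the branch pseudo-object into a full object can be sketched as follows. The classes PseudoObject and SemanticObject, the helper promote, and the parameter name region are hypothetical names introduced for this illustration; they are one possible reading of the scheme described above, not the actual implementation of any semantic database.

```python
class PseudoObject:
    """Elementary entity: carries only its own name."""

    def __init__(self, name):
        self.name = name


class SemanticObject:
    """Entity described by parameters whose values are references."""

    def __init__(self, name, **params):
        self.name = name
        self.params = dict(params)


def promote(objects, old, **params):
    """Replace a pseudo-object by a full object and repoint references to it."""
    new = SemanticObject(old.name, **params)
    for obj in objects:
        for key, value in obj.params.items():
            if value is old:
                obj.params[key] = new  # existing key, so the dict size is unchanged
    return new


# Initially the branch is a pseudo-object shared by two wells (invented data).
branch = PseudoObject("North")
wells = [
    SemanticObject("101", branch=branch),
    SemanticObject("102", branch=branch),
]

# Later the branch needs parameters of its own; "region" is an invented example.
branch = promote(wells, branch, region=PseudoObject("West"))

# The new parameters are reachable from any well through its reference.
print(wells[0].params["branch"].params["region"].name)  # West
```

The well objects themselves are not restructured: only the target of their branch reference changes, and both wells still share a single branch object.
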

