cSiTra

The design

cSiTra is written in Java and uses the MVC design pattern in its approach. The cSiTra system is made up of 9 different components interacting with each other. The following five components: the RDBMS, Cassandra, their corresponding meta-models (SQL meta-model and the Cassadra meta-model) and the transformation rules for SiTra make up the model. The SQL service, Cassandra service and the Main_SiTra make up the controllers, and it offers one component as the view that is the UI. In the next section each of these components are elaborated on what they are and what exactly they do.

The model

RDBMS

As to what this is, the relational database management system is the food(or source) for cSiTra, (as to what exactly they do) the RDBMS supplies the data and the related schema that are supposed to be processed and made as Cassandra equivalents and pushed into Cassandra. The data in an RDBMS are usually nothing but alphanumeric values that exists as cells, and carry no meaning all by themselves, however these cells are arranged in rows and columns which give them some meaning and in turn these rows and columns form a table. Another way to say this is that the data is stored as rows of cells in the tables made meaningful of columns. And these tables relate to one another using the concept of constrains, where a constraint is a particular column in a table that can be used by other tables to relate each other. The following image shows a typical relational database:

Cassandra

Cassandra is nothing but just another database, however it’s principally very different if we compare it with an RDBMS. Though both Cassandra and RDBMS are nothing but just databases and basically just store “data” in them the approach they take to achieve this is very different. To give a quick real world analogy consider that a person wishes to travel from London to Paris for a holiday, this person can choose between taking a train or to fly in an airplane, though the end result is still nothing but reaching Paris from London for a holiday, the means by which he can achieve are principally very different from one another (and yes of course both train and an airplane have their own pros and cons). Cassandra is built with a different point of view in regard to the data it stores. Cassandra just does not store data in a relational manner and has its own unique way. The data in Cassandra exists as a pair of key and value, where value is the actual alphanumeric like data (it may also be something which is not exactly alphanumeric) and a key is the key to identify these values. These key value pairs are stored in a KeySpace which is the main entity of Cassandra. As the name says a KeySpace is a space for storing key-value pairs, and yes it would be highly illogical to just suspend the data randomly in a space and these key-value pairs would need a meaningful structure to sit on. To solve this problem a KepSpace offers structures called as Column families to hold the key-value pairs and it has some properties of its own that help to identify them. The following is an image that shows a typical KeySpace in Cassandra.

SQL meta-model

The data and schema we fetch from the RDBMS must be preprocessed and given a form and shape before we start transforming it into Cassandra equivalents, and the SQL meta-model provides a container for this need. The SQL meta-model is nothing but a subset of the RDBMS data model and it represents how the RDBMS data and related schema will exists in context of programming environment (which is Java in our case). An instance of a SQL meta-model will be used as a container to store the data and schema we fetch from the RDBMS and these object are then ready for further processing. The following is the image of cSiTra’s SQL meta-model and what follows it is a brief description about each of its subcomponents.

What follows is the description of each element of the model and the model as a whole. A colour scheme is used in this part where blue represents the parent element and orange represents the child elements and a red represents the properties of the parent.

The Key element of this model is a Database.

Each Database has zero or more Sequences and Tables, and each Sequence and Table belong to exactly one Database. If Database is removed, so are all of its Sequences and Tables.
Each Database has a property that represents its name (Datatype: String).

Sequences: are a feature of some database products which just creates unique values. It just increments a value and returns it.

Each Sequences has a property that represents its name (Datatype: String).
A property minValue (Datatype: int).
A property maxValue (Datatype: long).
A property incrementBy (Datatype: int).
A property startWith (Datatype: long).
A property curentValue (Datatype: long).
A property cycle (Datatype: boolean).

Table: is data structure which has some properties and stores values in form of rows and columns

Each Table belongs to exactly to one Database and has zero or more Constraints and Rows, and each of these Constraint and Row belong to exactly that one Table. If Table is removed, so are all of its Constraints and Rows.
Additionally each Table has one or more Columns, and each of these Column belong to exactly that one Table. If Table is removed, so are all of its Columns.
Each Table has a property that represents its name (Datatype: String).

Row: represents a single, implicitly structured data item in a Table.

Each Row belongs to exactly to one Table and has zero or more Cells, and each of these Cells belong to exactly that one Row. If Row is removed, so are all of its Cells.

Constraints: are used to specify rules for the data in a Table.

Each Constraint belongs exactly to one Table and has one or more Columns associated to it.
Each Constraint has a property that represents its name (Datatype: String).
A property that represents its type (Datatype: enum-ConstraintType).
And a property that refers the columns to which it is associated with presents called reference (Datatype: String).

Columns: are a set of data values of a particular simple type, one for each Row of the Table. They provide the structure according to which the Rows are composed.

Each Column belongs to exactly to one Table and has zero or more Cells and Constraints If Column is removed, so are all of its Cells and Constraints.
Each Column has a property that represents its name (Datatype: String).
A property that represents its type (Datatype: enum-DataType).
A property that represents its size (Datatype: String).
And a property that represents if it is a null called nullable (Datatype: boolean).

Cells: a database table is composed of rows and columns of Cells, they hold the actual data.

Each Cell belongs to exactly to one Row and one Column of a Table.
Each Cell has a property to store data called, value (Datatype: String).

The datatypes and constraint-types has been represented as enumerations

Cassandra meta-model

The main objective of the Cassandra meta-model is as same as that of the SQL meta-model. Cassandra meta-model provides a container on which the processed data can sit before it is pushed to Cassandra. The Cassandra meta-model is nothing but a subset of the Cassandra data model and it represents how this data will exists in context of programming environment (which is Java in our case). An instance of a Cassandra meta-model will be used as a container to store the processed data and these objects are then pushed into Cassandra. The following is the image of cSiTra’s Cassandra meta-model and what follows it is a brief description about each of its subcomponents.

The Key element of this model is a KeySpace (which is the container for your application data).

Each KeySpace has zero or more Indexes and Options, and each Index and Option belong to exactly one KeySpace. If KeySpace is removed, so are all of its Indexes and Options.
Additionally and more importantly each KeySpace has one or more ColumnFamilies, and each ColumnFamily belong to exactly one KeySpace. If KeySpace is removed, so are all of its ColumnFamilies.
Each KeySpace has a property that represents its name (Datatype: String).

ColumnFamily: is a NoSQL object that contains columns of related data. It is a tuple (pair) that consists of a key-value pair, where the key is mapped to a value that is a set of columns. In analogy with relational databases, a standard column family is as a "table"

Each ColumnFamily belongs to exactly to one KeySpace and has zero or more Rows, and exactly one PrimaryKey (composed of one or more columns). And each of these Row belong to exactly to that one ColumnFamily. If ColumnFamily is removed, so are all of its Rows and PrimaryKey.
Additionally each ColumnFamily has one or more Columns, and each of these Column belong to exactly that one ColumnFamily. If ColumnFamily is removed, so are all of its Columns.
Each ColumnFamily has a property that represents its name (Datatype: String).
And a property that to add a comment to it called, comment (Datatype: String).

Row: represents a single, implicitly structured data item in a ColumnFamily.

Each Row belongs to exactly to one ColumnFamily and has zero or more Cells, and each of these Cells belong to exactly that one Row. If Row is removed, so are all of its Cells.
Each Row may have additional Columns.

PrimaryKeys: are used to specify primary keys of a ColumnFamily.

Each PrimaryKey belongs to exactly to one ColumnFamily and has one or more Columns associated to it.

Columns: in Cassandra a column is a pair of name and data values of a particular simple type, one for each Row of the ColumnFamily.

Each Column belongs to to a Cell in a Row of a Column Family.
Each Column has a property that represents its name (Datatype: String).
A property that represents its datatype (Datatype: enum-Type).
A property that represents its size (Datatype: String).

Cells: is an entity they hold the actual data.

Each Cell belongs to exactly to one Row of a Column Family. They refer to a Column.
Each Cell has a property to store data called, value (Datatype: String).

Index: A key index, speeds up random access to data in the ColumnFamily. In Cassandra, the primary index for a column family is the index of its row keys.

Each Index belongs to exactly to one KeySpace.
Each Index has a property to represent its name (Datatype: String).
And a property that refers the columns to which it is associated with presents called reference (Datatype: String).

Transformation rules

“One Rule to change them all, One Rule to find them, One Rule to bring them all, and in the context bind them.”

Basically a transformation rule is, the principle in logic establishing the conditions under which one statement can be derived or validly deduced from one or more other statements especially in a formalized language.

SiTra is a simple Java library for supporting a programming approach to writing transformations aiming to, firstly use Java for writing transformations, and secondly, to provide a minimal framework for the execution of transformations. SiTra consists of two interfaces and a class that implements a transformation algorithm. The aim is to facilitate a style of programming that incorporates the concept of transformation rules.

Transformer – the primary transformation object; it contains a collection of rules, and manages the process of transforming source model objects into target model objects.

Rule – a rule deals with specific detail of how to map an object from a source model into an object of the target model. One or more rules may or may be applicable for the same type of object and it is necessary to have a means to determine the applicability of a rule for a specific object, not just its type.

Following are the rules that are incorporated in order to transform SQL to Cassandra (NoSQL)

`Database2Keyspace` (Database to Keyspace)

This rule applies to the SQL database’s schema and to the Cassandra’s Keyspace.
This rule takes the Database object and copies its schema name, and creates a new keyspace object with the same name as the SQL Database Schema.

`Table2ColumnFamily` (Tables to ColumnFamily)

This rule applies to the SQL table and to the Cassandra’s column family.
This rule, for each table in the SQL, takes the table object and creates a column family, in the keyspace, with the same name as the table.
It adds any options to the table if specified.

`SQLColumns2NoSQLColumns`

This rule applies to the SQL columns and to the Cassandra columns.
This rule applies to each column present in the SQL table object, and it takes a column object and creates a Cassandra column object with the same name.
The following table represents a sample list of SQL data types and the corresponding NoSQL data types to which they are mapped (cSiTra does not map all of them).

`SQLConstraints2NoSQLConstraints`

If the table has a constraint of type PRIMARY or COMPOSITE_PRIMARY then create a PrimaryKey for the column family with the reference of the column(s) object.
This rule applies to the SQL’s Constraints and to the Cassandra’s Constraints.
This rule applies to each constraint object present in the SQL table, and it takes the Constraint object and checks if the constraint is primary key constraint if yes, it creates a Cassandra primary key object.
A special case to be considered is, If the table has a constraint of type FOREIGN, then:

Create a column family for each foreign key instance (its naming convention is: <CurrentTableName>_<ReferenceTableName>).
Each column family has the primary key as referred table's primary key and the consequent columns are the referee table's primary key.

`SQLCell2NoSQLCell`

This rule applies to the SQL’s Cells and to the Cassandra’s Cells.
This rule applies to each cell object present in the SQL table, and it takes the cell object and copies the value of the cell, and it creates a Cassandra cell object with the same value.

`SQLDatatype2NoSQLDatatype` (SQL data type to NoSQL data type)

This rule applies to the SQL’s data types and to that of the Cassandra.
This rule applies to each data type present in the SQL columns, and its values is matched to an equivalent in Cassandra.

`SQLRows2NoSQLRows` (SQL rows to NoSQL rows)

This rule applies to the SQL’s rows and to that of the Cassandra.
This rule mainly comes into picture in while reading the data from the database row by row, and each of these rows then converted as corresponding Cassandra rows.

The controller

SQL Service

The SQL service is a class in cSiTra that fetches data and schema from the RDBMS, it basically establishes a connection with the RDBMS and instantiates an object of the SQL meta-model, fetches the data from RDBMS and stores them as SQL java objects are produced this way. These objects are then passed to SiTra for transformation. The following pseudo code gives a general idea about the SQL Service class

Main_Sitra

This is the Main class of the cSiTra. It’s the master controller, it inputs connection parameters from the user interface, utilizes the SQL and Cassandra service to fetch and push the data from the respective databases and uses SiTra and the corresponding transformation rules to transform the source object to target object. The following snippet gives an overall idea.

Cassandra Service

This is a class in cSiTra which establishes a connection with the Cassandra and it creates the KeySpace and pushes the data into it. The following pseudo code gives a general idea about this class.

The view

The UI

The cSiTra offers only one component as the view. It’s the following user interface (shown as an image), it has fields for the user to enter the connection parameters for both RDMS and Cassandra databases.

Flow of Control

A Quick look at the flow of control in cSiTra.

When we run cSiTra the Main_SiTra is called which is the actual entry point and, the first event that takes place is displaying the UI for the user to enter the RDBMS and Cassandra database connection details and credentials:

SQL jdbc driver
Database name
Schema name
User name
Password
Cassandra sever IP
Port number.

And then when the user hits “ok” button the transformation will occur in the following sequence:

Main_SiTra calls the SQL service into action, by calling one of its methods that fetches data and schema from the RDBMS and produces SQL-meta-model objects.

Getting the RDBMS data & Schema and putting them as metamodel objects

After connecting to RDBMS Server by using the details provided by the user, cSiTra will immediately start generating the SQL meta model objects by calling generate() in SQLService class.
In generate function first the “Database” object is created and the name property is set to be “databaseName_schemaName”. Then getTables() will be called which in turn will call getColumns() and getRows() functions.
1. getTables : Get the tables meta data from table_schema , for each table create sqlTable object and fill the name, type , size properties with fetched meta model data then add the table to the database.
2. getColumns : Get the columns for a certain table from information_schema.columns. Generate a sqlColumn object for each column and add it the corresponding table.
3. getRows: Get the data for a certain table and fill it with cells objects. Where each row can contain may cells and each cell is connected with one columns. And the row objects will be added to the corresponding table.
After getting the objects it will be passed to SiTra to start transformation.
Next the Main_SiTra uses the SiTra & the available transformation rules and starts converting the source model (SQL-meta-model objects from step 1) to target model (Cassandra-meta-model objects).

SiTra transformation

The SiTra transformation is in a nested format. The logic of the aforementioned rules is implemented in classes extending the rule class.
1. Database2Keyspace: In here, the database object is converted into a keyspace object. The name is transferred as well as any options or comments specified. Then each of the tables in the database is transformed into a column family by calling the next rule. It returns the transformed keyspace.
2. Table2ColumnFamily: In here, the corresponding column family object is created in the above keyspace. The default replication factor is set to 1. The name, options and comments are transferred. After that each of the columns in the table is transformed by calling the next rule. The next step is to check each column for constraints.
  - If the column has a Primary Key constraint, the corresponding column in the column family is added in the references of the PK of the column family.
  - If the column has a Foreign Key constraint, a new reference table is created with the name <CurrentTableName>_<ReferenceTableName>. The Primary key is set to the Primary key of the referred table’s primary key columns. This table is then added to the main keyspace.
  After considering constraints, the rows of each table are transformed. The cells in each of the rows are considered one by one. They are mapped into their respective Cassandra cell and have the reference of the column in the column family they belong to. In case of the foreign key, the reference table is considered. If the row with the referred primary key already exists, the referee cell’s primary key is added on to the additional columns list. If not, a new row is added with the primary key and the latter step is executed. It returns the transformed column family.
3. SqlColumn2NoSqlColumn: In here, the SQL columns are mapped to Cassandra columns. The name is transferred as well as the Datatype, according to the next Datatype mapping rule. It returns the transformed column.
4. DatatypeMapping: This defines which DQL datatype closely matches with the datatype from Cassandra, and returns the transformed datatype.
After getting the objects it will be passed to SiTra to start transformation.
Now the control is transferred to Cassandra Service which parses the Cassandra equivalent java objects produced in the step 2 and pushes it to the Cassandra database.

Parse Cassandra meta-model objects and data

The result of the transformation is Cassandra meta-model objects which can be parsed to create Cassandra KeySpaces and column-families and fill it with data.
To start parsing, we will call generate() method in Cassandra Service and pass the objects to it. This method will call createSchema() which will generate Cassandra KeySpace and its column-families then call fillData() to fill the tables.
1. createSchema : create keyspace in Cassandra with options {'class':'SimpleStrategy', 'replication_factor':1};";, it will also create the tables for the keyspace, if the table is a foreign key table , it will has a foreign key column and another column to hold related data with list datatype.
2. fillData () : for each table in keypsace , this method will fill it with the related data stored in the nosql rows objects related to the table.

cSiTra prints each and everything it does to the console throughout the process, information such as timers for reading the data from RDBMS, conversion, and writing the data to Cassandra, and details of each entity read, converted, and written are also displayed, so are any warnings, exceptions, or errors if they occur.