Database Storage Engine Project

= Outline=

The project is focused around the Google Summer Of Code of 2010. Maarten ttielu) and Pieter-Jan MrP) are the students assigned to the project.

The project is based on the DB Storage Engine proposal found at Project ideas for GSOC 2010.

= Design and implementation=

A basic domain model of the problem can be found in this pdf file.

This domain model can be used to build multiple databases. One would be to see the objects as tables. For the inheritance there would have to be some analysis to see how they should be mapped to the database. Here it would be important to use a low amount of joins. This structure has the disadvantage that when querying all information of a certain user, loads of joins might be required (depending on the complexity of the survey). On the other hand, queries about all answers on a single question can be very efficient. On top of that, tables are small.

Another example could be to remove the QuestionDependency and make one table per section. This would allow a more flat database structure, which reduces the number of joins necessary to retrieve all data from a single user. This could however create large tables and removes the possibility to loop over one question. This question will have to be placed into a new section and the loop will have to go over that section.

To allow these multiple database formats, we plan on using a factory pattern. A DatabaseLayoutFactory can be used to choose the layout for a survey and in this way make it possible to add new database layouts for future use (for example a database which duplicates the data in both structures to get fast retrieval of both types of query).

Our plan would be to model the presented database models in detail and then implement at least one of them. Also we would want to model an API for the database (and of course implement that).

After creation of the survey, questions will be more or less static. To build upon this, it would be a good idea to store/cache the questions in a more static form, like a generated html-form, after creation of the survey.

Flat database
A flat database would exist of a single table containing all answers to all questions.

Properties of this layout are:
 * easy to export
 * fast database operations
 * hard to implement loops, branching & conditioning (introduce maximums on looping and have loads of NULL values OR use TEXT/BLOB fields containing sub structures => slow and unsearchable with quick operations)
 * not suited for exotic question types (upload picture, matrix questions, multiple answer, multiple type questions, ...)
 * database column limits might be hit
 * best for simple surveys with huge amounts of answer data

Multitable database
A multitable database would exist of multiple tables each containing the answers of a single question or questiontype.

Properties of this layout are:
 * more difficult to export
 * slower database operations (joins)
 * supports all sorts of branching, conditioning, loops, ... including future features
 * can support pictures, matrix questions, ... easily
 * no problems with database column limits
 * better for complex surveys with smaller amounts of replies

Hybrid database (section flat)
The section flat model is a mix of the previous two models. In stead of mapping all answers in one table, we would map all answers to one section in one table. This allows the more advanced features (like looping, ...) to be handled between sections while still having a fast export of the different sections.

Properties of this layout are:
 * easy to export
 * fast database operations for simple surveys, operations for surveys with loads of loops, branching, ... will need joins (to handle the dependencies)
 * database column limit only for really long sections (but they could be split in multiple sections, solving the problem)
 * can support multiple question types (might result in using more columns, like in case of matrix questions)
 * results in looping over sections instead of looping over questions in sections (hence if there is a loop over one question, there will be a section containing only this question; multiple questions in the same loop can be grouped in one section)

Layout mock-ups
Mock-ups for the database layouts can be found in this pdf file.

The survey used is:

1) how many children do you have? int

loop: amount of children {

2.1) give the name of you i'th child string

2.2) what is the age of your i'th child int

}

3) What kind of ice cream do you like? (multiple choice)

3.1) I don't like ice cream

3.2) vanilla

3.3) strawberry

3.4) chocolate

3.5) other: 3e) string

4) what do you eat? (multiple answer: checkboxes)

4.1) candy

4.2) cookies

4.3) fruit

= Action Plan=

(To be specified more in detail later)

Outline
Main deliverables:
 * Database design for flat database
 * Database design for multitable database
 * Database design for section-flat database
 * Class diagram for the database system
 * API and documentation with the database system - keep support for metadata in mind.
 * Implementation of the database system with at least one database layout (more if spare time is found)

Milestones
The first series of milestones is slightly smaller since both Maarten and Pieter-Jan have exams in the month of June.

The second series of milestones contains the biggest part of the implementation and testing.

= Meeting log=

The meeting log can be found here: Database Storage Engine Project Meeting Log