Project Ideas for GSoC 2012

=Welcome=

Welcome Google Summer of Code Student aspirants (:razz:)

This page lists project ideas developed by the LimeSurvey community. These tend to be the areas that get the most support as projects, since they were developed by people who know the project and what it needs most. However, if you have your own idea for a project, discuss it with us in our forums, on the mailing list, or at #limesurvey on irc.freenode.net, then submit your proposal. Good luck (:biggrin:)

=Project ideas=

Refactor all question types to a modular OOP format
If you look at the LimeSurvey source code (yii branch) you will notice that the code for the question types is strewn across the application.

The goal of this task is to rewrite all question types as objects, primarily inherited from one or more base question objects. A question type should have methods for display (several types of display, such as front end and printable survey), response saving, configuration, statistics, etc.
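
As a rough illustration of the intended structure, the sketch below shows a base question object with one derived type. It is written in Python for brevity; the real work would be in PHP on top of Yii. The class names, method names, and the rendered markup are all assumptions made for this example, not existing LimeSurvey code.

```python
from abc import ABC, abstractmethod


class Question(ABC):
    """Hypothetical base class that every question type would inherit from."""

    def __init__(self, code, text):
        self.code = code
        self.text = text

    @abstractmethod
    def render(self, mode="survey"):
        """Return the markup for a display mode ('survey' or 'printable')."""

    def save_response(self, raw):
        """Normalize a submitted value before storage (default: trimmed text)."""
        return raw.strip()


class YesNoQuestion(Question):
    """Illustrative concrete type with its own display and saving rules."""

    def render(self, mode="survey"):
        if mode == "printable":
            return f"{self.text} [ ] Yes [ ] No"
        return (
            f"<label>{self.text}</label>"
            f'<input type="radio" name="{self.code}" value="Y"> Yes'
            f'<input type="radio" name="{self.code}" value="N"> No'
        )

    def save_response(self, raw):
        value = raw.strip().upper()
        if value not in ("Y", "N"):
            raise ValueError(f"invalid answer for {self.code}: {raw!r}")
        return value
```

With this layout, code that displays or stores answers only talks to the `Question` interface, so adding a question type means adding one class rather than touching several files.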

Skills
 * Extensive knowledge of PHP & Yii framework
 * Very good knowledge of LimeSurvey inner workings
 * Very good OOP knowledge

Difficulty
 * Hard

Probable Mentors

Carsten Schmitz (c_schmitz)

Implement a fully featured XML-RPC API + API tests + unit tests
LimeSurvey urgently needs an API so that all administration functions (and later even survey-taking functions) can be accessed without using the GUI. In addition, you will set up a corresponding unit test framework and implement unit and API tests across the whole application, so that we can verify at any point in the development process that the application is functioning as it should.
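
To make the idea concrete, here is a minimal sketch of an XML-RPC endpoint exposing one administration function, using Python's standard `xmlrpc.server` module purely for illustration. The method name `list_surveys`, its parameter, and the in-memory survey data are hypothetical; the actual API design and its PHP implementation are part of the project.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory stand-in for the survey table.
SURVEYS = {
    101: {"title": "Customer feedback", "active": True},
    102: {"title": "Staff poll", "active": False},
}


def list_surveys(active_only=False):
    """Hypothetical API method: return the id and title of each survey."""
    return [
        {"sid": sid, "title": survey["title"]}
        for sid, survey in sorted(SURVEYS.items())
        if survey["active"] or not active_only
    ]


def build_server(host="127.0.0.1", port=8080):
    """Create an XML-RPC server that exposes the method above."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(list_surveys, "list_surveys")
    return server
```

Calling `build_server().serve_forever()` would start the endpoint. Because `list_surveys` is a plain function, it can be unit tested directly, and an API test can then exercise the same function through an XML-RPC client.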

Skills
 * Good PHP, SQL, Yii knowledge
 * Good knowledge of the LimeSurvey administration and code

Difficulty
 * Hard

Probable Mentors

c_schmitz

User Interface Improvements and Backend Tidy Ups
Help improve LimeSurvey's usability and flexibility through a range of small tasks such as:
 * Develop an improved 'bulk' question handling UI
 * Tidy up input values on public surveys by moving required values from HTML inputs to JavaScript variables

The idea is to provide ways of updating and modifying multiple questions or multiple groups in one go. Often in a survey, a user will need to make parallel changes to 5 or 6 questions at once across a range of different groups. Being able to select various elements of questions, and then select a range of questions to edit, would dramatically improve LimeSurvey's usability from the administrator's perspective.
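
The core of such a bulk edit could look like the sketch below. The data structure (a dict mapping question ids to attribute dicts) and the attribute names are assumptions chosen for illustration only.

```python
def bulk_update(questions, selected_ids, changes):
    """Apply the same attribute changes to every selected question.

    `questions` maps a question id to a dict of attributes. Both this
    structure and the attribute names are illustrative assumptions.
    Returns the ids that were actually updated.
    """
    updated = []
    for qid in selected_ids:
        if qid in questions:
            questions[qid].update(changes)
            updated.append(qid)
    return updated
```

For example, `bulk_update(questions, [12, 15, 31], {"mandatory": "Y"})` would make three questions from different groups mandatory in a single step.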

Skills

PHP, SQL, Yii, Authentication protocols (openID, CAS, LDAP-bind, ...)

Difficulty

Medium

Probable Mentors

Jason Cleeland (jcleeland)

Implement a plugin framework for authentication/languages/import/export modules
The idea is to design and implement a modular "framework" for LimeSurvey and to convert the existing functionality into corresponding modules. The framework will define the API each plugin type must (or may) implement and propose basic methods that plugins can use (for instance, how to store a plugin's parameters in the DB).

The plugin framework should be in parts similar to the framework used by Joomla (XML description files in the plugin packages)

Example description for authentication modules:

Currently authentication is only used for the survey-administration GUI and not the participants interface (tokens are used for this). With the new authentication framework it should be possible to define several authentication backends for frontend (survey participation) and backend (administration)

A generic authentication framework must define the following services:
 * User authentication: this interface must return the identity of the authenticated user if authentication is successful.
 * Authentication may not always be based on a simple user/password form, so the proposed framework must be generic enough to enable authentication based on other schemes (using any number of interaction pages between the server and the end user, or any other contextual parameter such as Referrer, session variable, ...).
 * User provisioning:
 * when activated on the survey administration interface, an authentication module can create a newly authenticated user in the LimeSurvey internal DB (which is required for setting the user rights on the platform). The newly created user can be assigned a default profile or a per-user profile (queried from an external database).
 * when activated on the participants interface, the authentication module will provision the token table of the corresponding survey // alternatively the general-purpose cross-survey participants-database as described in the above GSoC idea.
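
The two services listed above (authentication and provisioning) could be captured in a plugin interface along these lines. This is only a sketch in Python; the class names, the `request` dict, and the `REMOTE_USER` convention are assumptions for illustration, and the real plugin API would be defined in PHP as part of the project.

```python
from abc import ABC, abstractmethod
from typing import Optional


class AuthPlugin(ABC):
    """Hypothetical contract that each authentication module implements."""

    @abstractmethod
    def authenticate(self, request: dict) -> Optional[str]:
        """Return the identity of the authenticated user, or None on failure."""

    def provision(self, identity: str, user_db: dict) -> dict:
        """Create the user on first login with a default profile."""
        return user_db.setdefault(identity, {"profile": "default"})


class HeaderAuthPlugin(AuthPlugin):
    """Illustrative non-password scheme: trust an identity set upstream,
    for example by a single sign-on proxy (an assumption for this sketch)."""

    def authenticate(self, request: dict) -> Optional[str]:
        return request.get("REMOTE_USER") or None
```

Because `authenticate` receives the whole request context rather than a user/password pair, schemes that rely on session variables or redirects fit the same interface.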

Skills

PHP, SQL, Yii

Difficulty

Hard

Probable Mentors

c_schmitz

Diogo Gonçalves (dionet)

Want to know more

Please read the FAQ about this project

Custom Report Generation
The task is to build a module that generates custom reports. The module should be able to do at least the following:
 * Create various types of reports, e.g. using tables, pie charts, graphs, and bar charts, to name a few.
 * The resulting charts and other illustrations should be easy to read and understand, and should include the ability to export into standard office suites.
 * Reports can be general or survey specific.
 * General in the sense that it should graphically show basic findings for any survey, such as the number of users completing the survey, the average time taken to complete the survey, the average number of correct responses, etc.
 * Survey-specific results should also be addressed properly, e.g. how many users chose the first option as their answer to a particular question, how many users didn't answer a specific question, etc.
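
The figures mentioned above can be derived from response data before any charting happens. The sketch below computes them in Python; the layout of a response record (`completed`, `seconds`, `answers`) is an assumption for illustration and does not reflect the actual LimeSurvey tables.

```python
from collections import Counter
from statistics import mean


def summarize(responses, question):
    """Compute general and question-specific figures for a report.

    Each response is assumed to be a dict with the keys 'completed',
    'seconds' and 'answers' (a hypothetical layout for this sketch).
    """
    completed = [r for r in responses if r["completed"]]
    answers = [r["answers"].get(question) for r in responses]
    return {
        "completed": len(completed),
        "avg_seconds": mean(r["seconds"] for r in completed) if completed else None,
        "option_counts": dict(Counter(a for a in answers if a is not None)),
        "unanswered": sum(1 for a in answers if a is None),
    }
```

The resulting dict is the kind of intermediate data that a table, pie chart, or bar chart renderer could consume.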

Skills

OOP experience in PHP and experience with a PHP framework like CakePHP or CodeIgniter, jQuery. Strong mathematical background will help!

Difficulty

Moderate

Probable mentors

Carsten Schmitz (c_schmitz)

Jason Cleeland (jcleeland)

Diogo Gonçalves (dionet)

Adding new questions/subquestions and groups dynamically
Currently in LimeSurvey, once a survey is activated we cannot
 * Add or delete groups
 * Add or delete questions
 * Add or delete subquestions or change their codes

The idea is to cover these shortcomings. In addition, all participants who have already taken the survey must be notified about the changes. You should also work out in detail what happens to existing results when a survey is changed that way.

Skills

Experience with the PHP framework Yii and knowledge of the inner workings of LimeSurvey is a must!

Difficulty

Moderate

Probable mentors

Jason Cleeland (jcleeland)

Diogo Gonçalves (dionet)

Enhance ExpressionScript
Starting with LimeSurvey 1.92, all of the front-end (survey-taking) processing is managed using ExpressionScript (EM). EM implements a safe subset of PHP syntax so that authors can write complex equations. It is integrated into LimeSurvey to control navigation (branching via relevance), validation, and tailoring/piping of content. Although it is quite powerful, users are already asking for enhancements.
 * 1) Add a GUI for ExpressionScript
 * Background:  LimeSurvey has a nice GUI for building conditions, but it does not support complex equations such as those available within EM.  EM does have robust syntax highlighting after an equation is saved, which makes it easy to fix any syntax errors.  However, users would like a GUI to ease authoring of equations.
 * Strategy - use CodeMirror - see here
 * Create CodeMirror syntax file to do appropriate syntax highlighting (use C as base language), ensuring that CodeMirror knows the set of valid operators
 * Use the CodeMirror API to let it know the names of registered functions and variables
 * Utilize auto-complete, ideally letting users see and choose among the available function syntaxes (e.g. when a function can take several parameters).
 * Auto-completion of variable names should show the question and question type to make it easier for users to pick the correct variable.
 * 1) Support subquestion-level Relevance
 * Background:  EM already supports user-entered relevance equations at the group and question levels.  It also generates subquestion-level relevance for features like array_filter.  Users would like to be able to add additional relevance criteria at the subquestion level.
 * Strategy:
 * Add a relevance column to the display of subquestions, and ensure entered relevance is saved to the database (the data model is already OK)
 * Refactor createFieldMap (or its successor) to read subquestion-level relevance into the run-time data structures so that it is passed to EM
 * Enhance EM's subquestion-level data structures to hold subquestion-level relevance
 * Update EM's _ValidateQuestion and JavaScript generation functions to AND any manually-entered subquestion-level relevance equations with that auto-generated for array_filter and array_filter_exclude
 * Update ShowSurveyLogicFile to show subquestion-level relevance equations
 * Rigorously unit and regression test these changes.
 * 1) Add Sub-question-level Validation
 * Background:  The question-level validation supports regular-expression-based validation of each subquestion.  Users would like to have subquestion-level validation to implement things like a question for collecting user contact information, where there are different validation criteria for address parts (e.g. city, state, postal code) and phone numbers.
 * Strategy:  Similar to subquestion-level relevance
 * Add data entry fields for regular expression validation at the subquestion level, and ensure they are saved to the database (which already has a preg field at the subquestion level)
 * Ensure EM gets access to the new preg values (via upgrade to createFieldMap)
 * Upgrade EM's _ValidateQuestion and the function for generating validation equations to include these new criteria.  EM already supports validation at the subquestion level (and changes the CSS style to show fields that fail this validation)
 * Upgrade ShowSurveyLogicFile to show subquestion-level validation rules
 * Rigorously unit and regression test these changes.
 * 1) Add Sub-question-level mandatory criteria
 * Background:  Many users have asked, via the forums, for ways to make parts of a multi-part question mandatory while keeping other parts optional.  LimeSurvey already provides many options, such as minimum and maximum numbers of answers, but users have asked for more fine-grained control, and currently can only achieve that via custom JavaScript.
 * Strategy:  Similar to subquestion-level relevance
 * Add checkbox to indicate mandatory status at the subquestion level
 * 1) Add native support for input-masks at the question and subquestion level
 * Background:  One of the commonly used work-arounds deals with input masks, such as the jQuery meioMask plugin.
 * Strategy:
 * Add mask attribute at question and subquestion level
 * Have EM generate needed JavaScript code to create and manage those masks
 * 1) Add EM reporting functions (tables)
 * Background: There are frequent requests for enhancement of the print answers table at the end of the survey.  Some users want to generate custom reports mid-survey.
 * Strategy:
 * Implement showAllResponsesExcept(attributeList,attributeTitleList,questionList) function.  questionList = list of question identifiers; attributeList = pipe-delimited list of attributes (like question#, title, text, type - so you can decide what to show); attributeTitleList = pipe-delimited list of table headers, so can internationalize the report.
 * Implement showTheseResponses(attributeList,attributeTitleList,questionList) function. questionList = list of question identifiers; attributeList = pipe-delimited list of attributes (like question#, title, text, type - so you can decide what to show); attributeTitleList = pipe-delimited list of table headers, so can internationalize the report.
 * 1) Add better EM support for operations on array type questions
 * Background:  Many of the EM validation rules are effectively statements like count the number of empty subquestions, or sum the values of the subquestions.  EM generates these functions itself, so there is no burden on the user, even if there are dozens of subquestions.  However, manual editing of these questions is cumbersome. Users could benefit from special variables to access all of the elements of a question so that they could write functions like (sum(this) == 10) and have it be expanded to (sum(q1_1, q1_2, ..., q1_N) == 10).  Note, there is already a "this" variable, but it does not apply to subquestions.
 * Strategy:
 * Extend EM so that the "this" variable gets expanded into a comma-separated list of subquestion references if there are subquestions.  Have this expansion carry the suffix, so this.valueNAOK would become q_1.valueNAOK, q_2.valueNAOK, ..., q_N.valueNAOK.
 * This macro expansion should occur in group.php (e.g. in the process of generating JavaScript), rather than having this variable be resolved at run-time within JavaScript.
 * Provide similar array-expansion macros for all variables (not just "this") - such as qcode_vars.*.  This would also allow functions like sum(qcode_vars.NAOK), count(qcode_vars.NAOK), implode(' ', qcode_vars.valueNAOK)
 * For questions with comments, create macros like qcode_vars_nc and qcode_vars_oc for no-comments and  only-comments
 * For arrays that might need row or column-level processing, create aliases like qcode_rowname_vars and qcode_colname_vars so that each can be expanded.  This would let us replace the current system for generating row and column sums with equations like sum(qcode_rowname_vars.NAOK) and be sure that the sums will honor array_filter and array_filter_exclude.
 * 1) Add EM functions to validate data entry against value sets managed by Enterprise Vocabulary Systems
 * Background:  Healthcare and biological sciences increasingly use large controlled vocabularies, terminologies, or ontologies.  Data entry systems for such domains require validation against those vocabularies.  Large open-source projects, like LexEVS and Apelon DTS, provide open APIs to access that content.  Such tools let one validate diagnostic codes, and even do incremental search into those vocabularies as one types.  The main open projects are standardizing on the CTS-2 (Clinical Terminology Services-2) specification.
 * Strategy:
 * Create EM-compatible PHP and JavaScript functions to access CTS-2 compliant EVS systems
 * Should include question and subquestion-level validation rules to validate the final entry.  Currently, there are em_validation_q and em_validation_sq advanced question attributes for validating questions and subquestions based upon calls to external functions.  It may be desirable to support custom subquestion-level validation to validate each subquestion against different value sets.  If so, this can follow the model of adding subquestion-level validation and relevance.  Alternatively, it may be desirable to add subquestion-level advanced question options (if the LimeSurvey community feels that there may be enough such extensions that such customization should be stored in a general attribute table rather than making the questions table wider).
 * Should also include the ability to do incremental searches into the value sets
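
The "this" expansion described in the array-type task above can be sketched as a simple text macro. The Python function below is an illustration only: the function name is hypothetical, and a real implementation would live in EM's PHP code and work on its token stream rather than on raw text (this version would, for instance, also rewrite the word inside a quoted string).

```python
import re


def expand_this(equation, subquestion_codes):
    """Expand `this` (with an optional `.suffix`) into a comma-separated
    list of subquestion references, carrying the suffix along.

    Equations are left unchanged when the question has no subquestions.
    """
    if not subquestion_codes:
        return equation

    def replace(match):
        suffix = match.group(1) or ""
        return ", ".join(code + suffix for code in subquestion_codes)

    return re.sub(r"\bthis\b(\.\w+)?", replace, equation)
```

For example, `expand_this("sum(this) == 10", ["q1_1", "q1_2", "q1_3"])` yields `sum(q1_1, q1_2, q1_3) == 10`. Doing this while generating the page, as proposed above, means the browser only ever sees the expanded equation.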

Skills

PHP, SQL, PHP Debugging (e.g. xdebug), Yii, LimeSurvey's ExpressionScript and code-base in general

Difficulty

Medium to Hard

Probable Mentors

Thomas White (TMSWhite)

Optimize LimeSurvey for Long Surveys
Although LimeSurvey is excellent for short and mid-length surveys, it is not optimized for surveys with hundreds or thousands of questions, such as those used by epidemiologists or in clinical trials. This project would tackle each of the main limitations. The sub-tasks and their development strategies include:
 * 1) Support for more than 1000 database columns
 * Background: LS creates a horizontal table for survey data collection, and such tables often are limited to at most 1000 columns
 * Strategies:
 * Remove unneeded columns from horizontal table (e.g. type 'X')
 * Conditionally remove unneeded equation columns (e.g. let users specify a prefix for variable names that should not be stored)
 * Add an option for an Entity Attribute Value data model for data collection (which can support an unlimited number of columns).  See details here.
 * 1) Memory and Code Optimization
 * Background: LS loads the survey definition data model into several different data structures, such that it uses at least twice as much storage as it really needs.  ExpressionScript now holds all of the instrument definition data, so the other data stores are no longer needed.
 * Strategy
 * Remove need for buildsurveysession - gap analysis to add in any missing content to EM
 * Remove need for createFieldMap - similar gap analysis
 * Normalize EM data structures to avoid internal duplication (and refactor code to use normalized structures) (and document new data structures so that future developers know which to use, and how to use them)
 * gInfo - renaming of groupSeqInfo, plus add missing group-level attributes (relevance, description)
 * qInfo - renaming of questionSeq2relevance, plus add any missing attributes from $fieldarray; remove aid, sqid; move grelevance to gInfo
 * aInfo - for storing answer arrays?
 * sqInfo - renaming of q2subqInfo, indexed on sgqa(?); remove content gleaned from gInfo and qInfo
 * gStatus - renaming of indexGseq - make hold only dynamic values; so move gtext and gname to gInfo; also remove gRelInfo, keeping any unique variables it contains
 * qStatus - renaming of indexQseq - make hold only dynamic values; so move qtext, qhelp, gtext, gname to qInfo
 * sqStatus - renaming of subQrelInfo - make hold only dynamic values
 * groupRelevanceInfo - consolidate into indexGseq?
 * knownVars - instead of copying content, use reference to gInfo and qInfo (e.g. remove question, relevance, grelevance, qcode, ansList, ansArray, onlynum).
 * varNameAttr - remove; generate on the fly from gInfo, qInfo, and knownVars
 * alias2varName - remove; generate on the fly from knownVars
 * Refactor EM so that it stores normalized copies of secondary language text
 * Optionally load only current group for each page transition rather than holding entire survey definition in memory
 * Refactor EM for consistent variable naming
 * e.g. questionId => qid; groupNum => gid; groupSeq => gseq; questionSeq => qseq
 * 1) Run-Time Performance Optimization
 * Background:  qanda.php loads the content to create the questions and answers.  It used to do this by separate database queries per question.
 * Strategy
 * Refactor qanda.php to remove queries (like for "other", or subquestion text) - have it get that information from EM
 * Refactor LS so that language switching does not require a re-load of the core logic, but just the new language content  (and that it gets this from EM)
 * Refactor replacements.php and EM
 * Should only need to call replacements.php once per page, so set those values as locally static in EM per page.
 * Refactor group.php to pass {QUESTION_*} via replacements array, rather than as globals passed to templatereplace
 * 1) Design-Time Performance Optimization
 * Background:  LS used to only load the information from the data model that was needed for the given question or group.  EM had to load the entire data model to properly syntax highlight everything.  This can lead to some performance degradation in very long surveys
 * Strategy:
 * Add methods to EM to just load changes to model as questions or groups are added, removed, or updated
 * Pass those updates into EM internals so that syntax highlighting continues to be correct
 * Ensure that admin pages only call the subset of EM functions needed to do accurate syntax highlighting.
 * 1) Optimizations for Rapid Development
 * Background:  LS has a nice GUI for editing single questions at a time.  However, it is not optimized to make changes to multiple questions at a time.  Some competitor systems let authors design surveys using an Excel template (e.g. so that they can do bulk find and replace, or easily copy similar portions or answer lists).  Short of a full-blown AJAX-enabled admin system, this has the highest throughput potential
 * Strategy:
 * Create an Excel data model that would work for importing surveys
 * Create import and validation routines from that model
 * A similar model, which could be extended for LS, is noted here.
 * 1) Performance Validation
 * Load test surveys of varying lengths
 * Identify performance and memory bottlenecks
 * Identify minimum memory requirements for certain survey lengths and concurrent user volumes
 * Propose strategy to overcome those performance issues
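
To illustrate the Entity Attribute Value option from the first sub-task, the sketch below converts one wide (horizontal) response row into EAV triples and back again. It is written in Python with hypothetical function names and column codes; the actual storage layer and schema would be designed as part of the project.

```python
def to_eav(response_id, wide_row):
    """Turn one wide response row into (response_id, column, value) triples.

    Unanswered columns (None) are skipped, so the number of stored rows
    depends on the answers given, not on the number of survey columns.
    """
    return [
        (response_id, column, value)
        for column, value in wide_row.items()
        if value is not None
    ]


def to_wide(eav_rows, columns):
    """Rebuild wide rows (e.g. for export); missing values become None."""
    wide = {}
    for response_id, column, value in eav_rows:
        wide.setdefault(response_id, dict.fromkeys(columns))[column] = value
    return wide
```

The trade-off is that the column limit disappears, while reports and exports need a pivot step like `to_wide` to get back to one row per response.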

Skills

PHP, SQL, PHP Debugging (e.g. xdebug), Yii, LimeSurvey's ExpressionScript and code-base in general, Load testing tool (e.g. webload), PHP Profiler

Difficulty

Medium to Hard

Probable Mentors

Thomas White (TMSWhite)

Idea template
Describe the idea here in general terms

Skills

Explain what sort of coding skills would be needed for a student to implement this project

Difficulty

Explain the level of difficulty involved

Probable Mentors

Put your name (and tag) here if you are willing to mentor a student for this idea

=More information=

Getting started
Check out our 'Get started' page for setting up the development environment, coding standards, and all the other important stuff that you need to know before the real fun begins!

Frequently Asked Questions
Check out our GSoC FAQ page.