Development - File Upload question type

=Functionalities=

This project aims at achieving the following functionalities.

Survey participants can upload one or more files:

 * file upload feature with a beautiful interface (see mockups below)
 * an option to preview the file and/or edit its metadata like title, comments etc
 * uploading a file from the web via its URL instead of uploading it from one's own computer

Configuration options:

 * maximum number of files
 * maximum/minimum file size
 * file type(s) allowed

Safe storage in file system:

 * Validation of the User input
 * Take other security measures

Implementation in

 * statistics
 * data entry
 * printable survey
 * response browsing
 * response editing
 * RemoteControl

Hooks for 3rd party processing tools

=Components=

This project consists of the following basic components

Uploader
Every file upload question type will have an upload button. When the surveyee clicks on the upload button, a loading screen appears. (See mockup)

http://bluwiki.com/images/7/7e/Ajax_loading.png

Figure: Once you click on the upload file button for the file upload question, the loading bar comes up.

Then a box comes up where the surveyee can upload one or more files to the question. It also displays the file types that are permitted for this question. (The number of 'browse & upload' buttons will depend on the maximum number of files allowed for that question). Every file being uploaded has its own progress bar and cancel button. (See mockup)

http://bluwiki.com/images/c/cf/Upload_progress.png

Figure: A progress bar shows the progress of each individual file, with an option to cancel the upload while it is still in progress.

While uploading files, we need to make sure that the server actually supports the configured file size, because servers often restrict it through PHP settings such as post_max_size and upload_max_filesize. This check can be added to the sanity checklist for survey activation.
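A minimal sketch of such an activation check. The helper names are hypothetical, and it assumes the question's max_filesize is stored in kilobytes, which this page does not state explicitly.

```php
<?php
// Convert a php.ini shorthand size such as "8M" or "512K" into bytes.
function fuqt_ini_size_to_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value;                  // leading digits, e.g. 8 from "8M"
    $unit = strtoupper(substr($value, -1));  // last character, e.g. "M"
    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;                  // plain byte count
    }
}

// Largest upload (in bytes) the server accepts: the smaller of the two limits.
// A post_max_size of 0 disables that limit, so only upload_max_filesize applies.
function fuqt_server_upload_limit(string $uploadMax, string $postMax): int
{
    $upload = fuqt_ini_size_to_bytes($uploadMax);
    $post = fuqt_ini_size_to_bytes($postMax);
    return $post === 0 ? $upload : min($upload, $post);
}

// True when a question's max_filesize (assumed to be in KB) fits the server limit.
function fuqt_filesize_supported(int $maxFilesizeKb, int $serverLimitBytes): bool
{
    return $maxFilesizeKb * 1024 <= $serverLimitBytes;
}
```

During activation the limit could be obtained with `fuqt_server_upload_limit((string) ini_get('upload_max_filesize'), (string) ini_get('post_max_size'))` and compared against each file upload question.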

Also, once the file has been uploaded, a thumbnail should be created for easier browsing of results.

Once the files have been uploaded, the surveyee is taken to the gallery view. Here he can preview the file, edit its metadata like title, comments etc. If he wishes, he can also delete individual files or cancel the entire upload at this stage. (See mockup)

http://bluwiki.com/images/a/a3/Gallery.png

Figure: The gallery gives a preview of all the files uploaded. Here you can edit the title and add comments to the file. If you do not want to save it, you can also delete the file. Once done with editing etc, you can submit the files.

Some of the salient features of this uploader are as follows:
 * Rich progress and status information
 * Filtering of files on the basis of filetype and filesize
 * Option to cancel the upload at any time
 * Ability to keep an upper bound on the number of files uploaded per user/question/survey
 * Simple Uploader: a simple PHP uploader can be used as a fallback if one has trouble uploading via the advanced uploader
 * In-built error handling for the following errors: invalid filetype, file too large, error while saving to the filesystem
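The three error cases listed above could be handled along these lines. This is only a sketch: the function names and error codes are hypothetical, and the size limit is assumed to be in kilobytes.

```php
<?php
// Returns null when the file is acceptable, or a short error code otherwise.
function fuqt_validate_upload(
    string $originalName,
    int $sizeBytes,
    array $allowedExtensions,
    int $maxSizeKb
): ?string {
    $extension = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if ($extension === '' || !in_array($extension, $allowedExtensions, true)) {
        return 'invalid_filetype';
    }
    if ($sizeBytes > $maxSizeKb * 1024) {
        return 'file_too_large';
    }
    return null;
}

// Move the temporary file into place; report a save error if that fails.
// move_uploaded_file() only accepts files that PHP received via HTTP POST.
function fuqt_store_upload(string $tmpPath, string $targetPath): ?string
{
    return move_uploaded_file($tmpPath, $targetPath) ? null : 'save_failed';
}
```

The uploader would call `fuqt_validate_upload()` first and only attempt `fuqt_store_upload()` when no error code is returned, so each failure can be reported to the surveyee separately.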

Database
Database Schema

In the edited schema, I will not create any new tables. When a new file upload question is created, the following attributes will be saved into the already existing question_attributes table, so no modification to the database schema is needed for saving the question attributes.

max_filesize (int)

max_num_of_files (int)

min_num_of_files (int)

allowed_filetype (string)

So no new field is created for questions; all attributes are saved into the pre-existing question_attributes table, which would look something like this:

qaid    qid           attribute              value

110     12            max_filesize            2048

111     12            max_num_of_files        3

112     12            min_num_of_files        2

113     12            allowed_filetype        "JPEG, png, bmp"
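As an illustration, rows like the ones above could be turned into typed settings as follows. The function name and the shape of `$rows` (one associative array per table row) are assumptions made for this sketch.

```php
<?php
// Collapse question_attributes rows for one question into typed settings.
function fuqt_attributes_to_settings(array $rows): array
{
    $raw = [];
    foreach ($rows as $row) {
        $raw[$row['attribute']] = $row['value'];
    }

    // "JPEG, png, bmp" becomes ['jpeg', 'png', 'bmp']
    $types = array_map('trim', explode(',', strtolower($raw['allowed_filetype'] ?? '')));
    $types = array_values(array_filter($types, static function (string $type): bool {
        return $type !== '';
    }));

    return [
        'max_filesize' => (int) ($raw['max_filesize'] ?? 0),
        'max_num_of_files' => (int) ($raw['max_num_of_files'] ?? 0),
        'min_num_of_files' => (int) ($raw['min_num_of_files'] ?? 0),
        'allowed_filetype' => $types,
    ];
}
```

Normalising the file types to lower case once, at load time, keeps later extension comparisons simple.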

But, in the case of answers, we'll have to create the following fields in the survey_x table to save metadata information such as

`file_json` (string)

number_of_files (int)

where file_json will be a JSON string which will contain the following information for all the files belonging to that particular question:
 * title : The name the surveyee would like the admin to see
 * filename : The random filename generated by LS and used for saving into the filesystem
 * file_comments : The file comments added by the surveyee while submitting the survey
 * extension : The extension of the file
 * file_size : The actual size of the file
 * order : The order in which the surveyee wants the uploaded files to be arranged

Attributes such as file_comments, title and order are configurable, i.e. the admin can decide whether these attributes should exist or not.
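A sketch of how the file_json value might be built and read back. The helper names are hypothetical; the keys are the ones listed above.

```php
<?php
// Sort the files by the surveyee's chosen order and encode them for file_json.
function fuqt_encode_files(array $files): string
{
    usort($files, static function (array $a, array $b): int {
        return $a['order'] <=> $b['order'];
    });
    return json_encode($files);
}

// Decode a file_json value; anything unreadable yields an empty list.
function fuqt_decode_files(string $fileJson): array
{
    $files = json_decode($fileJson, true);
    return is_array($files) ? $files : [];
}

// Example entry; every value here is made up for illustration.
$example = [[
    'title' => 'Passport photo',
    'filename' => 'fu_3f9a1c',
    'file_comments' => 'Taken last week',
    'extension' => 'jpg',
    'file_size' => 182,
    'order' => 1,
]];
$fileJson = fuqt_encode_files($example);
$numberOfFiles = count(fuqt_decode_files($fileJson)); // value for number_of_files
```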

Compress (Zip) and download
PHP:ZipArchive class
 * On Windows the zip extension is built in as of PHP 5.3.0; on other platforms PHP has to be built with zip support enabled.
 * The following code snippet demonstrates how easily an archive can be opened and inspected.

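<?php
$za = new ZipArchive();

// open() returns true on success or an error code on failure
if ($za->open('test_with_comment.zip') !== true) {
    exit("cannot open archive\n");
}

print_r($za);
var_dump($za);
echo "numFiles: " . $za->numFiles . "\n";
echo "status: " . $za->status . "\n";
echo "statusSys: " . $za->statusSys . "\n";
echo "filename: " . $za->filename . "\n";
echo "comment: " . $za->comment . "\n";

for ($i = 0; $i < $za->numFiles; $i++) {
    echo "index: $i\n";
    // process the file contents, e.g. inspect the entry's metadata
    print_r($za->statIndex($i));
}
echo "numFiles: " . $za->numFiles . "\n";
$za->close();
?>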

http://bluwiki.com/images/3/36/Zipdnld.jpg

Figure: The admin can select the response(s) by checking the checkboxes and download them in zipped format.
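Creating the archive for download could look roughly like this. The function name and the layout of `$files` (name inside the archive mapped to path on disk) are assumptions for this sketch.

```php
<?php
// Pack the given files into a zip archive.
// $files maps the name inside the archive to the path on disk.
function fuqt_zip_files(array $files, string $zipPath): bool
{
    if (!class_exists('ZipArchive')) {
        return false; // zip extension not available
    }
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }
    foreach ($files as $nameInZip => $pathOnDisk) {
        if (is_file($pathOnDisk)) {
            $zip->addFile($pathOnDisk, $nameInZip);
        }
    }
    return $zip->close();
}
```

Since uploaded files are stored under random names, the archive entry name is the natural place to restore a readable name, for example the response id followed by the title entered by the surveyee.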

System Security
"Security is a process, not a product, and adopting a sound approach to security during the process of application development helps produce tighter, more robust code."

In order to make the upload process secure, the following security measures and practices will be taken into consideration during development.

1. Restrict Filetypes: Check the mime-type and file extension and only allow certain types to be uploaded.

2. Rename Files: Rename the files that are uploaded. Eliminate the files with extensions that are not allowed.

3. Blacklist Files: Blacklist dangerous file types such as .exe, .zip, .php, .js, .py etc. using preg_match and prevent the upload of any such file.

4. Change Permissions: The files within the upload folder shouldn't be executable, hence change the upload folder's permissions to make sure that it is not executable.

5. .htaccess files: Place a .htaccess file inside the uploads and tmp directories to make them inaccessible from the web.

6. Disable Directory Indexes: Add an index.php file to all directories, including tmp/ and upload/, so that their contents are never listed if they somehow become accessible due to administrative errors. If a directory is not supposed to be exposed or user accessible, its index.php must redirect visitors to the home page.

7. Upload folder outside WWW root: Doing so might not be a possibility for shared hosting. Hence, it must be a configurable option in the config-defaults.php

8. Access Control flaws: Do a user authentication check on every page to make sure that the end user is unable to access sensitive information that only the admin may access. The user authentication project idea should take care of this (if that idea is selected for GSoC this year).

9. Never create a non-PHP file containing sensitive information in a web-exposed directory. For example, renaming a file to fileName.php.bak would be a huge security risk: if someone stumbles upon the URL, the web server will serve the source as it is instead of passing it to PHP. If the file contains passwords or other sensitive information, anyone can read it, and it could even end up being indexed by Google if its spider stumbles upon it.
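Measures 2 and 3 could be sketched as follows. The function names and the `fu_` prefix are hypothetical, and the blacklist contains only the extensions named above; a real list would need to be longer.

```php
<?php
// True when the name carries a blacklisted extension anywhere in it,
// so that a double extension such as "shell.php.jpg" is caught as well.
function fuqt_has_blacklisted_extension(string $name): bool
{
    return preg_match('/\.(exe|zip|php|js|py)(\.|$)/i', $name) === 1;
}

// Random name for storing the file, unrelated to what the surveyee supplied.
// random_bytes() requires PHP 7 or later.
function fuqt_random_filename(): string
{
    return 'fu_' . bin2hex(random_bytes(16));
}
```

Storing the file under a random name without its extension means the original name never reaches the filesystem; it can be kept in the file metadata instead.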

File Security
Given that the surveyee can upload files to the system, it is essential to ensure that an uploaded file is not malicious.

I don't have any prior experience in the field of file system security. Hence I did quite a lot of research on the issue: I googled, exchanged emails, discussed it on forums and mailing lists, and also had some interesting conversations with developers on #mantishelp (Mantis is the bugtracker used by a lot of organizations), as they have a similar file upload feature. This section on file security is more a discussion of the various available options than a precise plan. Implementation of this feature will require quite a lot of brainstorming and discussion, because each of the techniques mentioned here has its own merits and demerits.
 * Sandbox: Sandboxing is one of the most effective means of securing a filesystem. A sandbox provides a security mechanism for running unverified programs. We'll need to process the uploaded files using third party tools etc. A malicious file could take advantage of a security hole in the third party tool. Hence, all the third party tools will execute in a sandbox which provides a tightly controlled set of resources with strict permissions.
 * Principle of least privilege: Every module or file must be able to access only the information and resources that are necessary for its legitimate purpose. There is no reason why the upload directory should be executable. Hence, we must strip the directory permissions to the bare minimum. If a file needs to be manipulated, it can be moved to the sandbox and manipulated there. After the required manipulation has been done, it can be moved back to the upload directory. This involves some overhead, but it is a strong safeguard against malware, viruses etc.
 * PHP Fileinfo Extension: This extension tries to guess the content type and encoding of a file by looking for certain magic byte sequences at specific positions within the file. Hence, this extension allows PHP to guess the MIME type of the uploaded file. This information can be used to make the blacklisting of dangerous files more reliable.

Hence, even if we get a virus or malware in our system, we can make sure that it never gets an opportunity to execute and hence prevent any damage to the server. This technique is very popular in virtual systems as well as software testing.
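The Fileinfo check mentioned in the list above might be sketched like this. The function names are hypothetical; the detection is a guess based on the file contents, so it should complement the extension check rather than replace it.

```php
<?php
// Guess the MIME type from the file contents; null when it cannot be determined.
function fuqt_detect_mime(string $path): ?string
{
    if (!class_exists('finfo') || !is_file($path)) {
        return null; // Fileinfo extension missing or file not found
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->file($path);
    return $mime === false ? null : $mime;
}

// Accept a file only when its detected type is in the allowed list.
function fuqt_mime_allowed(?string $mime, array $allowedMimes): bool
{
    return $mime !== null && in_array($mime, $allowedMimes, true);
}
```

A file whose type cannot be determined is rejected here, which errs on the side of caution.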

I also observed that the file upload task is something very similar to the adding attachment feature in the Mantis bugtracker. I got in touch with the Mantis developers on their mailing list, and also had quite an interesting conversation with them on their IRC channel #mantishelp.

I was surprised to discover that Mantis currently doesn't have any protection measures against such threats, although they do have some specific use cases. It turns out the primary reason is that most web servers run their own antivirus software, which checks a file by default when it is uploaded, and that the uploaded files are never executed on the server.

But, we obviously cannot count on this, so I've come up with the following idea for explicit virus checking:
 * Clamscan: Clam AntiVirus is an open source (GPL) anti-virus toolkit for UNIX. It provides several utilities, such as a command line scanner, as well as automatic database updates (virus definitions). It can be called from a PHP script.
 * The admin can configure whether to use ClamAV or his own proprietary antivirus software. He can browse to locate the antivirus executable on his system or enter the path to it. The antivirus can then be triggered from a PHP script and the uploaded file checked for malware.
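A sketch of triggering the scanner from PHP, assuming clamscan is the configured executable. The function names are hypothetical; the exit codes (0 for no virus found, 1 for virus found, anything else an error) and the `--no-summary` option are clamscan's own, so another antivirus product would need its own mapping.

```php
<?php
// Build the shell command for scanning a single file.
// $scannerPath is whatever path the admin configured (assumed setting).
function fuqt_scan_command(string $scannerPath, string $filePath): string
{
    return escapeshellarg($scannerPath) . ' --no-summary ' . escapeshellarg($filePath);
}

// Translate a clamscan exit code into a verdict.
function fuqt_scan_verdict(int $exitCode): string
{
    switch ($exitCode) {
        case 0:
            return 'clean';
        case 1:
            return 'infected';
        default:
            return 'error';
    }
}

// Run the scanner and return 'clean', 'infected' or 'error'.
function fuqt_scan_file(string $scannerPath, string $filePath): string
{
    $output = [];
    $exitCode = 2; // treated as an error unless exec() reports otherwise
    exec(fuqt_scan_command($scannerPath, $filePath), $output, $exitCode);
    return fuqt_scan_verdict($exitCode);
}
```

Both arguments go through `escapeshellarg()` because the file path, and possibly the scanner path, originate from user-controlled input.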

Response Browser
Once the files have been uploaded, the admin should be able to view them online (i.e. without having to download them to his system). I'll attempt to add support for as many file types as time permits, but support for the following is the required minimum:
 * 1) Images: Gallery to browse images
 * 2) Text files: ability to view text files (security issues - take a leaf from mantis' page!)

When displaying text, we should only display the first X characters and link to the file; otherwise huge texts might mess up the overview.
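A minimal sketch of such an excerpt. The function name and the default length are hypothetical; note that the limit is counted in bytes, not characters, and that the text is escaped before display because it comes from an untrusted upload.

```php
<?php
// Return an HTML-safe excerpt of at most $maxBytes bytes of a text file.
function fuqt_text_excerpt(string $path, int $maxBytes = 200): string
{
    // Read one byte more than needed to find out whether the file is longer.
    $raw = file_get_contents($path, false, null, 0, $maxBytes + 1);
    if ($raw === false) {
        return '';
    }
    $truncated = strlen($raw) > $maxBytes;
    $text = substr($raw, 0, $maxBytes);
    // ENT_SUBSTITUTE keeps the output usable if a multi-byte character was cut.
    $safe = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return $truncated ? $safe . '...' : $safe;
}
```

Only the requested number of bytes is read from disk, so very large text files do not have to be loaded in full just to render the overview.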

Hooks for third party tools
Once the files have been uploaded, there should be an option to modify them using third party tools. For example, the admin should be able to perform a batch operation like converting all videos from .mp4 format to .avi format. Once the admin selects these files by checking the checkboxes corresponding to these responses (similar to compress and download screen - see above), the third party tool will be triggered and the operation will be performed on all the files.

Disk Space Issues
Given that we will be saving the files to the hard drive, running out of disk space is a real concern. We can use a PHP script with PHP's built-in disk_total_space and disk_free_space functions to find out how much free disk space is available on the system.

We can also configure a predefined limit (say 80% disk full), after which LimeSurvey's email notification system sends the admin an email about the low disk space. The admin can then zip and download the files onto another drive.
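The threshold check could be sketched as below. The function names are hypothetical and the 80% default simply mirrors the example figure above.

```php
<?php
// Percentage of the disk that is in use.
function fuqt_disk_used_percent(float $freeBytes, float $totalBytes): float
{
    if ($totalBytes <= 0) {
        return 0.0;
    }
    return 100.0 * ($totalBytes - $freeBytes) / $totalBytes;
}

// True when the disk holding $directory is at or above the threshold.
function fuqt_disk_is_low(string $directory, float $thresholdPercent = 80.0): bool
{
    $free = disk_free_space($directory);
    $total = disk_total_space($directory);
    if ($free === false || $total === false) {
        return false; // cannot determine disk usage; do not raise an alarm
    }
    return fuqt_disk_used_percent($free, $total) >= $thresholdPercent;
}
```

The notification email would be sent when `fuqt_disk_is_low()` returns true for the upload directory.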

The following restrictions can be built into the system and configured while creating a file upload question:
 * 1) maximum file size
 * 2) total disk space per question
 * 3) total disk space per user
 * 4) total disk space per survey

If any of these limits are reached, we can flash an error to the surveyee and prevent him from uploading any more files to the system.

We can also include a total_filesize_per_user stat in the statistics, so that the admin can get an idea of the per-user disk space requirement and tweak the max_filesize in the configuration accordingly.

Dataentry screen
self Note : add mockup

Add a mockup for the dataentry screen. The same uploader module is to be used, but the interface should be compact and without any popup.

Statistics Page
The following filters will be applied to the statistics page corresponding to the File Upload Question Type:
 * Number of files uploaded: Range [0, maxfiles] where maxfiles = number of maximum files that can be uploaded by the surveyee for a particular question.
 * File Type i.e file extension: eg: pdf, doc, txt, etc

Import/Export
Modify the current import/export module to support the File Upload Question Type. The response files will be exported as a zip archive with an appropriate naming scheme, so that they can be easily navigated.

=Timeline=

I strongly believe in Agile methodology and stick to it. Agile practices make all the more sense in open source projects like LimeSurvey, where the features are added/modified on the basis of the user's feedback. We should have the ability to swiftly modify the product or feature on the basis of the feedback from the community.

The following gives a brief idea about the iteration cycle that I'll be following during the development period:

The iteration cycle is of 1 week's duration starting Tuesday night
 * Tuesday night: developer's meeting: Discuss the deliverables for the next iteration with mentor and community. Also elaborate on the functionality and implementation methodology.
 * Wednesday morning to Saturday night: Work on the implementation
 * Saturday night: Check in the implemented functionality. Send an email to the LimeSurvey developers mailing-list announcing the new features implemented in this iteration.
 * Sunday - Monday: The developers and enthusiastic users can play with the feature and send feedback, bug report etc.
 * Tuesday (all day): fix bugs, tweak UI etc. depending on the feedback. Also document everything while it's still fresh in my mind.
 * Tuesday night: developer's meeting: discuss the last iteration. Discuss the deliverables for the next iteration.

I spend over 12 hours idling on the LimeSurvey's IRC channel. I can use IRC as a medium to collect feedback from the users about the features.

Week and corresponding activities:

 * Community Bonding Period starts: April 26
 * I have been associated with LimeSurvey for the last month, and by the time the Community Bonding Period starts I will have over 2 months of experience with LimeSurvey.
 * I have quite a decent understanding of the mechanics of LimeSurvey.
 * I spend quite some time on IRC and hence feel that I am well versed in who takes care of what in the LimeSurvey project.
 * I feel confident enough to start cracking on the project without spending any time explicitly on Community bonding.

Week 1 (April 26)
Brainstorm proposal with peers and mentor: Discuss the database schema, implementation details, technologies etc. and collect feedback. At the end of the first week, create a dedicated page for the upload question type on LimeSurvey's tikiwiki. It will list all the deliverables for this project along with the timeline, so that we can keep track of the project's progress. This page will be updated with the status of the project at the end of each iteration. (1 week)
 * Status: Completed
 * Work done this week:
 * DB schema discussed and finalized
 * Dedicated wikipage for FUQT created on tikiwiki
 * introduction mails on mailing list, posts on forum

Week 2 (May 3)
Community Bonding: Had my Bachelors thesis defence seminar, and hence couldn't spend time working on the LS FUQT. (1 week)
 * Status: Completed

Week 3 (May 10)
Implement Simple Uploader: one can upload multiple files to the system using the file upload feature. This iteration will be more or less dedicated to backend implementation in the database. Modify survey creation to be able to add fields to survey_x table for file metadata information. (1 week)
 * Status: Completed
 * Work done this iteration:
 * started work on the backend
 * implemented the database related changes
 * added facility to create form for file upload question type
 * r8741: FUQT-Work in progress: added facility to upload files using simple browser uploader
 * r8746: FUQT-Work in progress: added support to save uploaded file metadata as a JSON string into the database

May 24: Coding period starts (as per official GSOC rules)

Week 4 (May 17)
Implement Advanced Browser Uploader: show progress bar, cancel upload and other GUI features. The surveyee can preview the uploaded file before submitting it. While previewing, he has the option to edit the file's metadata like title and comments, and even to delete it. (2 weeks)
 * status: Completed
 * Work done this iteration:
 * added facility to save uploaded files into the file system (r8746)
 * added facility to save responses from surveyee into the database
 * added facility to save the file metadata as JSON string
 * added a shadowbox to display the tabs for uploading files
 * added the feature to upload multiple files on the "From Computer" tab (r8783)
 * added feature to edit meta-data of uploaded files on the "Gallery" Tab
 * added feature to delete uploaded files on the "Gallery Tab"
 * r8768: Fixed an issue with survey activation and DB field creation for File Upload Question Type
 * r8783: GSoC-FUQT Facility to upload files using Advanced File Uploader - implemented Upload from Computer (Work in progress)
 * r8786: removing colorbox and sTabs plugin. will be replaced by jQueryUI modal dialog and jQueryUI Tabs respectively
 * r8788: GSoC-FUQT Advanced File Uploader: added support for jQueryUI Tabs and jQueryUI modal dialog.
 * r8803: GSoC-FUQT Advanced File Uploader: added facility to pass file metadata from uploader frame to survey form as a JSON string. Added facility to save the file metadata into the database.
 * r8807: GSoC-FUQT Advanced File Uploader: added facility to view uploaded files in the gallery tab
 * r8815: GSoC-FUQT Advanced File Uploader: added facility to edit and/or delete uploaded files in the gallery tab.

Week 6 (May 31)
Implementation in Data Entry screen, response browsing and response editing. No advanced online browsing of images, videos etc yet (1 week)
 * Status: Completed
 * Work done this iteration:
 * r8825: GSoC-FUQT Response Browsing: Added facility to browse responses for file upload question type.
 * r8829: GSoC-FUQT Added support to edit and update file metadata.
 * r8831: GSoC-FUQT Added support for uploading files and inserting associated metadata using the data entry screen.

Week 7 (June 7)
Zipper cum downloader: The admin can selectively download files using the browse screen on the admin panel (1 week)
 * Status: Completed
 * Work done this iteration:
 * r8833: GSoC-FUQT Added support for downloading uploaded files in zipped format from the browse screen

Week 8 (June 14)
Implementation in printable survey and Advanced Uploader bug fixing (1 week)
 * Status: Completed
 * Work done this iteration:
 * Printable Survey modified to include the File Upload Question Type

Week 9 (June 21)
Vacation: I will be on vacation from June 20th to June 30th and hence won't be working on the project (10 days)

Week 10 (July 1)
Bug fixing and tweaking the Advanced File Uploader
 * Status: Completed
 * Work done this iteration
 * r8886: [GSOC-FUQT] Advance File Uploader: AJAXifying the file uploader
 * r8887: [GSOC-FUQT] Fixing a couple issues with browse screen for FUQT
 * r8888: [GSOC-FUQT] Fixing dataentry and adding support for file upload question type for Printable survey

Week 11 (July 5)
Fix AJAX Uploader (2 weeks)
 * Status: Completed

July 16 Mid-term evaluation

Week 13 (July 19)
Fix bugs in AJAX uploader, and update the browse, zip and download, data entry and edit screens (2 weeks)
 * Status: completed
 * Work Completed this week:
 * Modified template to include file valid message
 * implemented basic file upload
 * Checking file uploaded via the basic file upload method for size/file-extension etc
 * Show popup if the uploaded file does not fit the criteria

Week 15 (Aug 2)
Implementation in statistics
 * We'll be showing statistics related to the file upload question type such as
 * Total number of files uploaded
 * Number of respondents who uploaded at least one file (with the inverse being the number of respondents who didn't upload any)
 * Average number of files per "respondent" (probably this average should only count those who uploaded at least one file)
 * Average size of file per respondent
 * Summary/count of file types (ie: 37 jpg, 65 gif, 12 png)
 * Total size of all files (useful if you're about to download them all)
 * "smallest file size", "largest file size", "median file size"
 * Status: Completed
 * Work completed this week:
 * Added support in statistics for File Upload Question Type
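The figures listed above could be derived from the stored file_json values roughly as follows. This is a sketch with a hypothetical function name; it assumes one file_json string per response and counts the average only over respondents who uploaded at least one file.

```php
<?php
// Compute upload statistics from a list of file_json strings (one per response).
function fuqt_file_statistics(array $responses): array
{
    $sizes = [];
    $types = [];
    $uploaders = 0;
    foreach ($responses as $fileJson) {
        $files = json_decode($fileJson, true);
        if (!is_array($files) || count($files) === 0) {
            continue; // this respondent uploaded nothing
        }
        $uploaders++;
        foreach ($files as $file) {
            $sizes[] = (float) $file['file_size'];
            $extension = strtolower($file['extension']);
            $types[$extension] = ($types[$extension] ?? 0) + 1;
        }
    }

    sort($sizes);
    $count = count($sizes);
    $median = null;
    if ($count > 0) {
        $middle = intdiv($count, 2);
        $median = $count % 2 === 1
            ? $sizes[$middle]
            : ($sizes[$middle - 1] + $sizes[$middle]) / 2;
    }

    return [
        'total_files' => $count,
        'respondents_with_files' => $uploaders,
        'respondents_without_files' => count($responses) - $uploaders,
        'average_files_per_uploader' => $uploaders > 0 ? $count / $uploaders : 0,
        'total_size' => array_sum($sizes),
        'smallest_size' => $count > 0 ? $sizes[0] : null,
        'largest_size' => $count > 0 ? $sizes[$count - 1] : null,
        'median_size' => $median,
        'type_counts' => $types,
    ];
}
```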

Week 16 (Aug 9)
Adding hooks for third party tools

Add third party hooks to the uploader as well as admin panel so that the uploaded files can be tweaked/modified/edited by the surveyee(during upload process) as well as admin (post-upload).

We invite the community to suggest third party tools that can be useful for the admin and/or surveyee. One such tool can be image cropping, resizing etc. Other example would be an online video format converter. (2 weeks)
 * Status: Incomplete

August 16: Firm Pencils Down

Week 17 (Aug 16)
Implementation in Remote Control and advanced browsing implementation: Implement the File Upload Question Type in the remote control so that it can be used on an external application. Adding support for browsing images, watching videos, viewing text files and listening to audio files online. (1 week)
 * Status: Incomplete

August 20: Final Evaluation

=TODO=
 * no GETs and POSTs in delete.php and upload.php
 * use random filenames
 * Have groups of file types that can be activated by checkboxes, something like: IMAGES [ ]png, [ ]jpg, [ ]gif etc (a 'lil javascript magic is required)
 * the user should be able to move the dialog box (fix the vanishing buttons)
 * Edit panel should be able to delete uploaded files
 * have special icons showing the file type (PDF, csv, word) (grab a decent file type icon set off the web)
 * get the simple mode up and running (uncomment it!)
 * uploading file with the same name
 * if there are no files uploaded yet, open the file select right away (unable to trigger button.click - google it)
 * Internationalization
 * cancel button along with save and exit -> with confirm warning (complicated, will have to put delete/upload actions in queue, and the queue must be processed only when the dialog is closed ! discuss !)
 * add a cancel upload button to handle large files (how to do this one? tweak the ajaxuploader.js file? check !)
 * on activation, check if file already exists, and give error/warning
 * Option to configure upload from URL
 * client side validation check for basic upload
 * remove code duplication, move all code for extracting metadata in browse, export etc to common.php
 * title/comment -> option to make them mandatory
 * Fix passthruReplace error during survey Runtime
 * implement statistics for PDF and Excel as well
 * Clear and Exit button should delete the files uploaded by AJAX uploader.
 * have a link at each uploaded file "edit file data" which just opens the uploader again so the user can edit the files
 * show error when the uploaded file is larger than php max_file_size limit
 * browse screen individual file download - rawurlencode clips the rest of the filename
 * fix dataentry
 * file uploaded to upload/tmp/ during preview must be deleted at the end of the preview
 * add message to inform the user that x number of files more need to be uploaded/at max y more files can be uploaded (text/graph?)
 * browse individual file screen: downloading files with spaces in their names
 * change delete icon to use pre existing one (images/delete.png)
 * deleting the files on the browse screen should delete the files in the filesystem
 * add download button on individual response browse screen
 * facility to download just one file in browse response screen
 * replace "download these files" -> "Download all files in this response as ZIP file" or similar
 * remove the confirmation on save and exit
 * rename uploadedfiles.zip to something more meaningful -> response_{response_id}
 * if the user wants to escape the dialog, warn him and let him close the dialog even if he hasn't uploaded the minimum required number of files
 * Add flash notice to the modal dialog which provides feedback to activities such as file deleted, file uploaded etc.
 * Use Random filenames while saving
 * The administrator must be able to preview the question
 * Show the Basic as well as Advanced version on the survey page
 * Option to set the visibility of title/comment
 * unlink doesn't work on windows m/c
 * remove the mandatory settings for file upload questions
 * show uploading progress (by using the extra tailing dots on the upload button)
 * Add a close button on top right (X)
 * Confirm alert on clicking "Save and Exit"
 * decode special characters on browse screen
 * replace surveyee by participant everywhere
 * Download button doesn't download all the files
 * fix the iframe trouble -> kill the old frame? (think!)

Issues mentioned by Thibault

In "upload.php"

 * The security restrictions (maxsize, authorized extensions) are taken from the POSTed arguments: this means that you trust what was posted in the form. You should never trust this, because it is so easy to change a hidden field before submitting a form.
 * Solution: Instead, these restrictions should be recorded in the session for each FUQT instance.
 * Anyone can call the upload.php script, even people not answering a survey.
 * Solution: the upload script should only be available to people having a valid LimeSurvey session.
 * In upload.php it seems that you planned to have a different behaviour for preview mode and for activated surveys (you have an if ($preview) ... else ... structure). However, the actions for both modes are exactly the same.
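The session-based solution proposed above could be sketched as follows. The function names and the `fuqt` session key are hypothetical; the session is passed in as an array so that the logic stays independent of how LimeSurvey starts its session.

```php
<?php
// Called while rendering the question: remember the limits on the server side.
function fuqt_remember_limits(array &$session, string $fieldName, array $limits): void
{
    $session['fuqt'][$fieldName]['limits'] = $limits;
}

// Called from the upload script: the limits come from the session only.
// Null means there is no valid session for this question, so the upload is refused.
function fuqt_limits_for(array $session, string $fieldName): ?array
{
    return $session['fuqt'][$fieldName]['limits'] ?? null;
}
```

In upload.php this would be called with `$_SESSION`, and nothing in `$_POST` would be consulted for the maximum size or the allowed extensions. The same null check also rejects callers who are not answering a survey.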

I've tested this and the behaviour is the following:
 * in activated survey mode: the file is uploaded to upload/tmp and then after submit it seems to be moved somewhere else (but I don't know exactly where for now)
 * in preview mode: the file is uploaded to upload/tmp and then NEVER deleted, which could fill up the directory. We must find a way to delete files uploaded to the tmp directory in preview mode.

In delete.php

 * same issue: you shouldn't be able to access this script without a valid LimeSurvey session
 * you shouldn't pass the filename to be deleted from the form, because anyone could delete any file in the tmp directory! In fact, we want only the files uploaded by this user to be removable.
 * Solution: I propose that the uploaded filenames are recorded in SESSION for each FUQT. Then the delete form only sends an index of the file to be deleted.
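A sketch of the index-based delete proposed above. The function names and the `fuqt` session key are hypothetical; the session is passed in as an array, which in delete.php would be `$_SESSION`.

```php
<?php
// Record a stored filename for this question and return its index.
// array_key_last() requires PHP 7.3 or later.
function fuqt_register_upload(array &$session, string $fieldName, string $storedName): int
{
    $session['fuqt'][$fieldName]['files'][] = $storedName;
    return array_key_last($session['fuqt'][$fieldName]['files']);
}

// Delete the file registered under $index; the form never supplies a filename.
function fuqt_delete_upload(array &$session, string $fieldName, int $index, string $tmpDir): bool
{
    $name = $session['fuqt'][$fieldName]['files'][$index] ?? null;
    if ($name === null) {
        return false; // not uploaded in this session
    }
    unset($session['fuqt'][$fieldName]['files'][$index]);
    // basename() guards against path components sneaking into the stored name.
    $path = rtrim($tmpDir, '/\\') . DIRECTORY_SEPARATOR . basename($name);
    return is_file($path) && unlink($path);
}
```

Because only an index travels through the form, a user can remove nothing but the files registered in his own session.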

=Checklist=
 * File type validation
 * saving the files into the file system
 * deleting the files from file system, in case of "exit and clear"