Serving CSV data in CGI mode to populate a jQuery DataTable


First published: May 5, 2011
Last modification: Nov. 29, 2011

In this article, we will expose a concrete example on how to populate a <table> by using the DataTables jQuery plugin on the client side and data served from a CSV file on CGI mode.


The context

FAF Datatables

During the implementation of the new FAF-Drugs server, we needed to find a way to display a columns sortable table with up to 50.000 rows. Letting Firefox displays this page did not appear to be the good idea -- except if you want your browser to use 2Gb RAM and wait 2 minutes for each column sorting ... -- Thus, we needed a paginated view of sorted data. Considering that data are stored on the server side, the obvious solution appears to use the DataTables jQuery plugin. The originality of this post relies on fact that data are not stored in a (SQL) database but in a csv file and served on-demand through a python CGI script.

Client side: DataTables integration

This part is inspired from the DataTables example page.

<HTML> part

In your HTML page, you will need to load required javascript plugins and styles in the <header> section:

1
2
3
4
5
6
<style type="text/css" title="currentStyle">
  @import "/media/css/demo_page.css";
  @import "/media/css/demo_table.css";
</style>
<script type="text/javascript" src="/media/js/jquery.js"></script>
<script type="text/javascript" src="/media/js/jquery.dataTables.js"></script>

Your table which will be populated should look like:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
<table class="display" id="results">
  <thead>
    <tr>
      <th>#</th>
      <th>ID</th>
      <th>MW</th>
      ...
      <th>Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="27" class="dataTables_empty">Loading data from server</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <th>#</th>
      <th>ID</th>
      <th>MW</th>
      ...
      <th>Status</th>
    </tr>
  </tfoot>
</table>

jQuery part

Now make your #example <table> a dataTable instance include this js code snippet to your page.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
$(document).ready(function() {
  var oTable = $('#results').dataTable( {
    "bProcessing": true,
    "bServerSide": true,
    "sAjaxSource": "http://foo.bar/cgi-bin/csv.cgi?url=http://foo.bar/tmp/spam.csv",
    "sScrollX": "100%",
    "sPaginationType": "full_numbers",
    "fnRowCallback": function( nRow, aData, iDisplayIndex, iDisplayIndexFull ) {
                       switch ( aData[26] ) {
                         case "Accepted":
                           $(nRow).addClass("gradeA");
                           break;
                         case "Rejected":
                           $(nRow).addClass("gradeX");
                           break;
                         default:
                           break;
                       }
                       return nRow;
                     }
  } );
} );

So let's explain datatables' options we used in this case:

  • bProcessing: displays a processing indicator while loading data.
  • bServerSide: data are processed on the server side.
  • sAjaxSource: the url to the script that will receive the request and send json data. Note that datatables will send all required parameters for you to the server, you only need to give the url and additionnal request parameters here.
  • sScrollX: as your table is larger than our page, this option makes your table scrollable along the x-axis and the table width is 100% of the allowed size.
  • sPaginationType: pagination style, nothing crucial.
  • fnRowCallback: we use this callback feature to stylize rows with a particular state (Accepted or Rejected) dicted by the 27th column. If you are not familiar with callbacks, note that this function is called when AJAX request is done.

For the full list of datatable object initialization options, please refer to http://www.datatables.net/usage/

Server side: serve CSV data

When loading data into the table, datatables instance will make AJAJ calls to the script http://foo.bar/cgi-bin/csv.cgi. This script should be able to parse datatables queries and serve data according to DataTables API. In our case, instead of using a database, we had data stored in a csv files (generated for each FAF-Drugs job). We decided to use existing data (up to 50.000 rows), and it appears that Python's CSV module is really surprising by its efficiency: the whole process takes only about 300 ms. The CGI python code is the following:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
import time

# starting time
st = time.time()

# ------------------------------- import
import csv
import json
import sys
import logging
import operator
import urllib2
import os.path
import string
import re
import cgi


# ------------------------------- vars
PARMS = {
    "furl":["url","str"],
    "sidx":["iSortCol_0","int"],
    "start":["iDisplayStart","int"],
    "rows":["iDisplayLength","int"],
    "sord":["sSortDir_0","str"],
    "echo":["sEcho","int"],
    "rexp":["sSearch","str"],
    }

# The 27 columns csv file types
CTYPES = (int,str,float,float,float,float,int,int,int,int,int,int,int,int,int,int,int,int,float,int,float,str,str,int,float,int,str)


# ------------------------------- config
LOG_FILENAME = '/tmp/csv2json.log' # The log file
logging.basicConfig( filename=LOG_FILENAME, level=logging.DEBUG )


# ------------------------------- starts
logging.info( ":"*50+" %s" % time.strftime("%c") )
logging.info( "Job started" )


# ------------------------------- core
def cast(values, types=CTYPES):
    """
    Cast list data according to predefined types

    Note that values and types should have the same length!

    @param values: values to cast
    @param types : types for values
    @return: casted values
    """
    return [t(v) for t, v in zip(types, values)]

def format(values, types=CTYPES):
    """
    Float values are converted to strings with two decimals.

    ADVISORY: only use this function for final data to jsonify since
    data type is lost

    @param values: values to cast
    @param types : types for values
    @return: formated values
    """
    return ["%.2f"%v if t == float else v for t,v in zip(types, values)]

def search(values, rexp):
    """
    Float values are converted to strings with two decimals.

    ADVISORY: only use this function for final data to jsonify since
    data type is lost

    @param values: values to cast
    @param types : types for values
    @return: formated values
    """
    for v in values:
        if re.search(rexp,str(v)):
            logging.debug( "REGEXP (%s) Match for %s" % (rexp,str(v)) )
            return True
    return False

def csv2json( furl, sidx, start=1, rows=50, sord="asc", echo=0, rexp=None, pfn=None ):
    """

    csv2json

    * Parse www csv file
    * Sort data according to the input key
    * Extract a subset of data
    * Encode the subset in JSON
    * Print the data

    @param furl : csv file url (can be a local path)
    @param sidx : column used to sort data
    @param start: data chunck/page number (default: 1)
    @param rows : how many values do we display (default: 50)
    @param sord : sort order, set to desc for reversed (default: True)
    @param echo : simple internal iterator used by datatables for rendering
    @param pfn  : pickle file name -- only used if PICKLE = True (default: None)

    @return:None
    """

    logging.info( "Input file is: %s"%furl )

    # read csv data
    t = time.time()
    c = csv.reader( urllib2.urlopen(furl), delimiter=';' )
    logging.info( "Data loaded in %8.3f s" % (time.time() - t) )

    # First raw corresponds to parameters names
    k = c.next()
    logging.info( "Input csv keys are " + str(k)  )

    # filter data (search case)
    if rexp != "None": # rexp is casted to str
        t = time.time()
        c = filter(lambda x: search(x,rexp), list(c))
        logging.info( "Data filtred in %8.3f s" % (time.time() - t) )

    # cast data -- note that we need to sort the whole list for
    # sorting consistency
    t = time.time()
    c = map(lambda x: cast(x,types=CTYPES), list(c))
    logging.info( "Data casted in %8.3f s" % (time.time() - t) )

    # sort data according to the item rank
    t = time.time()
    r = True if sord == "desc" else False
    s = sorted( c, key=operator.itemgetter( sidx ), reverse=r )
    logging.info( "Data sorted in %8.3f s" % (time.time() - t) )

    # select data range
    t = time.time()
    max = len(s)
    ff = start if start else 0
    tt = ff+rows if ff+rows < max else max
    logging.info( "Data range is: %d-%d" % (ff,tt) )
    logging.info( "Data selected in %8.3f s" % (time.time() - t) )

    # format floats to avoid X decimal
    t = time.time()
    f = map(lambda x: format(x, types=CTYPES), s[ff:tt])
    logging.info( "Floats formated in %8.3f s" % (time.time() - t) )

    # format data
    j = {
        "sEcho":echo,
        "iTotalRecords":str(len(s)),
        "iTotalDisplayRecords":str(len(s)),
        "sColumns":string.join(k,","),
        "aaData":f
        }

    # json data
    t = time.time()

    # This first line header allows to execute XMLHttpRequest from
    # anywhere, it inactivates web browsers CORS (Cross-Origin
    # Resource Sharing) -- It has been commented for security
    # considerations
    # print "Access-Control-Allow-Origin: *"
    # print "Content-type: application/json\n"
    print json.dumps(j)
    logging.info( "Data encoded (json) in %8.3f s" % (time.time() - t) )
    logging.debug( "Data encoded (json) content %s" % str(json.dumps(j)) )

    # ------------------------------- ends
    logging.info( "Elapsed time %8.3f s" % (time.time() - st) )

    return


# -- Form --
def formErr( msg, debugMsg = "", debug = False ):
    """
    Print an error message (stdout/HTML) and exit

    @param msg: the error message
    """

    print "Content-type: text/html\n\n"
    print "<b>An error occured while processing your request: %s!</b>" % str(msg)

    if debug:
        print "<p>"
        print debugMsg
        print "</p>"

    sys.exit(1)

def checkForm( theForm, parms ):
    """
    Check each form parms and fill the dict ligand

    @param theForm: input form
    @param parms: input form parameters keys
    """

    for aParm in parms:
        try:
            if theForm.has_key( PARMS[aParm][0] ) and theForm[ PARMS[aParm][0] ].value:
                val = str(theForm[ PARMS[aParm][0] ].value)
            else:
                val = None
            logging.debug("checkForm: aParm (%s) val (%s)" % (aParm,val) )
            # Set var according to dict keys and cast values
            exec "%s=%s('%s')" % ( aParm, PARMS[aParm][1], val)
        except:
            formErr( "wrong value for parameter %s" % aParm )

    return furl,sidx,start,rows,sord,echo,rexp

# -- Main --
if __name__=='__main__':

    # Parse the input request
    form = cgi.FieldStorage()

    # Check submitted parms
    furl, sidx, start, rows, sord, echo, rexp = checkForm( form, PARMS.keys() )

    # Process
    try:
        csv2json( furl, sidx, start=start, rows=rows, sord=sord, echo=echo, rexp=rexp )
    except:
        formErr( "contact an administrator" )

I think the code is sufficiently commented to stop boring you here! Do not hesitate to give me feedback about this post.


Categories: Web Dev

Associated tags

ajax , cgi , jquery , python


Comments (0)

Comments are not allowed for this article.



Latest posts


Feb. 11, 2012
Check-out my new home page

A few weeks ago I released my new home page to demonstrate an application of the jQuery-parallax plugin. Please check this out!


July 27, 2011
Marie Curie Actions Calls - FP7-PEOPLE - LaTeX Template

For those who are concerned in writing Framework 7 Proposal (FP7), but dislike using MS Word, Open/Libre-Office or any WYSIWYG this resource is for you!


July 10, 2011
Make a sliding navigator for jQueryTOOLS Scrollable

jQueryTOOLS scrollable is an incredible tool, but my case I needed to have as many items as I wanted in the navigator plugin. The idea of this post is to ...


June 10, 2011
GGMM 2011 3D Printed Trophy

The past week I went to the 2011 edition of the GGMM meeting. At this occasion, the organizing comity asked the RPBS platform to ...