This documentation contains some information about SparseLOGREG, an
efficient implementation of the sparse logistic regression algorithm discussed
in the technical report,
S. K. Shevade and S. S. Keerthi (2002), A Simple and Efficient Algorithm for Gene Selection using Sparse Logistic
Regression,
Technical Report No. CD-02-22,
Control Division, Department of Mechanical Engineering,
National University of Singapore, Singapore - 117 576.
It is a good idea to read the report mentioned above before using the program described below.
Download SparseLOGREG.tar.gz. Unzip and untar this file.
    gunzip SparseLOGREG.tar.gz
    tar xvf SparseLOGREG.tar
When you tar xvf, you will get a directory called SparseLOGREG. This directory must contain two directories called bin and datasets. The former contains the source programs while the latter contains some sample datasets.
There are two main source files, "FindCounts.c" and "FindGenes.c", in the sub-directory bin.
Build the corresponding executables, FindCounts and FindGenes, by executing the following commands:
    cd SparseLOGREG/bin
    make all
If this doesn't work, you may have to edit the Makefile in the bin directory to adjust the compiler settings.
Note that some of the source files (nrutil.c, nrutil.h, ran1.c, and sort.c) are taken from the Numerical Recipes in C software library. SparseLOGREG uses these minor routines for memory allocation, deallocation, random number generation, and sorting.
Both executables read their input from the file "in.txt". The syntax of this file is given below. Every line in the file begins with a string (containing no blank characters) followed by the actual inputs. Users are expected to specify the inputs in the same order as given below.
The first line of FeaturesFile gives the number of features with nonzero relevance count. It is followed by each feature number and its relevance count, arranged in descending order of relevance count.
We suggest that users not alter this file, since it forms the input for the program FindGenes.
Users who only want the feature rankings, say to use the top-ranked features for some other purpose, can run only the program FindCounts and extract the required number of features from FeaturesFile.
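Extracting the top-ranked features can be sketched in C as follows. This is only an illustration, not part of the SparseLOGREG package: the names RankedFeature and read_top_features are hypothetical, and the layout it reads (a count, then whitespace-separated "feature number, relevance count" pairs) is an assumption based on the description above. Check it against an actual FeaturesFile before relying on it.

```c
#include <stdio.h>

/* One ranked entry: a feature number and its relevance count. */
typedef struct {
    int feature;
    double relevance;   /* read as double in case counts are not integral */
} RankedFeature;

/*
 * Read up to k top-ranked entries from an open stream.
 * ASSUMED layout (not confirmed by the documentation): whitespace-separated
 * numbers, first the number of entries n, then n pairs
 * "feature relevance", already sorted by descending relevance.
 * Returns the number of entries stored in out, or -1 on malformed input.
 */
int read_top_features(FILE *fp, RankedFeature *out, int k)
{
    int n, i;

    if (fp == NULL || out == NULL || k < 0)
        return -1;
    if (fscanf(fp, "%d", &n) != 1 || n < 0)
        return -1;
    if (k > n)
        k = n;          /* cannot return more entries than the file holds */
    for (i = 0; i < k; i++) {
        if (fscanf(fp, "%d %lf", &out[i].feature, &out[i].relevance) != 2)
            return -1;
    }
    return k;
}
```

A caller would open the file with fopen("FeaturesFile", "r"), pass the stream together with an array of at least k entries, and close the stream afterwards. The file is only read, so it stays unaltered for FindGenes.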
ClassifierFile gives the following: the average validation error for each feature added, and the final classifier design (each feature number followed by its corresponding weight, and the value of the final bias term). See the technical report for details.
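As a rough illustration of how such a classifier could be applied to a new example, the sketch below computes the linear decision value from the listed weights and the bias. The names WeightedFeature, decision_value and predicted_label are hypothetical and not part of the package. The 1-based feature numbering and the use of +1 and -1 as class labels are assumptions; consult the technical report for the exact conventions. Because the logistic function is monotonic, a predicted probability above 0.5 corresponds to a positive decision value, so the sign alone is enough to pick a class.

```c
#include <stddef.h>

/* One classifier term: a feature number and its weight. */
typedef struct {
    int feature;    /* ASSUMED 1-based, so feature 1 is stored in x[0] */
    double weight;
} WeightedFeature;

/* Decision value f(x) = bias + sum over listed features of weight * x[feature]. */
double decision_value(const WeightedFeature *w, size_t nw,
                      double bias, const double *x)
{
    double f = bias;
    size_t j;

    for (j = 0; j < nw; j++)
        f += w[j].weight * x[w[j].feature - 1];
    return f;
}

/* ASSUMED label convention: +1 when f > 0, otherwise -1. */
int predicted_label(double f)
{
    return (f > 0.0) ? 1 : -1;
}
```

Only the features listed in the classifier contribute to the sum, so the remaining entries of x are never read.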
Note that the file "in.txt" must reside in the directory from which the commands FindCounts and FindGenes are executed. A sample of this file is given in the SparseLOGREG/datasets/colon directory. From that directory, run:
    ../../bin/FindCounts
    ../../bin/FindGenes
You should get FeaturesFile, which contains the ranked features, and ClassifierFile, which contains the final classifier.
In case of any problems associated with this software, send me an e-mail.