PDL-Stats: statistics modules in Perl Data Language
This is a collection of statistics modules in Perl Data Language, with a quick-start guide for non-PDL people.
They make perldl (the simple shell for PDL) work like a teenie weenie R, but with PDL threading, "the fast (and automagic) vectorised iteration of 'elementary operations' over arbitrary slices of multidimensional data", applied to procedures including t-test, ordinary least squares regression, and k-means clustering.
Of course, they also work in Perl scripts, which makes the package an ideal tool for statistical natural language processing: you get all the text processing power of Perl as well as the fast number crunching capabilities of a data language.
PDL::Stats: Loads the modules named below. The POD includes a quick-start guide for non-PDL people.
PDL::Stats::Basic: Basic statistics and related utilities (standard deviation, variance, correlation, t-test, etc.).
PDL::Stats::Distr: Parameter estimation and probability density functions for distributions.
PDL::Stats::GLM: General linear modeling methods (multiple linear regression; factorial, repeated measures, and mixed model ANOVA; etc.) and logistic regression.
PDL::Stats::Kmeans: Classic k-means cluster analysis.
PDL::Stats::TS: Basic time series analysis functions.
PDL::GSL::CDF: PDL interface to the GSL cumulative distribution functions.
Perl Data Language, preferably installed with a Fortran compiler. A few methods (logistic regression and all plotting methods) work only with a Fortran compiler, and some methods (ordinary least squares regression and PCA) run much faster with one.
The recommended PDL version is 2.4.8. PDL-2.4.7 introduced a bug in lu_decomp() which caused a few functions in PDL::Stats::GLM to fail. Otherwise the minimum compatible PDL version is 2.4.4.
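The version constraints above can be checked from the command line. The following is only a sketch: the `version_ge` helper is a hypothetical name introduced here, it assumes a plain dotted version string and a `sort` that supports `-V` (as in GNU coreutils), and it reads the version from `$PDL::VERSION`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Ask perl for the installed PDL version; stays empty if PDL cannot be loaded.
pdl_version=$(perl -MPDL -e 'print $PDL::VERSION' 2>/dev/null)

if [ -z "$pdl_version" ]; then
    echo "PDL not found"
elif [ "$pdl_version" = "2.4.7" ]; then
    echo "PDL $pdl_version: this release has the lu_decomp() bug"
elif version_ge "$pdl_version" "2.4.4"; then
    echo "PDL $pdl_version: meets the 2.4.4 minimum"
else
    echo "PDL $pdl_version: older than the 2.4.4 minimum"
fi
```

Developer releases with suffixes in the version string may not sort as expected, so treat the result as a hint rather than a guarantee.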
GNU Scientific Library. This is required by PDL::Stats::Distr and PDL::GSL::CDF, the latter of which provides p-values for PDL::Stats::GLM. GSL is NOT required for the core PDL::Stats modules (Basic, GLM, and Kmeans) to work.
PDL-Stats currently uses PGPLOT for plotting. There are three pgplot/PGPLOT components, which has led to much confusion during installation. First there is pgplot, the Fortran library. Then there is PGPLOT, the Perl module that provides the Perl interface to pgplot. Finally there is PDL::Graphics::PGPLOT, which depends on both pgplot and PGPLOT, and which PDL-Stats uses for plotting.
If you are using Debian Wheezy or Ubuntu 12.10 and above, you can install the package with:
sudo apt-get install libpdl-stats-perl
Or you can use cpan:
sudo cpan PDL::Stats
You can also follow the standard Perl module installation method in a *nix environment and build it from source:
tar xvf PDL-Stats-xxx.tar.gz
cd PDL-Stats-xxx
perl Makefile.PL
make
make test
sudo make install
Thanks to Sisyphus, Windows users can download and install the ppm version of PDL-Stats and all dependencies using the PPM utility included in ActiveState perl or Strawberry perl. You can also get the PPM utility from CPAN.
ppm install http://www.sisyphusion.tk/ppm/PGPLOT.ppd
ppm install http://www.sisyphusion.tk/ppm/PDL.ppd
ppm install http://www.sisyphusion.tk/ppm/PDL-Stats.ppd