SDF, or Structure-Data File, is a common file format developed by Molecular Design Limited (MDL) to handle a list of molecular structures with associated properties. The file format has been published (Dalby et al., J. Chem. Inf. Comput. Sci. 1992, 32: 244-255).
The purpose of this SDF toolkit is to provide functions to read and parse SD files, filter them, and add or remove properties. It can also read comma-separated value (CSV) tables containing new fields to be added to the SD file. A typical application is to add calculated log P values or biological data exported from a spreadsheet. The resulting SD file, including the new data fields, can then be displayed with programs such as ChemFinder, the CACTVS system browser csbr, and probably many others.
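To make the idea concrete, here is a minimal sketch in Python (the toolkit itself is written in Perl) of what adding CSV columns to an SD file involves. It relies on two features of the published format: each record ends with a line containing $$$$, and each data item is a header line of the form > <NAME> followed by the value and a blank line. The function names, and the choice to match CSV rows to records by position, are assumptions made for illustration; add_prop_sdf may match records differently.

```python
import csv
import io

RECORD_END = "$$$$"  # line that terminates every SD file record


def split_records(sdf_text):
    """Return the records as lists of lines, without the $$$$ delimiter."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == RECORD_END:
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def add_properties(sdf_text, csv_text):
    """Append every CSV column as a data item to the matching record.

    Assumption: the first CSV row belongs to the first record, the
    second row to the second record, and so on.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    records = split_records(sdf_text)
    if len(rows) != len(records):
        raise ValueError("CSV row count does not match SDF record count")
    out = []
    for record, row in zip(records, rows):
        out.extend(record)
        for name, value in row.items():
            out.append(f"> <{name}>")                     # data header
            out.append(value if value is not None else "")  # data value
            out.append("")                                # blank line ends the item
        out.append(RECORD_END)
    return "\n".join(out) + "\n"
```

Calling add_properties with the text of an SD file and a CSV table whose header row is, say, LogP,Activity returns the same records with > <LogP> and > <Activity> data items appended before each $$$$ line.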
The SDF toolkit is written in Perl 5, a free, widely available scripting language.
One useful application (at least for us) written with this toolkit is add_prop_sdf. This script reads an SDF, adds properties from a CSV file, and prints out the new SD file. There is no GUI; it is a batch-mode program. Also of interest is the script select_sdf, which can be used to extract specific records from an SDF. Records can be selected at random from an SDF with the help of the gen_rnd script.
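As an illustration of random record selection, the following Python sketch (not the toolkit's Perl code, and not a description of how gen_rnd works internally) splits an SD file on its $$$$ delimiter lines and keeps a random subset of the records. The function names and the seed parameter are hypothetical.

```python
import random


def split_records(sdf_text):
    """Return each record as one string, including its $$$$ line."""
    records, current = [], []
    for line in sdf_text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    return records


def pick_random_records(sdf_text, count, seed=None):
    """Return `count` randomly chosen records, kept in file order."""
    records = split_records(sdf_text)
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    chosen = sorted(rng.sample(range(len(records)), count))
    return "".join(records[i] for i in chosen)
```

Passing a seed makes the selection reproducible, which is convenient when a dataset has to be split the same way more than once.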
The SDF_toolkit is freely available under the GNU public license granted by the author. The U.S. Government imposes no license restrictions of its own on the toolkit.
As of April 2007, version 2 of the SDF_toolkit contains 17 new tools (yielding a total of 31 tools). The SDF parser is more robust and can correct some errors in the input file. The toolkit has been used extensively by the author to prepare datasets for QSAR studies. It is, however, still evolving and is not yet available for download on this site. If you would like to obtain a copy, please contact the author.
You'll need Perl 5 or above installed on your system. I have tested the toolkit only on Unix systems (Linux, IRIX and OpenStep). Thanks to the wide availability of Perl, it should be easy to run the toolkit on other platforms such as the Mac, and it has been ported to Win32. The SDF toolkit does not contain any features specific to a particular platform.
The toolkit is distributed as a tar archive compressed with compress; or, more recently, as a .tgz file. To extract an archive in .tar.Z format, use the following standard command:
uncompress < sdf_toolkit.tar.Z | tar xfv -
A) Using your shell, change the working directory to the installation directory and type the command:
perl test_sdf_fields < sdf_fields.txt
If the output is:
fields => ARRAY(0x205c0)
> <Formula> (11)
> <BOILING.POINT> (MD-08974) FROM ARCHIVES
> <Formula> (11)
> <MolWeight> (11)
go to step C. If not, there was a problem with the installation; go to step B.
B) Check if Perl is correctly installed on your system. In your shell, type
perl -v
If you get a message like:
perl: Command not found.
you are out of luck: you'll need to install Perl 5 or change your shell's $PATH variable. See your system administrator.
If you get a message like:
This is perl, version 5.001
you are OK. Pay attention to the version number: it must be 5 or higher. On some Unix systems (e.g. IRIX), one has to type "perl5" to get the right version. If your version is lower than 5, you'll need to install a newer one. Downloadable distributions of Perl, in both binary executable and source code form, can be found at http://www.perl.com.
C) Your Perl installation seems to be OK. The toolkit comes with a small test suite. To run it, use the make command.
The toolkit is OK if the make command does not stop with an error message.
D) To get a better idea of the toolkit's capabilities, type
perl add_prop_sdf -help
The output of this command describes a complete example with detailed explanations.
E) The script add_prop_sdf can be installed on a Unix system in such a way that it can be run from any directory. First find out where your Perl executable resides, for example with the which command (which perl). A typical location is /usr/bin/perl.
The SDF toolkit contains a set of packages (*.pm) that need to be installed in a standard directory. Which directory to use depends on your Perl installation. To find out, type:
perl -e 'print join("\n", @INC), "\n"'
This command prints the list of directories where Perl packages can be installed. On my system, the output is the single directory /usr/lib/perl5. If you wish, the packages can be installed in a non-default directory instead. Copy all the *.pm files into the directory you have chosen.
Edit the file add_prop_sdf. The first line shows which Perl executable is going to be used to interpret the script. The default is #!/usr/bin/perl. This first line must match the full path obtained above.
If the package files (*.pm) have been installed in a non-default directory, edit the first line of the script and add an -I option for the package files installation directory, such as:
#!/usr/bin/perl -I /home/brunob/lib/perl
(assuming here that the *.pm files have been copied to /home/brunob/lib/perl).
Copy the edited file add_prop_sdf into a directory that your shell searches for programs (type "echo $PATH" to see the list of directories).
Note: Alternatively, one can set the environment variable PERL5LIB to point to the directories where the SDF_toolkit is installed. In csh or tcsh, the command to set an environment variable is setenv; in sh-style shells such as bash, use export.
The SDF toolkit is quite strict about the syntactical correctness of the input files. Some programs export SD files that are not totally compliant with the published standard (Dalby et al., 1992). In some cases, the SDF toolkit might generate relatively cryptic error messages.
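Before feeding a file exported by another program to the toolkit, a rough pre-check can help locate malformed records. The Python sketch below is not part of the toolkit and is far from a full validation against the published format; it only tests two things, namely that every record contains an "M  END" line and that the last record is closed by a $$$$ line. The function name check_sdf is hypothetical.

```python
def check_sdf(sdf_text):
    """Return a list of (record_number, problem) tuples; empty if none found."""
    problems = []
    record_no, lines = 1, []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            # A record just ended: it should contain the molfile terminator.
            if not any(l.rstrip() == "M  END" for l in lines):
                problems.append((record_no, "no 'M  END' line"))
            record_no += 1
            lines = []
        else:
            lines.append(line)
    # Anything non-blank left over was never closed by a $$$$ line.
    if any(l.strip() for l in lines):
        problems.append((record_no, "missing '$$$$' terminator"))
    return problems
```

The record numbers in the result are 1-based, so they can be used directly to find the offending structure in the input file.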
Using big CSV tables can consume large amounts of memory.
On some Linux systems (e.g. Red Hat 6.0), the test suite fails because 0.0000 values are written as -0.0000.
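The source does not say why the sign appears. One plausible cause, stated here as an assumption, is that a negative zero or a tiny negative number is being formatted with four decimals: printf-style formatting keeps the minus sign in that case. The Python sketch below reproduces the effect and shows one way to normalize it; format_fixed is a hypothetical helper, not a toolkit function.

```python
def format_fixed(value, digits=4):
    """Format with a fixed number of decimals, never returning a negative zero."""
    text = f"{value:.{digits}f}"
    if float(text) == 0.0:
        # "-0.0000" and "0.0000" are numerically equal; drop the sign.
        text = text.lstrip("-")
    return text


print("%.4f" % -0.00001)       # -0.0000
print(format_fixed(-0.00001))  # 0.0000
print(format_fixed(-1.5))      # -1.5000
```

If this is indeed the cause, the test failures concern only the printed sign and not the numerical values.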
Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J. Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. Journal of Chemical Information and Computer Sciences 32(3): 244-255, May-Jun 1992.
This toolkit was begun by Bruno Bienfait, Ph.D., while he was a postdoc in the Laboratory of Medicinal Chemistry at the National Cancer Institute, Bethesda, MD, USA. He has continued to work on it ever since. You can contact him here. Comments, criticisms, suggestions and bug reports are welcome.
Last Update: 2010-01-14