Reading GCMS netCDF files

Published by WebMaster on

After spending a few days writing a tool to read GC-MS data files into LabVIEW, I thought I’d publish what I found – so others do not have to repeat my mistakes! The steps I describe here should also help people using languages other than LabVIEW’s G who want to get GC-MS data out of netCDF files.

I was trying to write a tool that we could use to batch process GC-MS data from a teaching laboratory to present grouped reports to the students, in a format that matched a similar output from the GC-FID systems we also have.  The aim was to reduce some of the burden on the technicians who were having to format the reports by hand – extremely dull and time consuming.

I could describe all the different routes I tried, but I suspect that just giving the method I have found that works best will be more useful for people who, like me, just want to be able to pull GCMS data from CDF files.

Step 1:

Convert GC-MS data files to ANDI/AIA/NetCDF interchange format

Our GC-MS systems are from Agilent. The Agilent software can batch-convert Agilent-format data to the ANDI/AIA/NetCDF interchange format, a common format for moving GC-MS data between different software packages. Go to File > Export to AIA, then choose a folder containing the Agilent .d files (well, folders actually) and select the ones you want to convert to AIA format.

Step 2:

Update the Agilent generated CDF files to a newer version

The Agilent netCDF files (AIA/ANDI files are a special form of netCDF file) are written in a practically antediluvian netCDF version, 2.3.2. This version predates 1994! The current version (as I write) is 4.5.0. The benefit of the newer netCDF-4 format is that it stores its data in an HDF5 file (see the Wikipedia article on NetCDF).

HDF5 files can be opened in LabVIEW using the excellent h5labview toolkit, whereas the old version 2.3.2 netCDF files cannot. Anyone trying to read the Agilent-generated CDF files as HDF5 from other languages and toolkits, e.g. the h5py toolkit for Python, will hit the same problem. So, the CDF files generated by the Agilent software must be converted to the newer netCDF file structure before they can be opened as HDF5 files. In fact, it was Martijn Jasperse, the author of the h5labview toolkit, who recommended this step and how to do it. So, a huge thank you to Martijn!
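You can see the difference between the two formats in the first few bytes of each file, which is why an HDF5 reader rejects the Agilent exports. Below is a minimal Python sketch using only the standard library; detect_cdf_format is a hypothetical helper name. It only looks for the HDF5 signature at the very start of the file (HDF5 also allows it at offsets 512, 1024 and so on after a user block, which this sketch ignores). If you have h5py installed, h5py.is_hdf5(path) gives a similar yes/no answer. I would expect an unconverted Agilent export to report "netCDF classic" and a file copied with nccopy -k 3 to report HDF5.

```python
# Hypothetical helper: identify a .cdf file's on-disk format from its leading bytes.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature
NETCDF_CLASSIC_SIGNATURES = {
    b"CDF\x01": "netCDF classic",
    b"CDF\x02": "netCDF 64-bit offset",
    b"CDF\x05": "netCDF 64-bit data (CDF-5)",
}


def detect_cdf_format(path):
    """Return a short description of the format of the file at `path`."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    # Only offset 0 is checked; HDF5 files with a user block are reported as unknown.
    if head == HDF5_SIGNATURE:
        return "HDF5 (netCDF-4)"
    return NETCDF_CLASSIC_SIGNATURES.get(head[:4], "unknown")
```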

To do this conversion, you first need to install netCDF. I was working on a Windows system, and you can download a prebuilt version of netCDF for Windows from Unidata, the developers of netCDF. Install the netCDF4 version suitable for your system – I installed the 64-bit version without DAP.

In the netCDF installation folder (probably under Program Files), you will find a subfolder called “bin”:

There are two tools in here of particular note – I’ve highlighted them above. ncdump.exe is a small program that prints the contents of a netCDF file as text so it can be read by a human (redirect its output to a file if you want to keep it). This is useful if you just want to take a quick look at a file and the information it contains.

More importantly for extracting the data from the GC-MS files programmatically, nccopy.exe is a small program that copies a netCDF file and, crucially, can write the copy in the most recent format so that it can be read as an HDF5 file.

You can run nccopy from the Windows command prompt, or from LabVIEW using the System Exec VI, with this syntax:

"full path to nccopy.exe" -k 3 -u "full path to the input CDF file" "full path to the output CDF file"

The “-k 3” option tells nccopy to write the output file in the netCDF-4 format, which stores its data as HDF5. The “-u” option tells nccopy to convert unlimited dimensions to fixed-size dimensions in the output file.

A simple LabVIEW wrapper for nccopy can be written like this:

Note: the above image is a LabVIEW snippet – just click-and-drag it into a LabVIEW block diagram.

In my case, I had this in a loop so that it would process every *.CDF file in a folder.
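If you are scripting this outside LabVIEW, the same loop is easy to reproduce. Below is a minimal Python sketch, assuming nccopy is installed as described above. nccopy_command and convert_folder are hypothetical helper names, and writing each copy to a separate output folder under its original name is my own choice, not something nccopy requires. Passing the arguments to subprocess.run as a list means paths containing spaces need no extra quoting.

```python
import subprocess
from pathlib import Path


def nccopy_command(nccopy_exe, src, dst):
    """Build the argument list for: nccopy -k 3 -u <src> <dst>."""
    # -k 3 : write the copy in netCDF-4 (HDF5-based) format
    # -u   : convert unlimited dimensions to fixed-size dimensions
    return [str(nccopy_exe), "-k", "3", "-u", str(src), str(dst)]


def convert_folder(in_dir, out_dir, nccopy_exe, run=subprocess.run):
    """Copy every .CDF file in in_dir to out_dir (same file name) via nccopy.

    Returns the list of output paths. `run` can be swapped out for testing.
    check=True makes a failing nccopy call raise CalledProcessError.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(Path(in_dir).iterdir()):
        # Match .CDF and .cdf alike, as Windows file names are case-insensitive.
        if not src.is_file() or src.suffix.lower() != ".cdf":
            continue
        dst = out_dir / src.name
        run(nccopy_command(nccopy_exe, src, dst), check=True)
        outputs.append(dst)
    return outputs

# Example (all three paths are hypothetical):
# convert_folder(r"C:\GCMS\AIA", r"C:\GCMS\netCDF4",
#                r"C:\Program Files\netCDF 4.5.0\bin\nccopy.exe")
```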

So, you will now have your GC-MS files exported to CDF format and updated to a version of netCDF that uses the HDF5 file format.

Step 3:

Reading the data

Now that the data is in an HDF5-format netCDF file, it can be read just like any other HDF5 file. To access it in LabVIEW, I use the h5labview toolkit.

To get this to work, you first need to download and install HDF5 1.8.18. Choose the right build for your system, 32-bit or 64-bit.

Then you can install h5labview. I have tended to find that I need to install it manually, using these instructions. Additionally, I always copy the hdf5.dll, szip.dll and zlib.dll files found in the HDF5 installation folders to the resource folder of the LabVIEW directory.

NetCDF files contain a number of different datasets. Only some of these are needed if you just want access to the GC-MS data. The retention time of each mass spectrum is stored as an array in the scan_acquisition_time dataset.

All the MS data is split between the mass_values and intensity_values arrays. Each of these arrays is one-dimensional, with the values for each spectrum following on from the last. Each spectrum is stored as centroided values only, and only those masses (and their intensities) at which an intensity was detected are stored.

The scan_index and point_count arrays hold the information you need to pull individual mass spectra out of the mass_values and intensity_values arrays: scan_index gives you the starting index and point_count gives you the number of elements to read.
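In a text-based language that lookup is just a slice. Below is a minimal Python sketch, assuming the four arrays have already been read into Python lists or NumPy arrays; get_spectrum and iter_spectra are hypothetical helper names.

```python
def get_spectrum(scan, scan_index, point_count, mass_values, intensity_values):
    """Return (masses, intensities) for one scan from the flattened arrays."""
    start = int(scan_index[scan])          # position of this scan's first point
    stop = start + int(point_count[scan])  # one past this scan's last point
    return list(mass_values[start:stop]), list(intensity_values[start:stop])


def iter_spectra(scan_index, point_count, mass_values, intensity_values):
    """Yield (masses, intensities) for every scan, in acquisition order."""
    for scan in range(len(scan_index)):
        yield get_spectrum(scan, scan_index, point_count,
                           mass_values, intensity_values)
```

For example, with scan_index = [0, 3, 5] and point_count = [3, 2, 4], scan 1 is elements 3 and 4 of mass_values and intensity_values. Because each spectrum follows on from the last, each scan_index entry should equal the previous entry plus the previous point_count.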

If you have h5labview installed then this will help you get the information you need from the files:

Note: the above image is a LabVIEW snippet – just click-and-drag it into a LabVIEW block diagram.
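For readers working in Python rather than LabVIEW, h5py can play the role of h5labview here. Below is a minimal sketch, assuming the file has already been through nccopy -k 3 and that the variables sit at the root of the file under the dataset names given above (netCDF-4 stores each variable as an HDF5 dataset). read_ms_arrays and load_ms_file are hypothetical helper names.

```python
# Dataset names as described in the text; each is a 1-D array.
MS_DATASETS = ("scan_acquisition_time", "scan_index", "point_count",
               "mass_values", "intensity_values")


def read_ms_arrays(group):
    """Read the MS datasets from an open h5py File (or any mapping of
    name -> sliceable 1-D array) into a dict."""
    # [:] reads a whole 1-D h5py dataset into memory as a NumPy array.
    return {name: group[name][:] for name in MS_DATASETS}


def load_ms_file(path):
    """Open a netCDF-4/HDF5 GC-MS file with h5py and return its MS arrays."""
    import h5py  # imported here so read_ms_arrays works without h5py installed
    with h5py.File(path, "r") as f:
        return read_ms_arrays(f)

# Example (hypothetical path):
# data = load_ms_file(r"C:\GCMS\netCDF4\sample01.CDF")
# data["scan_acquisition_time"][i] is then the retention time of scan i.
```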

We wanted to be able to extract ion chromatograms (XICs) for selected masses from batches of files, integrate and align the peaks sharing certain characteristic masses, and export the results as a single table in a CSV file. So, my final solution looked like this:
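The LabVIEW solution in the image isn't reproduced here, but the first two of those steps are simple enough to sketch in Python. Below is a minimal sketch, assuming the arrays have been read as described earlier. extract_ion_chromatogram and peak_area are hypothetical names, the ±0.5 m/z window is an assumed default, and the integration is a plain trapezoidal sum with no baseline correction or peak alignment. Use scan_acquisition_time as the times argument.

```python
def extract_ion_chromatogram(target_mz, scan_index, point_count,
                             mass_values, intensity_values, tolerance=0.5):
    """Return one summed intensity per scan for masses within
    +/- tolerance of target_mz (0.0 where no such mass was recorded)."""
    xic = []
    for start, count in zip(scan_index, point_count):
        start, count = int(start), int(count)
        total = 0.0
        for k in range(start, start + count):
            if abs(mass_values[k] - target_mz) <= tolerance:
                total += intensity_values[k]
        xic.append(total)
    return xic


def peak_area(times, signal, t_start, t_end):
    """Trapezoidal area under signal for points with t_start <= time <= t_end."""
    pts = [(t, y) for t, y in zip(times, signal) if t_start <= t <= t_end]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area
```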

If you have something specific you want to pull out of your GC-MS data – get in touch and we can put something together for you.