io module

Module containing functions to parse World Data Centre (WDC) files.

Part of the MagPySV package for geomagnetic data analysis. This module provides functions to read, parse and manipulate the contents of World Data Centre (WDC) formatted files containing geomagnetic data, and to output data to comma-separated values (CSV) files. It also contains functions to read the output of code used for the COV-OBS magnetic field model series by Gillet et al. (links below).

magpysv.io.ae_parsefile(fname)

Load a WDC-like format AE file and place contents into a dataframe.

Load a file of AE (Auroral Electrojet) index hourly data in the format distributed by the Kyoto WDC at http://wdc.kugi.kyoto-u.ac.jp/dstae/index.html and extract the contents.

Parameters:fname (str) – path to a WDC-like formatted AE file.
Returns:dataframe containing hourly AE data. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and second column contains the AE values at the specified times.
Return type:data (pandas.DataFrame)
magpysv.io.ae_readfile(fname)

Wrapper function to call ae_parsefile and wdc_datetimes.

Parameters:fname (str) – path to an AE file in Kyoto WDC-like format. Assumes data for all years are contained within this file.
Returns:dataframe containing the data read from the WDC file. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and second column contains AE values at the specified times (hourly means).
Return type:data (pandas.DataFrame)
magpysv.io.angles_to_geographic(data)

Use D and H values to calculate the X and Y field components.

The declination (D) and horizontal intensity (H) relate to the north (X) and east (Y) components as follows:

X = H*cos(D)

Y = H*sin(D)

Parameters:data (pandas.DataFrame) – dataframe containing columns for datetime objects and hourly means of the magnetic field components (D, I, F, H, X, Y or Z).
Returns:the same dataframe with datetime objects in the first column and hourly means of the field components in either nT or degrees (depending on the component).
Return type:data (pandas.DataFrame)
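The two relations above can be sketched with NumPy as follows. This is an illustration only, not the MagPySV implementation: the helper name dh_to_xy is hypothetical, and it assumes D is supplied in degrees and H in nT.

```python
import numpy as np


def dh_to_xy(declination_deg, horizontal_nt):
    """Hypothetical helper: convert D (degrees) and H (nT) to X and Y (nT)."""
    d_rad = np.deg2rad(declination_deg)  # NumPy trig functions expect radians
    x = horizontal_nt * np.cos(d_rad)    # X = H*cos(D)
    y = horizontal_nt * np.sin(d_rad)    # Y = H*sin(D)
    return x, y


# Two sample values: D = 0 degrees and D = 90 degrees, both with H = 20000 nT
x, y = dh_to_xy(np.array([0.0, 90.0]), np.array([20000.0, 20000.0]))
```

With D = 0 the whole horizontal field lies in X, and with D = 90 degrees it lies in Y.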
magpysv.io.ap_readfile(fname)

Load a kp/ap file and place the hourly ap values into a dataframe.

Load a datafile of 3-hourly ap data and extract the contents. Each of the 3-hourly values for a given day is repeated three times to give an hourly mean for all 24 hours of the day. This function is designed to read files downloaded from the GFZ, Potsdam server at ftp://ftp.gfz-potsdam.de/pub/home/obs/kp-ap/.

Parameters:fname (str) – path to an ap datafile.
Returns:dataframe containing hourly ap data. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and second column contains ap values at the specified times.
Return type:data (pandas.DataFrame)
magpysv.io.append_ae_data(ae_data_path)

Append AE data into a single dataframe containing all years.

Data downloaded from ftp://ftp.ngdc.noaa.gov/STP/GEOMAGNETIC_DATA/INDICES/AURORAL_ELECTROJET/HOURLY/ come in WDC-like format files with one file per year named aeyyyy.wdc. The data are provided by the WDC at Kyoto and can also be downloaded directly from http://wdc.kugi.kyoto-u.ac.jp/dstae/index.html.

Parameters:ae_data_path (str) – path to directory containing WDC-like format AE datafiles. All AE files should be located in the same directory.
Returns:dataframe containing all available hourly AE data. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and second column contains AE values at the specified times.
Return type:data (pandas.DataFrame)
magpysv.io.append_ap_data(ap_data_path)

Append ap data into a single dataframe containing all years.

Data downloaded from ftp://ftp.gfz-potsdam.de/pub/home/obs/kp-ap/wdc/ come in WDC-like format files with one file per year named kpyyyy.wdc. This function concatenates all years into a single dataframe.

Parameters:ap_data_path (str) – path to directory containing WDC-like format ap datafiles. All ap files should be located in the same directory.
Returns:dataframe containing all available hourly ap data. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and second column contains ap values at the specified times.
Return type:data (pandas.DataFrame)
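The concatenation of yearly files can be pictured with plain pandas. The sketch below is not the MagPySV code: the column names date and ap and the sample values are made up for illustration, and each small dataframe stands in for the parsed contents of one yearly file.

```python
import pandas as pd

# Stand-ins for the dataframes parsed from two yearly files (values invented)
yearly_frames = [
    pd.DataFrame({"date": pd.to_datetime(["1991-01-01 00:30:00"]), "ap": [7]}),
    pd.DataFrame({"date": pd.to_datetime(["1990-01-01 00:30:00"]), "ap": [4]}),
]

# Stack the years, then order by time so the result is a continuous series
combined = (
    pd.concat(yearly_frames, ignore_index=True)
    .sort_values("date")
    .reset_index(drop=True)
)
```

Sorting after concatenation means the result does not depend on the order in which the files were found in the directory.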
magpysv.io.append_wdc_data(*, obs_name, path=None)

Append all WDC data for an observatory into a single dataframe.

Parameters:
  • obs_name (str) – observatory name (as 3-letter IAGA code).
  • path (str) – path to directory containing WDC datafiles. All files for the observatory should be located in the same directory.
Returns:

dataframe containing all available hourly geomagnetic data at a single observatory. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and subsequent columns are the X, Y and Z components of the magnetic field at the specified times.

Return type:

data (pandas.DataFrame)

magpysv.io.combine_csv_data(*, start_date, end_date, sampling_rate='MS', obs_list, data_path, model_path, day_of_month=1)

Read and combine observatory and model SV data for several locations.

Calls read_csv_data to read observatory data and field model predictions for each observatory in a list. The data and predictions for individual observatories are combined into their respective large dataframes. The first column contains datetime objects and subsequent columns contain X, Y and Z secular variation/field components (in groups of three) for all observatories.

Parameters:
  • start_date (datetime.datetime) – the start date of the data analysis.
  • end_date (datetime.datetime) – the end date of the analysis.
  • sampling_rate (str) – the sampling rate for the period of interest. The default is ‘MS’, which creates a range of dates between the specified values at monthly intervals with the day fixed as the first of each month. Use ‘M’ for the final day of each month. Other useful options are ‘AS’ (a series of dates at annual intervals, with the day and month fixed at 01 and January respectively) and ‘A’ (as for ‘AS’ but with the day/month fixed as 31 December.)
  • obs_list (list) – list of observatory names (as 3-letter IAGA codes).
  • data_path (str) – path to the CSV files containing observatory data.
  • model_path (str) – path to the CSV files containing model SV data.
  • day_of_month (int) – For SV data, first differences of monthly means have dates at the start of the month (i.e. MF of mid-Feb minus MF of mid-Jan should give SV at Feb 1st). For annual differences of monthly means, the MF of mid-Jan year 2 minus MF of mid-Jan year 1 gives SV at mid-July year 1. The dates of COV-OBS output default to the first day of the month (compatible with dates of monthly first differences SV data, but not with those of annual differences). This option is used to set the day part of the dates column if required. Defaults to 1 (all output dataframes will have dates set at the first day of the month).
Returns:

tuple containing:

  • obs_data (pandas.DataFrame):
    dataframe containing SV data for all observatories in obs_list.
  • model_sv_data (pandas.DataFrame):
    dataframe containing SV predictions for all observatories in obs_list.
  • model_mf_data (pandas.DataFrame):
    dataframe containing magnetic field predictions for all observatories in obs_list.

Return type:

(tuple)
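The effect of sampling_rate and day_of_month can be illustrated with pandas alone. This is a sketch under the assumption that the sampling rate string is used as a pandas frequency alias; it does not call MagPySV, and the variable names are chosen for the example.

```python
import datetime as dt

import pandas as pd

start_date = dt.datetime(2000, 1, 1)
end_date = dt.datetime(2000, 6, 1)

# 'MS' gives one date per month, fixed at the first day of the month
dates = pd.date_range(start_date, end_date, freq="MS")

# Moving the day part to mid-month, as day_of_month=15 is described as doing
mid_month = [d.replace(day=15) for d in dates]
```

Note that recent pandas releases have deprecated some of the older aliases ('M', 'A' and 'AS') in favour of 'ME', 'YE' and 'YS', so the strings accepted may depend on the installed pandas version.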

magpysv.io.covobs_datetimes(data)

Create datetime objects from COV-OBS field model output file.

The format output by the field model is year.decimalmonth e.g. 1960.08 is Jan 1960.

Parameters:data (pandas.DataFrame) – needs a column for decimal year (in yyyy.mm format).
Returns:the same dataframe with the decimal year column replaced with a series of datetime objects in the format yyyy-mm-dd.
Return type:data (pandas.DataFrame)
magpysv.io.covobs_parsefile(*, fname, data_type)

Loads MF and SV predictions from the COV-OBS geomagnetic field model.

Load a datafile containing SV/MF predictions from the COV-OBS magnetic field model series by Gillet et al. (2013, Geochem. Geophys. Geosyst., https://doi.org/10.1002/ggge.20041; 2015, Earth, Planets and Space, https://doi.org/10.1186/s40623-015-0225-z).

Parameters:
  • fname (str) – path to a COV-OBS datafile.
  • data_type (str) – specify whether the file contains magnetic field data (‘mf’) or secular variation data (‘sv’)
Returns:

dataframe containing hourly geomagnetic data. First column is a series of datetime objects (in the format yyyy-mm-dd) and subsequent columns are the X, Y and Z components of the SV/MF at the specified times.

Return type:

model_data (pandas.DataFrame)

magpysv.io.covobs_readfile(*, fname, data_type)

Wrapper function to call covobs_parsefile and covobs_datetimes.

The COV-OBS code (publicly available) can be used to produce synthetic observatory time series for other field models if the appropriate spline file is used. The output will be of the same format as COV-OBS output and can be read using MagPySV.

Parameters:
  • fname (str) – path to a COV-OBS format datafile.
  • data_type (str) – specify whether the file contains magnetic field data (‘mf’) or secular variation data (‘sv’)
Returns:

dataframe containing the data read from the file. First column is a series of datetime objects (in the format yyyy-mm-dd) and subsequent columns are the X, Y and Z components of the SV/MF at the specified times.

Return type:

data (pandas.DataFrame)

magpysv.io.datetime_to_decimal(date)

Convert a datetime object to a decimal year.

Parameters:date (datetime.datetime) – datetime object representing an observation time.
Returns:the same date represented as a decimal year.
Return type:date (float)
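One common way to express a datetime as a decimal year is to add the elapsed fraction of the calendar year to the year number. The sketch below follows that convention; it is an assumption about the conversion, not a copy of the MagPySV code, and the name to_decimal_year is hypothetical.

```python
import datetime as dt


def to_decimal_year(date):
    """Hypothetical helper: year plus the elapsed fraction of that year."""
    year_start = dt.datetime(date.year, 1, 1)
    next_year_start = dt.datetime(date.year + 1, 1, 1)
    # Dividing two timedeltas gives a float; leap years are handled
    # automatically because the year length is computed, not assumed.
    fraction = (date - year_start) / (next_year_start - year_start)
    return date.year + fraction


mid_2000 = to_decimal_year(dt.datetime(2000, 7, 2))
```

In the leap year 2000, 2 July at 00:00 falls exactly 183 days into a 366-day year, so the example evaluates to 2000.5.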
magpysv.io.hourly_mean_conversion(data)

Use the tabular base to calculate hourly means in nT or degrees (D, I).

Uses the tabular base and hourly value from the WDC file to calculate the hourly means of magnetic field components. Value is in nT for H, F, X, Y or Z components and in degrees for D or I components. Called by wdc_xyz.

hourly_mean = tabular_base*100 + wdc_hourly_value (for components in nT)

hourly_mean = tabular_base + wdc_hourly_value/600 (for D and I components)

Parameters:data (pandas.DataFrame) – dataframe containing columns for datetime objects, magnetic field component (D, I, F, H, X, Y or Z), the tabular base and hourly mean.
Returns:dataframe with datetime objects in the first column and hourly means of the field components in either nT or degrees (depending on the component).
Return type:obs_data (pandas.DataFrame)
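The two formulas above can be written as a small scalar function. This sketch only restates the arithmetic given in this entry; the function name and argument names are hypothetical, and the real function operates on a dataframe rather than on single values.

```python
def hourly_mean(component, tabular_base, wdc_hourly_value):
    """Hypothetical helper applying the tabular-base formulas for one value."""
    if component in ("D", "I"):
        # Angular components: the result is in degrees
        return tabular_base + wdc_hourly_value / 600.0
    # Intensity components (H, F, X, Y, Z): the result is in nT
    return tabular_base * 100 + wdc_hourly_value


x_nt = hourly_mean("X", 187, 45)    # 187*100 + 45
d_deg = hourly_mean("D", 2, 300)    # 2 + 300/600
```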
magpysv.io.read_csv_data(*, fname, data_type)

Read dataframe from a CSV file.

Parameters:
  • fname (str) – path to a CSV datafile.
  • data_type (str) – specify whether the file contains magnetic field data (‘mf’) or secular variation data (‘sv’)
Returns:

dataframe containing the data read from the CSV file.

Return type:

data (pandas.DataFrame)

magpysv.io.separate_hourly_vals(hourstring)

Separate individual hourly field means from the string containing all 24 values in the WDC file. Called by wdc_parsefile.

Parameters:hourstring (str) – string containing the hourly magnetic field means parsed from a WDC file for a single day.
Returns:list containing the hourly field values.
Return type:hourly_vals_list (list)
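Splitting a day's string into its individual values might look like the sketch below. It assumes that each hourly value occupies a fixed-width field of 4 characters; that width is an assumption made for this example and should be checked against the WDC format description. The helper name is hypothetical.

```python
def split_fixed_width(hourstring, width=4):
    """Hypothetical helper: cut a string into consecutive fields of equal width."""
    return [hourstring[i:i + width] for i in range(0, len(hourstring), width)]


# Build a synthetic 24-value string (100, 101, ..., 123), right-aligned in 4 columns
hourstring = "".join(f"{value:4d}" for value in range(100, 124))
hourly_vals_list = split_fixed_width(hourstring)
```

Slicing by position rather than splitting on whitespace matters in fixed-width formats, because adjacent fields are not guaranteed to be separated by a space.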
magpysv.io.separate_hourly_vals_ae(hourstring)

Separate individual hourly field means from the string containing all 24 values in the AE file. Called by ae_parsefile.

Parameters:hourstring (str) – string containing the hourly AE means parsed from a Kyoto WDC-like file for a single day.
Returns:list containing the hourly AE values.
Return type:hourly_vals_list (list)
magpysv.io.separate_three_hourly_vals(hourstring)

Separate 3-hourly ap means from the string containing all 8 values.

Separate the 8 individual 3-hourly ap means from the string containing all values for the day. Each value is repeated 3 times to give a value for each hour. Called by ap_readfile.

Parameters:hourstring (str) – string containing the 3-hourly ap means parsed from a Kyoto WDC-like file for a single day.
Returns:list containing the hourly ap values.
Return type:hourly_vals_list (list)
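The repetition step described above, turning 8 three-hourly values into 24 hourly ones, can be sketched with NumPy. The example starts from values that have already been separated from the string, and the sample numbers are invented.

```python
import numpy as np

# Eight 3-hourly ap values for one day (illustrative numbers only)
three_hourly = [4, 7, 12, 9, 5, 3, 2, 6]

# Repeat each value three times, in place, to cover all 24 hours
hourly_vals_list = np.repeat(three_hourly, 3).tolist()
```

np.repeat repeats each element consecutively (4, 4, 4, 7, 7, 7, ...), which is what is needed here, whereas np.tile would repeat the whole sequence.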
magpysv.io.wdc_datetimes(data)

Create datetime objects from the fields extracted from a WDC datafile.

Parameters:data (pandas.DataFrame) – needs columns for century, year (yy format), month, day and hour. Called by wdc_parsefile.
Returns:the same dataframe with a series of datetime objects (in the format yyyy-mm-dd hh:30:00) in the first column.
Return type:data (pandas.DataFrame)
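Assembling timestamps from separate century, year, month, day and hour fields could be done as in the sketch below. The column names and the zero-based hour index are assumptions made for the example, not details taken from the MagPySV source; the minute is fixed at 30 to match the hh:30:00 format described above.

```python
import pandas as pd

# Fields as they might look after parsing (column names are hypothetical)
fields = pd.DataFrame({
    "century": [19, 19],
    "yr": [90, 90],
    "month": [1, 1],
    "day": [1, 1],
    "hour": [0, 1],
})

# pandas.to_datetime accepts a dataframe with year/month/day/hour/minute columns
parts = pd.DataFrame({
    "year": fields["century"] * 100 + fields["yr"],  # e.g. 19 and 90 -> 1990
    "month": fields["month"],
    "day": fields["day"],
    "hour": fields["hour"],
    "minute": 30,  # hourly means are stamped at the middle of the hour
})
dates = pd.to_datetime(parts)
```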
magpysv.io.wdc_parsefile(fname)

Load a WDC datafile and place the contents into a dataframe.

Load a datafile of WDC hourly geomagnetic data for a single observatory and extract the contents. Parses the current WDC file format, but not the previous format containing the international quiet (Q) or disturbed (D) day designation in place of the century field. Only the newer format is downloaded from the BGS servers. A detailed file format description can be found at http://www.wdc.bgs.ac.uk/catalog/format.html

Parameters:fname (str) – path to a WDC datafile.
Returns:dataframe containing hourly geomagnetic data. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and subsequent columns are the X, Y and Z components of the magnetic field at the specified times.
Return type:data (pandas.DataFrame)
magpysv.io.wdc_readfile(fname)

Wrapper function to call wdc_parsefile, wdc_datetimes and wdc_xyz.

Parameters:fname (str) – path to a WDC datafile.
Returns:dataframe containing the data read from the WDC file. First column is a series of datetime objects (in the format yyyy-mm-dd hh:30:00) and subsequent columns are the X, Y and Z components of the magnetic field at the specified times (hourly means).
Return type:data (pandas.DataFrame)
magpysv.io.wdc_to_hourly_csv(*, wdc_path=None, write_dir, obs_list, print_obs=True)

Convert WDC file to X, Y and Z hourly means and save to CSV file.

Finds WDC hourly data files for all observatories in a directory path (assumes data for all observatories are located inside the same directory). The BGS downloading app distributes data inside a single directory with the naming convention obsyear.wdc, where obs is a three-letter observatory code in lowercase and year is a four-digit year, e.g. ngk1990.wdc or clf2013.wdc. This function converts the hourly WDC format data to hourly X, Y and Z means, appends all years of data for a single observatory into a single dataframe and saves the dataframe to a CSV file.

Parameters:
  • wdc_path (str) – path to the directory containing datafiles.
  • write_dir (str) – directory path to which the output CSV files are written.
  • obs_list (list) – list of observatory names (as 3-letter IAGA codes).
  • print_obs (bool) – choose whether to print each observatory name as the function goes through the directories. Useful for checking progress as it can take a while to read the whole WDC dataset. Defaults to True.
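The obsyear.wdc naming convention described above can be built and taken apart with a couple of small helpers. Both function names are hypothetical and are not part of MagPySV; the sketch only encodes the convention as stated (three-character lowercase code followed by a four-digit year).

```python
import os


def wdc_filename(obs, year):
    """Hypothetical helper: build a filename such as 'ngk1990.wdc'."""
    return f"{obs.lower()}{year:04d}.wdc"


def parse_wdc_filename(fname):
    """Hypothetical helper: recover (observatory code, year) from a filename."""
    stem, _ext = os.path.splitext(os.path.basename(fname))
    return stem[:3], int(stem[3:])


name = wdc_filename("NGK", 1990)
obs, year = parse_wdc_filename("/data/clf2013.wdc")
```

A pattern such as obs + "*.wdc" passed to glob.glob would then list every yearly file for one observatory in the directory.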
magpysv.io.wdc_xyz(data)

Convert extracted WDC data to hourly X, Y and Z components in nT.

Missing values (indicated by 9999 in the datafiles) are replaced with NaNs.

Parameters:data (pandas.DataFrame) – dataframe containing columns for datetime objects, magnetic field component (D, I, F, H, X, Y or Z), the tabular base and hourly mean.
Returns:the same dataframe with datetime objects in the first column and columns for X, Y and Z components of magnetic field (in nT).
Return type:data (pandas.DataFrame)
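Replacing the 9999 missing-value flag with NaN is a one-line operation in pandas. The sketch below shows one way to express it on a toy dataframe; the column names and numbers are invented, and the point at which MagPySV performs this replacement relative to the tabular base conversion is not shown here.

```python
import numpy as np
import pandas as pd

# Toy hourly values where 9999 marks a missing observation
raw = pd.DataFrame({"X": [18745.0, 9999.0], "Y": [9999.0, -512.0]})

# Swap the flag for NaN so that later means and differences ignore it
clean = raw.replace(9999, np.nan)
```

Using NaN rather than the numeric flag prevents the placeholder from being averaged into the data as if it were a real field value.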
magpysv.io.write_csv_data(*, data, write_dir, obs_name, file_prefix=None, decimal_dates=False, header=True)

Write dataframe to a CSV file.

Parameters:
  • data (pandas.DataFrame) – data to be written to file.
  • write_dir (str) – directory path to which the output CSV file is written.
  • obs_name (str) – name of observatory at which the data were obtained.
  • file_prefix (str) – optional string to prefix the output CSV filenames (useful for specifying parameters used to create the dataset etc).
  • decimal_dates (bool) – optional argument to specify that dates should be written in decimal format rather than datetime objects. Defaults to False.
  • header (bool) – option to include header in file. Defaults to True.