tools module

Module containing functions to parse files output by magnetic field models.

Part of the MagPySV package for geomagnetic data analysis. This module provides various functions to read SV files output by geomagnetic field models.

magpysv.tools.apply_Ap_threshold(*, Ap_file=None, obs_data, threshold)[source]

Remove observatory data for times with ap values above the threshold value.

Parameters:
  • Ap_file (str) – path to file containing hourly values for the ap index.
  • obs_data (pandas.DataFrame) – dataframe containing hourly means of observed geomagnetic field values.
  • threshold (int) – the threshold Ap value. Data for days with a higher Ap value will be replaced with NaNs and omitted from monthly (or annual, etc.) means.
Returns:

data with ap threshold applied.

Return type:

obs_data (pandas.DataFrame)
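
The thresholding idea can be illustrated with a minimal sketch. This is not the MagPySV implementation: the helper name mask_above_threshold, the column names (date, X, Y, Z, Ap) and the use of an in-memory dataframe instead of an ap file are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def mask_above_threshold(obs, ap, threshold, components=("X", "Y", "Z")):
    """Set field components to NaN wherever the ap value exceeds threshold.

    Hypothetical helper: column names are assumptions, not MagPySV's.
    """
    # Attach the ap value recorded at each observation time.
    merged = obs.merge(ap, on="date", how="left")
    disturbed = merged["Ap"] > threshold
    # Blank out the field values for disturbed times, keep the dates.
    merged.loc[disturbed, list(components)] = np.nan
    return merged.drop(columns="Ap")


dates = pd.to_datetime(
    ["2000-01-01 00:00", "2000-01-01 01:00", "2000-01-01 02:00"]
)
obs = pd.DataFrame(
    {"date": dates, "X": [1.0, 2.0, 3.0], "Y": [4.0, 5.0, 6.0], "Z": [7.0, 8.0, 9.0]}
)
ap = pd.DataFrame({"date": dates, "Ap": [5, 40, 12]})

quiet = mask_above_threshold(obs, ap, threshold=30)
```

Because the disturbed values become NaN rather than being dropped, later averaging with pandas skips them by default while the time axis stays intact.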

magpysv.tools.calculate_correlation_index(*, dates, signal, index_file)[source]

Calculate the correlation coefficient between a signal and a geomagnetic index.

Parameters:
  • dates (datetime.datetime) – dates of time series measurements.
  • signal (time series) – data to be compared with geomagnetic index.
  • index_file (str) – path to file containing geomagnetic index.
Returns:

tuple containing:

  • coeff (float):
    correlation coefficient.
  • merged (pandas.DataFrame):
    dataframe containing dates, signal and the geomagnetic index.

Return type:

(tuple)
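
A rough sketch of the kind of calculation described above, assuming the index has already been read into a dataframe. The helper name correlate_with_index and the column names date, signal and index_value are hypothetical, and the Pearson coefficient is used here as an assumption about what "correlation coefficient" means.

```python
import pandas as pd


def correlate_with_index(dates, signal, index_df):
    """Merge a signal with an index on date and return (coeff, merged).

    Hypothetical helper: the real function reads the index from a file.
    """
    signal_df = pd.DataFrame({"date": dates, "signal": signal})
    # Keep only the dates present in both the signal and the index.
    merged = signal_df.merge(index_df, on="date", how="inner")
    # Pearson correlation (the pandas default).
    coeff = merged["signal"].corr(merged["index_value"])
    return coeff, merged


dates = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"])
signal = [1.0, 2.0, 3.0, 4.0]
index_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01"]
        ),
        "index_value": [2.0, 4.0, 6.0, 8.0, 10.0],
    }
)

coeff, merged = correlate_with_index(dates, signal, index_df)
```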

magpysv.tools.calculate_residuals(*, obs_data, model_data)[source]

Calculate SV residuals (observed - prediction) using datetime objects.

Parameters:
  • obs_data (pandas.DataFrame) – dataframe containing means (usually monthly) of SV calculated from observed geomagnetic field values.
  • model_data (pandas.DataFrame) – dataframe containing the SV predicted by a geomagnetic field model.
Returns:

dataframe containing SV residuals.

Return type:

residuals (pandas.DataFrame)
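
As an illustration only, residuals of this kind can be formed by aligning the two dataframes on their dates and subtracting. The helper name residuals_sketch and the column names date and dX are assumptions, not the names used by MagPySV.

```python
import pandas as pd


def residuals_sketch(obs, model, components=("dX",)):
    """Return observed minus predicted SV for dates present in both inputs.

    Hypothetical helper with assumed column names.
    """
    # Align the two datasets on their datetime column.
    merged = obs.merge(model, on="date", suffixes=("_obs", "_model"))
    out = pd.DataFrame({"date": merged["date"]})
    for comp in components:
        out[comp] = merged[comp + "_obs"] - merged[comp + "_model"]
    return out


dates = pd.to_datetime(["2000-01-01", "2000-02-01"])
obs = pd.DataFrame({"date": dates, "dX": [10.0, 12.0]})
model = pd.DataFrame({"date": dates, "dX": [9.0, 12.5]})

residuals = residuals_sketch(obs, model)
```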

magpysv.tools.calculate_sv(obs_data, mean_spacing=1)[source]

Calculate the secular variation from the observed magnetic field values.

Uses monthly means of geomagnetic observatory data to calculate the SV according to user-specified sampling. The typical choices are monthly differences of monthly means and annual differences of monthly means. For samplings other than monthly differences, the datetime objects of the calculated SV are taken as the midpoint of the datetime objects of the field data. For example, differencing the means of the field in January 1999 and in January 2000 yields the SV for July 1999.

Parameters:
  • obs_data (pandas.DataFrame) – dataframe containing means (usually monthly) of observed geomagnetic field values.
  • mean_spacing (int) – the number of months separating the monthly mean values used to calculate the SV. Set to 1 (default) to use adjacent months of data (first differences of monthly means) or set to 12 to calculate annual differences of monthly means.
Returns:

dataframe containing SV data.

Return type:

obs_sv (pandas.DataFrame)
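
The differencing and date convention described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library code: the helper name sv_sketch and the column names date and X are hypothetical, the midpoint is approximated by shifting the earlier date forward by mean_spacing // 2 months, and no unit conversion is applied to the differences.

```python
import pandas as pd


def sv_sketch(monthly, mean_spacing=1, components=("X",)):
    """Difference monthly means separated by mean_spacing months.

    Hypothetical helper: column names and date handling are assumptions.
    """
    n = mean_spacing
    dates = monthly["date"]
    shift = n // 2
    if shift:
        # E.g. Jan 1999 and Jan 2000 (n = 12) are assigned to July 1999.
        dates = dates + pd.DateOffset(months=shift)
    out = {"date": dates.iloc[: len(monthly) - n].reset_index(drop=True)}
    for comp in components:
        values = monthly[comp].to_numpy()
        # Later value minus the value n months earlier.
        out[comp] = values[n:] - values[:-n]
    return pd.DataFrame(out)


monthly = pd.DataFrame(
    {
        "date": pd.date_range("1999-01-01", periods=13, freq="MS"),
        "X": [float(i) for i in range(13)],
    }
)

annual = sv_sketch(monthly, mean_spacing=12)  # annual differences
first = sv_sketch(monthly, mean_spacing=1)    # first differences
```

With 13 monthly means, annual differencing yields a single SV value and first differencing yields 12.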

magpysv.tools.calculate_sv_index(obs_data, mean_spacing=1)[source]

Calculate the secular variation of a geomagnetic index.

Uses monthly means of a geomagnetic index (e.g. Dst) to calculate the SV according to user-specified sampling. The typical choices are monthly differences of monthly means and annual differences of monthly means. For samplings other than monthly differences, the datetime objects of the calculated SV are taken as the midpoint of the datetime objects of the index data. For example, differencing the means of the index in January 1999 and in January 2000 yields the SV for July 1999.

Parameters:
  • obs_data (pandas.DataFrame) – dataframe containing means (usually monthly) of the geomagnetic index values.
  • mean_spacing (int) – the number of months separating the monthly mean values used to calculate the SV. Set to 1 (default) to use adjacent months of data (first differences of monthly means) or set to 12 to calculate annual differences of monthly means.
Returns:

dataframe containing SV data.

Return type:

obs_sv (pandas.DataFrame)

magpysv.tools.correct_baseline_change(*, observatory, field_data, baseline_data, print_data)[source]

Correct documented baseline changes.

Parameters:
  • observatory (str) – observatory name given as a 3-letter IAGA code.
  • field_data (pandas.DataFrame) – uncorrected magnetic field data.
  • baseline_data (pandas.DataFrame) – baseline discontinuity data in the format output by get_baseline_info.
  • print_data (bool) – option to print the corrections made.

magpysv.tools.data_resampling(data, sampling='MS', average_date=True)[source]

Resample the hourly geomagnetic data to a specified frequency.

Parameters:
  • data (pandas.DataFrame) – dataframe containing datetime objects and hourly means of magnetic data.
  • sampling (str) – new sampling frequency. Default value is ‘MS’ (monthly means), which averages data for each month and sets the datetime object to the first day of that month. Use ‘M’ to set the datetime object to the final day of the month. Another useful option is ‘A’ (annual means), which averages data for a whole year and sets the datetime object to the final day of the year. Use ‘AS’ to set the datetime object to the first day of the year.
  • average_date (bool) – the specified resampling intervals only have options for setting the date to the first (‘MS’ and ‘AS’) or final (‘M’ and ‘A’) day of the month or year. For monthly averages, a more appropriate representative date is the middle of that month (i.e. the 15th day of the month). For annual averages, an appropriate representative date is the middle of that year (taken as July 1st of that year). This option is used to set the dates to the centre of the sampling interval. Defaults to True.
Returns:

dataframe of datetime objects and monthly/annual means of observatory data.

Return type:

data (pandas.DataFrame)
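
The monthly case can be sketched with pandas resampling. This is an illustrative sketch rather than the MagPySV code: the helper name monthly_means_sketch and the column names date and X are assumptions, and only the ‘MS’ sampling is shown.

```python
import pandas as pd


def monthly_means_sketch(hourly, average_date=True):
    """Average hourly data by calendar month.

    Hypothetical helper: handles only the monthly ('MS') case.
    """
    # 'MS' labels each monthly mean with the first day of the month.
    means = hourly.set_index("date").resample("MS").mean().reset_index()
    if average_date:
        # Move the label from the 1st to the 15th day of the month.
        means["date"] = means["date"] + pd.Timedelta(days=14)
    return means


hourly = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2000-01-03 00:00",
                "2000-01-20 01:00",
                "2000-02-10 00:00",
                "2000-02-11 00:00",
            ]
        ),
        "X": [1.0, 3.0, 10.0, 20.0],
    }
)

monthly = monthly_means_sketch(hourly)
```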

magpysv.tools.get_baseline_info(*, fname=None)[source]

Read documented baseline changes from a file.

Parameters:
  • fname (str) – location of file containing documented baseline changes.
Returns:

baseline change data.

Return type:

data (pandas.DataFrame)

magpysv.tools.remove_selected_points(*, data, fname)[source]

Remove specified points from dataset based on list of points in a file.

Reads a list of unwanted points from a file and removes them from the dataframe if present. For example, if the user has monthly SV means and wishes to exclude the X value at NGK in January 2015 from the analysis, the file would contain the following line:

2015-01-01,ngk,X

It is preferable (and more repeatable) to use the included outlier detection algorithms to remove outliers.

Parameters:
  • data (pandas.DataFrame) – dataframe containing datetime objects and daily means of magnetic data.
  • fname (str) – path to file containing a list of unwanted data points.
Returns:

the same dataframe with the points removed.

Return type:

data (pandas.DataFrame)
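
One way such a file could be applied is sketched below, using the date,observatory,component line format from the example above. The helper name drop_points_sketch, the extra observatory argument, the column names and the choice to replace removed values with NaN are all assumptions made for illustration; the real function takes only data and fname.

```python
import io

import numpy as np
import pandas as pd


def drop_points_sketch(data, points_file, observatory):
    """Blank out the points listed in points_file for one observatory.

    Hypothetical helper: signature and column names are assumptions.
    """
    points = pd.read_csv(
        points_file,
        names=["date", "observatory", "component"],
        parse_dates=["date"],
    )
    # Only apply the entries that refer to this observatory.
    points = points[points["observatory"] == observatory]
    cleaned = data.copy()
    for row in points.itertuples(index=False):
        # No effect if the listed date is not present in the data.
        cleaned.loc[cleaned["date"] == row.date, row.component] = np.nan
    return cleaned


data = pd.DataFrame(
    {
        "date": pd.to_datetime(["2014-12-01", "2015-01-01", "2015-02-01"]),
        "X": [1.0, 2.0, 3.0],
        "Y": [4.0, 5.0, 6.0],
    }
)
# An in-memory stand-in for the file of unwanted points.
points_file = io.StringIO("2015-01-01,ngk,X\n2015-02-01,esk,Y\n")

cleaned = drop_points_sketch(data, points_file, observatory="ngk")
```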