{"id":3057,"date":"2020-06-08T20:53:45","date_gmt":"2020-06-08T20:53:45","guid":{"rendered":"http:\/\/vista.cira.colostate.edu\/Improve\/?page_id=3057"},"modified":"2024-05-29T18:27:58","modified_gmt":"2024-05-29T18:27:58","slug":"data-patching","status":"publish","type":"page","link":"https:\/\/vista.cira.colostate.edu\/Improve\/data-patching\/","title":{"rendered":"Data Patching"},"content":{"rendered":"\n<p><strong>Data Patching and Substitution for the Regional Haze Rule&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <\/strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Scott Copeland &#8211; 06\/20\/2020<\/p>\n\n\n\n<p><strong>Introduction<\/strong><\/p>\n\n\n\n<p>Two principal approaches are used to fill in missing data in the IMPROVE database. &nbsp;The first follows the algorithm in the U.S. Environmental Protection Agency\u2019s 2003 <em>Guidance for Tracking Progress Under the Regional Haze Rule<sup>1<\/sup><\/em>.&nbsp; This technique is routinely applied to all sites\u2019 data in the IMPROVE network.&nbsp; It uses a statistical approach, described in detail below, to analyze historical data from a site to fill in gaps at that site under certain limited circumstances.&nbsp; The guidance refers to this process as \u201csubstitution\u201d, but it is routinely called \u201cpatching\u201d.<\/p>\n\n\n\n<p>The other technique involves using collocated, similar measurements (e.g., hydrogen mass measured by PESA to infer OM) or scaled data from a nearby site after a demonstration of a suitable correlation.&nbsp; These two techniques are collectively referred to as \u201csubstitution\u201d and are used to replace large amounts of missing data for certain sites and certain years. 
&nbsp;The substitution analysis itself is done on an ad hoc basis, most often by MJOs, and is not a part of the routine processing performed by IMPROVE.&nbsp; Substituted data, when available, fill in missing values after the patching process.<\/p>\n\n\n\n<p><strong>Steps for Patching Data<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Gather all mass data for the seven aerosol species in the revised (RHR2) IMPROVE light extinction algorithm for a site. Include data for the target year being considered and the four previous years (or as many as are available if fewer than four).<\/li>\n\n\n\n<li>Set negative values to \u201c0\u201d. Make sure units are consistent.\u00a0 For data prior to 2011, XRF or PIXE values below the MDL are set to MDL\/2.<\/li>\n\n\n\n<li>For each aerosol species, calculate the median concentration in each calendar quarter of the target year and the four previous years.\n<ol class=\"wp-block-list\">\n<li>Only consider quarters with at least 50% valid observations and fewer than 10 consecutive invalid samples. This rule is applied to each species separately; e.g., a given quarter could be usable for sulfate but not soil.<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>For each species and each calendar quarter, calculate the mean of the up to five quarterly medians from the target year and the four previous years.\n<ol class=\"wp-block-list\">\n<li>These means for each quarter become the candidate patch values to be tested.<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>For the up to five years being considered, select all sample days on which every species is valid. 
Calculate reconstructed light extinction, including Rayleigh scattering, using the revised IMPROVE light extinction algorithm.\u00a0 Then, for one species at a time, replace the measured mass with that species\u2019 candidate patch value and recalculate the day\u2019s reconstructed light extinction.<\/li>\n\n\n\n<li>Count the sample days on which the difference between the original reconstructed extinction and the extinction recalculated with the candidate patch value is less than 10% of the original reconstructed extinction.<\/li>\n\n\n\n<li>If the difference calculated for a species in step 6 is less than 10% for 90% or more of the tested sample days, then patching is allowed for that species for all four quarters of the target year.\n<ol class=\"wp-block-list\">\n<li>The candidate patch values are applied quarterly.<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>Patch values replace any missing occurrence of that species, not just sample days with all valid species. When patching is allowed, missing values of a species are replaced with the allowed quarterly patch value.\n<ol class=\"wp-block-list\">\n<li>After patching, set the \u201c_subbed\u201d flag for that species to \u201c1\u201d (\u201c0\u201d means valid measurement, and \u201c2\u201d means substitution was performed, not patching).<\/li>\n<\/ol>\n<\/li>\n\n\n\n<li>The data can now be used to determine final data completeness and haziest day RHR metrics as well as impairment-based RHR metrics.<\/li>\n<\/ol>\n\n\n\n<p><strong>Extent of Patching Changed in Late 2019<\/strong><\/p>\n\n\n\n<p>Prior to 12\/2019, patching was applied to a maximum of one missing species for any sample day.&nbsp; Data set versions generated beginning in 12\/2019 allow up to two missing species per day to be patched.&nbsp; This potentially changes the data at every site for the entire data record, including natural conditions, 2064 endpoints, and RH2 and impairment metrics.&nbsp; The changes are generally very small, though there are cases 
where whole sample years that previously did not meet the RHR completeness requirement now do.&nbsp; All versions of the data files are date stamped.&nbsp; More details are available in this <a href=\"https:\/\/vista.cira.colostate.edu\/DataWarehouse\/IMPROVE\/Data\/SummaryData\/RHR_2018\/Updated\/Changes to IMPROVE RHR Metric Data Processing since 10_2019 v5_20.pptx\">PowerPoint presentation<\/a>.<\/p>\n\n\n\n<p><strong>Results<\/strong><\/p>\n\n\n\n<p>Patching and substitution are routinely performed on IMPROVE data that are used to generate regional haze metrics reported through FED (<a href=\"http:\/\/views.cira.colostate.edu\/fed\/\">http:\/\/views.cira.colostate.edu\/fed\/<\/a>) and IMPROVE (<a href=\"https:\/\/vista.cira.colostate.edu\/Improve\/\">https:\/\/vista.cira.colostate.edu\/Improve\/<\/a>).&nbsp; Although the patching affects a fairly small subset of all observations, it allows some site-years to meet completeness requirements that otherwise would not.&nbsp; Patched values can be identified by a \u201c1\u201d value in the \u201c_subbed\u201d flag column, substituted values by a \u201c2\u201d.&nbsp; It is recommended that patched and substituted values be removed before comparisons with model output or before applying analytical techniques such as PMF, which could be biased by them.<\/p>\n\n\n\n<p><sup>1 <\/sup>U.S. Environmental Protection Agency. <em>Guidance for Tracking Progress Under the Regional Haze Rule<\/em>. EPA-454\/B-03-004, Washington, DC, September 2003.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Patching and Substitution for the Regional Haze Rule&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Scott Copeland &#8211; 06\/20\/2020 Introduction Two principal approaches are used to fill in missing data in the IMPROVE database. &nbsp;The first follows the algorithm in the U.S. 
Environmental Protection Agency\u2019s 2003 Guidance for Tracking Progress Under the Regional Haze Rule1.&nbsp; This technique is routinely applied [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"class_list":["post-3057","page","type-page","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/pages\/3057","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/comments?post=3057"}],"version-history":[{"count":5,"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/pages\/3057\/revisions"}],"predecessor-version":[{"id":3910,"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/pages\/3057\/revisions\/3910"}],"wp:attachment":[{"href":"https:\/\/vista.cira.colostate.edu\/Improve\/wp-json\/wp\/v2\/media?parent=3057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}