Module: module_datacheck

This file implements the datacheck module. The module combines functions ranging from data quality checks to data processing, which it applies as it runs through all the medical data records. In detail:

  • It defines a set of datachecks and runs the data quality tests following the instructions given by the developer in dev/dev-defined.js: module:module_datacheck.definition_value. It then prepares the display so that the user can visualize the results.
    To run these tests and manage the outputs properly, the global values listed as members are stored in g. Consult module:g.module_datacheck for the complete list.
  • It computes values that are passed directly to g and that other functions throughout the application require in order to work properly (which is particularly confusing).

Since:
  • 1.0
TODO
  • The distinction between the two functions (data check and processing) could be made in future versions.

Requires

  • module:index.html
  • module:lang/module-lang.js
  • module:js/main-loadfiles.js

Members

static, constant module:module_datacheck.definition_record : Array.<{key: String, isnumber: Boolean}>

dev-defined

Defines the list of fields that are expected to constitute a unique identifier of a record, to be used in the error log in module:module_datacheck. The elements are coded in the following way:

{key:  'header_in_datafile', isnumber: boolean},
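As an illustration, a definition and the record identifier it could yield might look like the sketch below. The header names ('PatientID', 'Epiweek'), the sample record and the helper buildRecordId are hypothetical, and reading values directly by header name is a simplification.

```javascript
// Hypothetical definition: the two headers are assumed, not taken from a real data file.
var definition_record = [
    {key: 'PatientID', isnumber: true},
    {key: 'Epiweek', isnumber: false}
];

// Hypothetical helper: joins the identifying fields of a record into one
// string that could label the record in the error log.
function buildRecordId(rec, definition) {
    return definition.map(function (field) {
        var value = rec[field.key];
        // Numeric identifiers are written as numbers, text identifiers are quoted.
        return field.isnumber ? String(Number(value)) : '"' + value + '"';
    }).join(',');
}

var rec = {PatientID: '0042', Epiweek: '2015-07', Disease: 'Measles'};
var recordId = buildRecordId(rec, definition_record);
console.log(recordId); // 42,"2015-07"
```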

static module:module_datacheck.diseasecheck : Boolean

module-datacheck

Indicates whether the disease array module:g.medical_diseaseslist is provided empty. If true, the list of diseases is created in module:module_datacheck~dataprocessing. If the list is not provided empty, it is used to perform a datacheck in the same function (module:module_datacheck~dataprocessing).
Stored in module:g.module_datacheck.
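
The two behaviours described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the local g object is a stand-in for the global namespace, and processDisease is a hypothetical name for the per-record step.

```javascript
// Hypothetical stand-in for the global namespace g.
var g = {medical_diseaseslist: [], module_datacheck: {}};

// true when the developer provided no list of diseases.
g.module_datacheck.diseasecheck = g.medical_diseaseslist.length === 0;

// Hypothetical per-record step: builds the list when it was provided empty,
// otherwise checks the disease name against the provided list.
function processDisease(name) {
    var known = g.medical_diseaseslist.indexOf(name) !== -1;
    if (g.module_datacheck.diseasecheck) {
        if (!known) { g.medical_diseaseslist.push(name); }
        return true;   // nothing to check against
    }
    return known;      // datacheck against the provided list
}

processDisease('Measles');
processDisease('Cholera');
processDisease('Measles');
console.log(g.medical_diseaseslist); // [ 'Measles', 'Cholera' ]
```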

static module:module_datacheck.log : Array.<String>

module-datacheck

A log of all errors and missing data found during the datacheck process. It is written in 'csv' style so that the user can easily export, use and analyze it with any external software. All the elements of the array are 'csv' strings. Element 0 contains the headers and the other elements contain values passed by module:module_datacheck~dataprocessing > module:module_datacheck~errorlogging.
This log is then displayed to the user by module:module_datacheck~showlog.
Stored in module:g.module_datacheck.

static module:module_datacheck.showempty : String

module-datacheck

Stores the current visibility state of empty-value errors in the error log. It is changed by the end user in module:module_datacheck~showlog > module:module_datacheck~ShowEmpty and in return affects module:module_datacheck~showlog.
Stored in module:g.module_datacheck.

static module:module_datacheck.sum : Object.<Object>

module-datacheck

A quantitative summary of the datacheck process. For each key, it stores separately the number of errors on values and the number of empty values. It also aggregates the counts of errors and of empty values, and sums them overall.
The object is populated in module:module_datacheck~dataprocessing > module:module_datacheck~errorlogging.
Stored in module:g.module_datacheck.

Methods

inner completenessCheck(rec)

module-datacheck

Computes the completeness percentage for each combination of administrative level and epiweek. It combines record counts from module:g.medical_data, which is browsed in module:module_datacheck~dataprocessing, with the number of lowest administrative divisions per administrative division stored in module:g.geometry_subnum. Processing is triggered in module:module_datacheck~dataprocessing.

Name  Type    Description
rec   Object  A record from the module:g.medical_data Array.
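
One possible reading of this computation is sketched below: completeness is taken as the share of lowest-level divisions that reported at least one record in a given epiweek. The field names (admin1, admin2, epiwk), the sample data, the seen accumulator and the completeness helper are all assumptions made for the illustration.

```javascript
// Hypothetical inputs: number of lowest-level divisions per parent division,
// and a few records with assumed field names.
var geometry_subnum = {North: 4, South: 2};
var medical_data = [
    {admin1: 'North', admin2: 'N1', epiwk: '2015-07'},
    {admin1: 'North', admin2: 'N2', epiwk: '2015-07'},
    {admin1: 'North', admin2: 'N1', epiwk: '2015-07'}, // same subdivision, counted once
    {admin1: 'South', admin2: 'S1', epiwk: '2015-07'}
];

// seen[admin1][epiwk] collects the reporting subdivisions as object keys.
var seen = {};
function completenessCheck(rec) {
    var byWeek = seen[rec.admin1] = seen[rec.admin1] || {};
    var subs = byWeek[rec.epiwk] = byWeek[rec.epiwk] || {};
    subs[rec.admin2] = true;
}
medical_data.forEach(function (rec) { completenessCheck(rec); });

// Percentage of subdivisions of admin1 that reported during epiwk.
function completeness(admin1, epiwk) {
    var reporting = Object.keys((seen[admin1] || {})[epiwk] || {}).length;
    return 100 * reporting / geometry_subnum[admin1];
}

console.log(completeness('North', '2015-07')); // 50
console.log(completeness('South', '2015-07')); // 50
```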

inner CopyText(id)

module-datacheck

Triggered when the copy to clipboard button is pressed by the user in module:module_datacheck~showlog. Copies to clipboard the text previously selected by module:module_datacheck~SelectText.

Name  Type    Description
id    String  ID of the div from which the text will be selected.

inner dataprocessing()

module-datacheck

Procedural component: should be rethought to be modular (OO programming...).
Main procedure of the datacheck module.
Contains the definition of the data checks at value and at record levels. Browses all the records of module:g.medical_data and all the values of each record, performs tests and produces an error log (module:module_datacheck.log) as well as a quantitative summary (module:module_datacheck.sum).
Also produces some essential outputs along the way, as mentioned in the introduction to module:module_datacheck. It has been coded this way to avoid browsing all the data multiple times, but this brings some confusion and should be revised.
Is triggered in module:main_loadfiles~read_medical.

TODO
  • The distinction between the two functions (data check and processing) could be made in future versions.
  • Should be broken down into smaller pieces.

inner display()

module-datacheck

Converts module:g.module_datacheck.sum into HTML to be displayed to the user.
Function triggered in module:main_loadfiles~generate_display.

inner errorlogging(test, test_type, key, rec)

module-datacheck

Writes the results of a value or record check performed in module:module_datacheck~dataprocessing while browsing all the data. The log is written in a 'csv' style and stored in module:module_datacheck.log.

Name       Type     Description
test       Boolean  Result of a value check test from module:module_datacheck~testvalue or of a record check test from module:module_datacheck~testrecord.
test_type  String   Name of the value check test, either taken from test_type in g.module_datacheck.definition_value or passed directly by module:module_datacheck~dataprocessing for the empty and duplicate test types.
key        String   A key to read a value in rec, taken from the module:g.medical_headerlist Array.
rec        Object   A record from the module:g.medical_data Array.
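
A minimal sketch of how such a logger could fill both the log and the summary is given below. It is not the module's implementation: the csv column names, the id field used to identify a record, and the convention that test === false signals a problem are all assumptions, and the local log and sum variables stand in for the values stored in g.

```javascript
// Hypothetical stand-ins for module:module_datacheck.log and module:module_datacheck.sum.
var log = ['record,key,type,value'];   // element 0: csv headers (column names assumed)
var sum = {};                          // per-key counters

// Assumed convention: test === false means the check found a problem.
function errorlogging(test, test_type, key, rec) {
    if (test) { return; }
    var kind = (test_type === 'empty') ? 'empty' : 'error';
    sum[key] = sum[key] || {error: 0, empty: 0};
    sum[key][kind] += 1;
    // Values are written unescaped, so they must not contain commas.
    log.push([rec.id, key, test_type, rec[key]].join(','));
}

errorlogging(false, 'integer', 'cases', {id: 7, cases: 'abc'});
errorlogging(false, 'empty', 'age', {id: 8, age: ''});
errorlogging(true, 'integer', 'cases', {id: 9, cases: '3'});

console.log(log[1]); // 7,cases,integer,abc
console.log(log[2]); // 8,age,empty,
```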

inner interaction()

module-datacheck

Sets up the show/hide interaction of the error log, triggered when the user clicks the button defined in module:module_datacheck~showlog.
Function triggered in module:main_loadfiles~generate_display.

inner SelectText(id)

module-datacheck

Triggered when the copy to clipboard button is pressed by the user in module:module_datacheck~showlog > module:module_datacheck~CopyText. Selects all the 'csv'-style text of the error log.

Name  Type    Description
id    String  ID of the div from which the text will be selected.

inner ShowEmpty()

module-datacheck

Triggered when the show/hide missing values button is pressed by the user in module:module_datacheck~showlog. Toggles module:module_datacheck.showempty, redraws the log with module:module_datacheck~showlog and resets the show/hide error log interaction with module:module_datacheck~interaction.

inner showlog()

module-datacheck

Converts module:g.module_datacheck.log into HTML to be displayed to the user. Includes buttons to show/hide missing values (current state stored in module:module_datacheck.showempty) and to copy to the clipboard. The HTML string is passed to the #datalog div defined in module:main_loadfiles~generate_display, and showlog is also triggered by that same function.
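
The conversion from a 'csv'-style log to HTML could look like the sketch below. The helper logToHtml is hypothetical, the buttons are left out, and several simplifications are assumed: the test type is the third csv column, no value contains a comma, cells need no HTML escaping, and showempty is treated as a simple truthy flag.

```javascript
// Hypothetical helper: renders a csv-style log as an HTML table string.
function logToHtml(log, showempty) {
    var rows = log.filter(function (line, i) {
        // Always keep the header row; hide 'empty' entries unless showempty is set.
        return i === 0 || showempty || line.split(',')[2] !== 'empty';
    });
    var html = rows.map(function (line, i) {
        var tag = (i === 0) ? 'th' : 'td';
        var cells = line.split(',').map(function (cell) {
            return '<' + tag + '>' + cell + '</' + tag + '>';
        });
        return '<tr>' + cells.join('') + '</tr>';
    });
    return '<table>' + html.join('') + '</table>';
}

var sampleLog = ['record,key,type', '1,age,empty', '2,cases,integer'];
console.log(logToHtml(sampleLog, false));
```

In a page, the returned string would then be assigned to the target element, for example through its innerHTML property.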

inner testrecord(rec) → {Boolean}

module-datacheck

Defines datacheck tests on entire records.

Requires module:g.medical_headerlist.
The currently implemented test_type is:

  • duplicates, which checks whether the record is a duplicate. The test is performed on all records and uses the intermediary uniquereclist and duplicatereclist objects.

Name  Type    Description
rec   Object  A record from the module:g.medical_data Array.

Returns:
Type Description
Boolean
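
A duplicates test built on two intermediary objects could be sketched as follows. Only the names uniquereclist and duplicatereclist come from the description above; the function name testDuplicate, the headers array, the signature format and the choice to return true for a first occurrence are assumptions.

```javascript
// Objects without prototype, so that any signature string is a safe key.
var uniquereclist = Object.create(null);     // signatures seen at least once
var duplicatereclist = Object.create(null);  // occurrence count of repeated signatures

// Assumed convention: returns true for a first occurrence, false for a duplicate.
function testDuplicate(rec, headers) {
    // Serialize the record over all headers to compare whole records.
    var signature = headers.map(function (h) { return String(rec[h]); }).join('|');
    if (uniquereclist[signature]) {
        duplicatereclist[signature] = (duplicatereclist[signature] || 1) + 1;
        return false;
    }
    uniquereclist[signature] = true;
    return true;
}

var headers = ['Epiweek', 'Location', 'Disease'];
var a = {Epiweek: '2015-07', Location: 'N1', Disease: 'Measles'};
var b = {Epiweek: '2015-07', Location: 'N1', Disease: 'Measles'};
console.log(testDuplicate(a, headers)); // true
console.log(testDuplicate(b, headers)); // false
```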

inner testvalue(rec, key, other) → {Boolean}

module-datacheck

Defines datacheck tests on values.
The association between keys in module:g.medical_data and these definitions is made in module:module_datacheck.definition_value.
Requires module:g.medical_headerlist, and possibly some additional elements for specific tests.
Tests are coded in the following way:
Tests are coded in the following way:

'test_key': function(rec,key,none){
    var value = rec[g.medical_headerlist[key]];
    var cond = ...; // performs the test on the value
    return cond;
},

Currently implemented test_types are:
  • integer, which checks if the value is an integer,
  • float, which checks if the value is a float,
  • positivefloat, which checks if the value is a positive float,
  • epiwk, which checks that the format is 'YYYY-WW',
  • inlist, which checks if the value is in a list (typically passed in the module:module_datacheck.definition_value setup),
  • ingeometry, which checks whether the location name in module:g.medical_data matches any location name in module:g.geometry_loclists and populates module:g.medical_loclists with the matching location names,
  • empty, which checks if the value equals either undefined or ''. This test is performed on all values.
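
Following the coding pattern above, a few of these tests could be sketched as below. The bodies are illustrative guesses at reasonable checks, not the module's code; the local g object, its header mapping and the testvalue container name are assumptions.

```javascript
// Hypothetical stand-in: maps a short key to the header used in the data file.
var g = {medical_headerlist: {cases: 'Cases', sex: 'Sex'}};

var testvalue = {
    // Optional minus sign followed by digits only.
    integer: function (rec, key, none) {
        var value = rec[g.medical_headerlist[key]];
        return /^-?\d+$/.test(String(value));
    },
    // Accepts anything Number() parses to a finite value above zero.
    positivefloat: function (rec, key, none) {
        var value = rec[g.medical_headerlist[key]];
        var num = Number(value);
        return String(value).trim() !== '' && isFinite(num) && num > 0;
    },
    // The third argument carries the list of accepted values.
    inlist: function (rec, key, list) {
        var value = rec[g.medical_headerlist[key]];
        return list.indexOf(value) !== -1;
    },
    // true when the value is missing.
    empty: function (rec, key, none) {
        var value = rec[g.medical_headerlist[key]];
        return value === undefined || value === '';
    }
};

console.log(testvalue.integer({Cases: '12'}, 'cases'));            // true
console.log(testvalue.inlist({Sex: 'F'}, 'sex', ['M', 'F']));      // true
```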

Name   Type    Description
rec    Object  A record from the module:g.medical_data Array.
key    String  A key to read a value in rec, taken from the module:g.medical_headerlist Array.
other          A third argument that has to be passed in some specific cases (currently an Array of elements in the inlist test).

Returns:
Type Description
Boolean
Example

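Example definition of the epiwk test.

epiwk: function(rec,key,none){
	var value = rec[g.medical_headerlist[key]];
	// Split 'YYYY-WW' once; a missing part falls back to '' so that .length is safe.
	var parts = String(value).split('-');
	var year = parts[0] || '';
	var week = parts[1] || '';
	var cond_1 = (year.length == 4) && (week.length == 2);
	// Compare as numbers rather than relying on implicit string coercion.
	var cond_2 = Number(year) > 0;
	var cond_3 = Number(week) > 0 && Number(week) < 54;
	return cond_1 && cond_2 && cond_3;
},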

inner toTitleCase(str) → {String}

module-datacheck

Takes a string and returns the same string with title case (first letter of each word is capitalized). Used to normalize strings such as administrative division names or disease names.

Name Type Description
str String
Returns:
Type Description
String
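
A common way to write such a helper is sketched below. It is one possible implementation rather than the module's own, and it assumes that the remaining letters of each word are lowercased, which the description above does not state.

```javascript
// Lowercases the string, then uppercases the first character of each
// whitespace-separated word.
function toTitleCase(str) {
    return str.toLowerCase().replace(/(^|\s)\S/g, function (match) {
        return match.toUpperCase();
    });
}

console.log(toTitleCase('north kivu'));             // North Kivu
console.log(toTitleCase('ACUTE watery DIARRHOEA')); // Acute Watery Diarrhoea
```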