Commit 1f6a89ce authored by Tarun karthik kumar Mamidi

Merge branch 'unit-test-investigation' into 'master'

Unit test investigation

See merge request center-for-computational-genomics-and-data-science/sciops/covid-19_risk_predictor!4
parents 0681578a e0b53dce
Pipeline #4676 canceled with stage
pipeline {
    agent any
    options {
        timestamps()
        ansiColor('xterm')
    }
    environment {
        GITLAB_API_TOKEN = credentials('GitLabToken')
        BASE_GITLAB_URL = credentials('BaseGitlabUrl')
    }
    stages {
        stage('Static Analysis') {
            agent {
                docker { image '${BASE_GITLAB_URL}/center-for-computational-genomics-and-data-science/utility-images/static-analysis:v1.1'}
            }
            steps {
                sh '/bin/linting.sh'
            }
            post {
                success {
                    sh "curl --request POST --header \"PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\" \"https://gitlab.rc.uab.edu/api/v4/projects/1585/statuses/${GIT_COMMIT}?state=success&name=jenkins_static_analysis\""
                }
                failure {
                    sh "curl --request POST --header \"PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\" \"https://gitlab.rc.uab.edu/api/v4/projects/1585/statuses/${GIT_COMMIT}?state=canceled&name=jenkins_static_analysis\""
                }
            }
        }
        stage('Unit Test') {
            agent {
                docker { image 'continuumio/miniconda3:4.9.2' }
            }
            steps {
                sh 'conda env create --file configs/environment.yaml'
                sh 'python -m unittest -v testing/unit_test.py'
            }
            post {
                success {
                    sh "curl --request POST --header \"PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\" \"https://gitlab.rc.uab.edu/api/v4/projects/1585/statuses/${GIT_COMMIT}?state=success&name=jenkins_unit_tests\""
                }
                failure {
                    sh "curl --request POST --header \"PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\" \"https://gitlab.rc.uab.edu/api/v4/projects/1585/statuses/${GIT_COMMIT}?state=canceled&name=jenkins_unit_tests\""
                }
            }
        }
    }
    post {
        success {
            sh "curl --request POST --header \"PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\" \"https://gitlab.rc.uab.edu/api/v4/projects/1585/statuses/${GIT_COMMIT}?state=success&name=jenkins\""
        }
        failure {
            sh "curl --request POST --header \"PRIVATE-TOKEN: ${GITLAB_API_TOKEN}\" \"https://gitlab.rc.uab.edu/api/v4/projects/1585/statuses/${GIT_COMMIT}?state=canceled&name=jenkins\""
        }
    }
}
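The `post` blocks in this pipeline report each result to GitLab by POSTing to the commit status endpoint. As a rough illustration of how that URL is put together, here is a minimal Python sketch. The function name `commit_status_url` and the example SHA are hypothetical; the host, project ID, and query parameters are copied from the `curl` commands. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Host and project ID as they appear in the pipeline's curl commands.
GITLAB_API = "https://gitlab.rc.uab.edu/api/v4"
PROJECT_ID = 1585


def commit_status_url(commit_sha: str, state: str, name: str) -> str:
    """Build the commit-status URL that the pipeline's curl commands POST to.

    Hypothetical helper for illustration; it performs no network request.
    """
    query = urlencode({"state": state, "name": name})
    return f"{GITLAB_API}/projects/{PROJECT_ID}/statuses/{commit_sha}?{query}"


# Example with a placeholder SHA (not a real commit).
print(commit_status_url("abc123", "success", "jenkins_unit_tests"))
```

The pipeline reports `canceled` when a stage fails. GitLab's commit status API also accepts `failed`, which may describe a failing stage more accurately if that is the intent.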
# The MIT License (MIT)

Copyright (c) 2021 Center for Computational Genomics and Data Science

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the
Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# COVID-19_RISK_PREDICTOR

***!!! For research purposes only !!!***

- [COVID-19_RISK_PREDICTOR](#covid-19_risk_predictor)
- [Data availability](#data-availability)
- [Usage](#usage)
- [Installation](#installation)
- [Requirements](#requirements)
- [Activate conda environment](#activate-conda-environment)
- [Run parser](#run-parser)
- [Run model training](#run-model-training)
- [Build Streamlit app](#build-streamlit-app)
- [Unit Testing](#unit-testing)
- [Contact information](#contact-information)

**Aim:** To develop a model that takes in demographics, lifestyle, and symptoms/conditions to predict the risk of COVID-19
infection for patients.
## Data availability

Data was made available through the UAB Biomedical Research Information Technology Enhancement (U-BRITE) framework.
Access to the level-2 i2b2 data was granted upon self-service pursuant to an IRB exemption.
[link](https://www.uab.edu/ccts/research-commons/berd/55-research-commons/informatics/325-i2b2)

### Directory structure used to parse data from positive and negative cohorts

The dataset was transformed to adhere to the [OMOP Common Data Model Version 5.3.1](https://ohdsi.github.io/CommonDataModel/cdm531.html)
to enable systematic analyses of EHR data from disparate sources.

```directory
Cohorts/
├── positive <--- positive cohort directory
│   ├── measurement.csv - test and results
...@@ -38,10 +43,10 @@ Cohorts/
└── README.md
```
## Usage

### Installation

Installation simply requires fetching the source code. The following are required:

- Git
...@@ -81,32 +86,62 @@ conda activate rico
```
### Run parser

```sh
python src/filter_dataset.py --pos Cohorts/positive/ --neg Cohorts/negative/
```

For help, use the `-h` argument:

```sh
python src/filter_dataset.py -h
```

Parsed files are saved in the `./results` directory.
### Run model training

```sh
python src/Model.py --input results/encoded-100-week-filter.csv
```

Output files are saved in the `./results` directory.
### Build Streamlit app

To demonstrate the application of these models, one of the four was chosen and a sample Streamlit app was created and
included in the project. Please refer to `src/streamlit/RICO.py`.

**Note** - This Streamlit app demonstrates one of the models. It is not required by the pipeline and serves only to display the calculation and its interpretation; the questionnaire from the models can be used manually without it. The Streamlit app is therefore not tested and should be used at your own risk, for demo purposes or as a guide for building on this work.
### Unit Testing
To test the functions in `filter_dataset.py`, run the following command:
```sh
python -m unittest -v testing/unit_test.py
```
To measure test coverage, run the following commands:
```sh
# test the coverage
coverage run -m unittest -v testing/unit_test.py
# To get a coverage report
coverage report
# To get annotated HTML listings
coverage html
```
**Note** - Functions in `Model.py` are adapted from [this GitHub repo](https://github.com/yandexdataschool/roc_comparison),
which already includes unit tests.
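The parameterized cases in `testing/unit_test.py` can also be written with the standard library's `subTest`, which avoids the extra dependency. The sketch below is an illustration only: `icd_category` is a hypothetical stand-in that assumes `filter_dataset.icd` keeps the part of an ICD code before the dot, which is what the two ICD test cases in this commit expect. The real implementation may differ.

```python
import unittest


def icd_category(icd_code: str) -> str:
    """Hypothetical stand-in for filter_dataset.icd: keep the part before the dot."""
    return icd_code.split(".")[0]


class IcdCategoryTest(unittest.TestCase):
    # (input code, expected category) pairs taken from testing/unit_test.py
    CASES = [("E08.22", "E08"), ("E08", "E08")]

    def test_icd(self):
        for code, expected in self.CASES:
            # subTest reports each failing case separately instead of
            # stopping at the first failure.
            with self.subTest(code=code):
                self.assertEqual(icd_category(code), expected)
```

Saved as a module, this runs with the same `python -m unittest -v` invocation used above.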
## Contact information

For issues, please send an email with a clear description to:

Tarun Mamidi - tmamidi@uab.edu

Ryan Melvin - rmelvin@uabmc.edu
...@@ -10,8 +10,10 @@ dependencies:
  - pyyaml=5.4.1
  - matplotlib=3.3.4
  - scikit-learn=0.24.1
  - black=21.5b0
  - parameterized=0.8.1
  - pip=21.1.1
  - pip:
    - scorecardpy==0.1.9.2
    - xverse==1.0.5
    - coverage==5.5
# libraries
import pandas as pd
import numpy as np
import xverse
...@@ -8,13 +8,14 @@ import sklearn
from sklearn.model_selection import train_test_split, StratifiedKFold
import statsmodels.api as sm
from matplotlib import pyplot

#%matplotlib inline
from joblib import dump, load
import argparse
from scipy import stats

# Functions for computing AUC CI using Delong's method
#!/usr/bin/python
"""
AUC DeLong CI
...@@ -47,7 +48,7 @@ def compute_midrank(x):
        j = i
        while j < N and Z[j] == Z[i]:
            j += 1
        T[i:j] = 0.5 * (i + j - 1)
        i = j
    # dtype=float is equivalent to the deprecated np.float alias
    T2 = np.empty(N, dtype=float)
    # Note(kazeevn) +1 is due to Python using 0-based indexing
...@@ -127,9 +128,9 @@ def fastDeLong_weights(predictions_sorted_transposed, label_1_count, sample_weig
    total_negative_weights = sample_weight[m:].sum()
    pair_weights = np.dot(sample_weight[:m, np.newaxis], sample_weight[np.newaxis, m:])
    total_pair_weights = pair_weights.sum()
    aucs = (sample_weight[:m] * (tz[:, :m] - tx)).sum(axis=1) / total_pair_weights
    v01 = (tz[:, :m] - tx[:, :]) / total_negative_weights
    v10 = 1.0 - (tz[:, m:] - ty[:, :]) / total_positive_weights
    sx = np.cov(v01)
    sy = np.cov(v10)
    delongcov = sx / m + sy / n
...@@ -215,192 +216,183 @@ def delong_roc_variance(ground_truth, predictions, sample_weight=None):
        predictions: np.array of floats of the probability of being class 1
    """
    order, label_1_count, ordered_sample_weight = compute_ground_truth_statistics(
        ground_truth, sample_weight
    )
    predictions_sorted_transposed = predictions[np.newaxis, order]
    aucs, delongcov = fastDeLong(
        predictions_sorted_transposed, label_1_count, ordered_sample_weight
    )
    assert len(aucs) == 1, "There is a bug in the code, please forward this to the developers"
    return aucs[0], delongcov


if __name__ == "__main__":
    # Data setup
    # Read, filter based on missingness and identical limits,
    # train/test split, and perform WoE transform.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True, help="input encoded file")
    args = parser.parse_args()
    # load data
    encoded = pd.read_csv(args.input)
    encoded = encoded.drop(["PERSON_ID"], axis=1)
    # filter variable via missing rate, iv, identical value rate
    encoded_f = sc.var_filter(
        encoded,
        y="class",
        positive="negative",
        identical_limit=0.95,
        iv_limit=0,
        missing_limit=0.95,
        return_rm_reason=False,  # makes output a dictionary referencing 2 dfs
        var_kp=[
            "f_R06",
            "f_R05",
            "f_R50",
            "f_R53",
            "f_M79",
            "f_R09",
            "f_R51",
            "f_J44",
            "f_E11",
            "f_I25",
            "f_I10",
        ],
        var_rm=["f_BMI-unknown", "f_Unknown"],
    )
    # breaking dt into train and test
    train, test = sc.split_df(encoded_f, "class").values()
    # woe binning ------
    bins = sc.woebin(encoded_f, y="class")
    # converting train and test into woe values
    train_woe = sc.woebin_ply(train, bins)
    test_woe = sc.woebin_ply(test, bins)
    # get xs and ys
    y_train = train_woe.loc[:, "class"]
    X_train = train_woe.loc[:, train_woe.columns != "class"]
    y_test = test_woe.loc[:, "class"]
    X_test = test_woe.loc[:, train_woe.columns != "class"]
    # Lasso-based regression
    # Determine a lambda for Lasso (l1) regularization using
    # 10-fold cross validation, get predictions from best model, score, and make scorecard
    # logistic regression ------
    # lasso
    from sklearn.linear_model import LogisticRegressionCV, LogisticRegression

    lasso_cv = LogisticRegressionCV(
        penalty="l1",
        Cs=100,
        solver="saga",
        cv=StratifiedKFold(10),
        n_jobs=-1,
        max_iter=10000,
        scoring="neg_log_loss",
        class_weight="balanced",
    )
    lasso_cv.fit(X_train, y_train)
    # plot training ROC
    sklearn.metrics.plot_roc_curve(lasso_cv, X_train, y_train)
    pyplot.plot([0, 1], [0, 1], color="black", lw=2, linestyle="--")
    pyplot.title("LASSO Training ROC")
    axes = pyplot.gca()
    axes.set_facecolor("white")
    axes.set_clip_on(False)
    pyplot.savefig("results/training_roc.png")
    # plot testing ROC
    sklearn.metrics.plot_roc_curve(lasso_cv, X_test, y_test)
    pyplot.plot([0, 1], [0, 1], color="black", lw=2, linestyle="--")
    pyplot.title("LASSO Testing ROC")
    axes = pyplot.gca()
    axes.set_facecolor("white")
    axes.set_clip_on(False)
    pyplot.savefig("results/testing_roc.png")
    # predicted probability
    train_pred = lasso_cv.predict_proba(X_train)[:, 1]
    train_pred_class = lasso_cv.predict(X_train)
    test_pred = lasso_cv.predict_proba(X_test)[:, 1]
    test_pred_class = lasso_cv.predict(X_test)
    # Make scorecard
    card = sc.scorecard(bins, lasso_cv, X_train.columns)
    # credit score
    train_score = sc.scorecard_ply(train, card, print_step=0)
    test_score = sc.scorecard_ply(test, card, print_step=0)
    # psi
    pyplot.rcParams["font.size"] = "18"
    fig = sc.perf_psi(
        score={"train": train_score, "test": test_score},
        label={"train": y_train, "test": y_test},
        x_tick_break=50,
    )
    fig["pic"]["score"].set_size_inches(18.5, 10.5)
    fig["pic"]["score"].savefig("results/dist.png")
    card_df = pd.concat(card)
    card_df.to_csv("results/lasso_card_df.csv")
    scores_lasso_2week = sc.scorecard_ply(
        encoded, card, only_total_score=True, print_step=0, replace_blank_na=True
    )
    scores_lasso_2week.to_csv("results/scores_lasso.csv")
    # Training Metrics and AUC CI
    print("Training Metrics")
    # calculate accuracy
    acc = sklearn.metrics.accuracy_score(y_train, train_pred_class)
    print("Accuracy: %.3f" % acc)
    auc_score = sklearn.metrics.roc_auc_score(y_train, train_pred)
    print("AUC: %.3f" % auc_score)
    f_score = sklearn.metrics.f1_score(y_train, train_pred_class)
    print("FS: %.3f" % f_score)
    # delong ci
    delong_alpha = 0.95
    auc, auc_cov = delong_roc_variance(np.ravel(y_train), np.ravel(train_pred))
    auc_std = np.sqrt(auc_cov)
    lower_upper_q = np.abs(np.array([0, 1]) - (1 - delong_alpha) / 2)
    ci = stats.norm.ppf(lower_upper_q, loc=auc_score, scale=auc_std)
    ci[ci > 1] = 1
    print("AUC COV:", round(auc_cov, 2))
    print("95% AUC CI:", np.round(ci, 2))
    # Testing Metrics and AUC CI
    print("Testing Metrics")
    # calculate accuracy
    acc = sklearn.metrics.accuracy_score(y_test, test_pred_class)
    print("Accuracy: %.3f" % acc)
    auc_score = sklearn.metrics.roc_auc_score(y_test, test_pred)
    print("AUC: %.3f" % auc_score)
    f_score = sklearn.metrics.f1_score(y_test, test_pred_class)
    print("FS: %.3f" % f_score)
    # delong ci
    delong_alpha = 0.95
    auc, auc_cov = delong_roc_variance(np.ravel(y_test), np.ravel(test_pred))
    auc_std = np.sqrt(auc_cov)
    lower_upper_q = np.abs(np.array([0, 1]) - (1 - delong_alpha) / 2)
    ci = stats.norm.ppf(lower_upper_q, loc=auc_score, scale=auc_std)
    ci[ci > 1] = 1
    print("AUC COV:", round(auc_cov, 2))
    print("95% AUC CI:", np.round(ci, 2))
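
Both metrics blocks turn the DeLong variance into a confidence interval with a normal approximation. A minimal standard-library sketch of that step follows. The function name `auc_confidence_interval` is hypothetical, `statistics.NormalDist` stands in for `scipy.stats.norm`, and clipping the lower bound at 0 is an added assumption (the script only clips the upper bound at 1). The variance is assumed to be positive.

```python
from statistics import NormalDist


def auc_confidence_interval(auc, variance, level=0.95):
    """Normal-approximation confidence interval for an AUC given its variance."""
    std = variance ** 0.5
    tail = (1 - level) / 2  # probability mass left in each tail
    dist = NormalDist(mu=auc, sigma=std)
    lower = dist.inv_cdf(tail)
    upper = dist.inv_cdf(1 - tail)
    # An AUC lies in [0, 1], so clip the interval to that range.
    return max(lower, 0.0), min(upper, 1.0)


# Illustrative numbers only, not results from this project.
low, high = auc_confidence_interval(0.80, 0.0025)
print(f"95% AUC CI: [{low:.2f}, {high:.2f}]")
```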
...@@ -2,247 +2,255 @@ import streamlit as st
import plotly.graph_objects as go

st.title("COVID-19 Risk Predictor")
st.markdown(
    "<h3 style='text-align: right;'>for research purposes only</h3>", unsafe_allow_html=True,
)
"""
"""
pd.options.display.max_colwidth = 500


def imc_chart(imc):
    if imc >= 213:
        color = "red"
        "## Alert: Please take a COVID test immediately."
        # '### You are >20% likely.'
    elif imc >= 170 and imc < 213:
        color = "orange"
        "## Alert: Please consult a doctor to take COVID test"
    elif imc >= 0 and imc < 170:
        color = "lightgreen"
        "## Alert: Please consult a doctor to take COVID test"
    elif imc < 0:
        color = "green"
        "## Alert: Please consult a doctor if you have symptoms"
    fig = go.Figure(
        go.Indicator(
            mode="gauge+number+delta",
            domain={"x": [0, 1], "y": [0, 1]},
            value=imc,
            title={"text": "Patient Risk Score"},
            delta={"reference": 213, "increasing": {"color": "RebeccaPurple"}},
            gauge={
                "axis": {"range": [-170, 350], "tickwidth": 1, "tickcolor": "darkblue"},
                "bar": {"color": color},
                "steps": [{"range": [-170, 350], "color": "white"}],
                "threshold": {
                    "line": {"color": "red", "width": 8},
                    "thickness": 0.75,
                    "value": 213,
                },
            },
        )
    )
    return fig


age = st.sidebar.selectbox("Please select your age:", ("", "<20", "20-39", "40-54", ">55"))
if age == "":
    age = 0
elif age == "<20":
    age = 29
elif age == "20-39":
    age = -15
elif age == "40-54":
    age = 25
elif age == ">55":
    age = -6

race = st.sidebar.selectbox(
    "What was or would be your race on the 2020 census?",
    (
        "",
        "Decline to answer",
        "Asian",
        "White",
        "Black or African American",
        "Hispanic or Latino",
        "Other or Multiple",
    ),
)
if race == "":
    race = 0
elif race == "Decline to answer":
    race = -34
elif race == "Asian":
    race = 74
elif race == "White":
    race = -27
elif race == "Black or African American":
    race = 26
elif race == "Hispanic or Latino":
    race = 18
elif race == "Other or Multiple":
    race = 35

cough = st.sidebar.selectbox("Do you have a cough?", ("", "Yes", "No"))
if cough == "":
    cough = 0
elif cough == "Yes":
    cough = 79
elif cough == "No":
    cough = -37

smoke = st.sidebar.selectbox("Do you smoke?", ("", "Yes", "No"))
if smoke == "":
    smoke = 0
elif smoke == "Yes":
    smoke = -64
elif smoke == "No":
    smoke = 10

drink = st.sidebar.selectbox("Do you drink?", ("", "Yes", "No", "Former"))
if drink == "":
    drink = 0
elif drink == "Yes":
    drink = 3
elif drink == "No":
    drink = 0
elif drink == "Former":
    drink = -32

fever = st.sidebar.selectbox("Do you have fever?", ("", "Yes", "No"))
if fever == "":
    fever = 0
elif fever == "Yes":
    fever = 33
elif fever == "No":
    fever = -2

tired = st.sidebar.selectbox("Do you feel tired?", ("", "Yes", "No"))
if tired == "":
    tired = 0
elif tired == "Yes":
    tired = 20
elif tired == "No":
    tired = -1

muscle = st.sidebar.selectbox("Do you feel muscle pain?", ("", "Yes", "No"))
if muscle == "":
    muscle = 0
elif muscle == "Yes":
    muscle = 25
elif muscle == "No":
    muscle = -2

mucus = st.sidebar.selectbox("Have you had increased mucus or phlegm?", ("", "Yes", "No"))
if mucus == "":
    mucus = 0
elif mucus == "Yes":
    mucus = 25
elif mucus == "No":
    mucus = -2

headache = st.sidebar.selectbox("Do you have a headache?", ("", "Yes", "No"))
if headache == "":
    headache = 0
elif headache == "Yes":
    headache = 119
elif headache == "No":
    headache = -5

t2d = st.sidebar.selectbox("Do you have Type 2 diabetes?", ("", "Yes", "No"))
if t2d == "":
    t2d = 0
elif t2d == "Yes":
    t2d = 12
elif t2d == "No":
    t2d = -2

pregnant = st.sidebar.selectbox("Are you pregnant?", ("", "Yes", "No"))
if pregnant == "":
    pregnant = 0
elif pregnant == "Yes":
    pregnant = -93
elif pregnant == "No":
    pregnant = 9

kidney = st.sidebar.selectbox(
    "Are you currently seeing a doctor for kidney issues?", ("", "Yes", "No")
)
if kidney == "":
    kidney = 0
elif kidney == "Yes":
    kidney = -85
elif kidney == "No":
    kidney = 6

hyper = st.sidebar.selectbox("Have you been diagnosed with hypertension?", ("", "Yes", "No"))
if hyper == "":
    hyper = 0
elif hyper == "Yes":
    hyper = -40
elif hyper == "No":
    hyper = 8

heart = st.sidebar.selectbox(
    "Have you been diagnosed with heart disease (other than hypertension)?", ("", "Yes", "No"),
)
if heart == "":
    heart = 0
elif heart == "Yes":
    heart = -56
elif heart == "No":
    heart = 4

anxiety = st.sidebar.selectbox(
    "Have you been diagnosed with an anxiety disorder?", ("", "Yes", "No")
)
if anxiety == "":
    anxiety = 0
elif anxiety == "Yes":
    anxiety = -64
elif anxiety == "No":
    anxiety = 4

copd = st.sidebar.selectbox("Have you been diagnosed with COPD?", ("", "Yes", "No"))
if copd == "":
    copd = 0
elif copd == "Yes":
    copd = -101
elif copd == "No":
    copd = 3

total = (
    137
    + cough
    + smoke
    + fever
    + tired
    + muscle
    + mucus
    + headache
    + t2d
    + pregnant
    + kidney
    + heart
    + anxiety
    + hyper
    + copd
    + drink
    + age
    + race
)
st.write(imc_chart(total))
"## 💊 Patient Risk Score:", total
"""
---
Risk score chart:
------------------
#### Base score = 137
Predictive safest score < 0
...@@ -252,4 +260,4 @@ Predictive risk score = 170 - 213
Predictive high risk = >213
"""
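
The app's scoring logic amounts to adding per-answer points to a base score and mapping the total to a band. Below is a compact sketch of that idea using three of the questions and their point values from the diff. The names `POINTS`, `total_score`, and `risk_band`, as well as the band labels, are hypothetical and not part of the app; only the point values, the base score, and the thresholds (213, 170, 0) come from the source.

```python
BASE_SCORE = 137

# Points per answer, copied from the app for three of the questions.
POINTS = {
    "cough": {"Yes": 79, "No": -37},
    "fever": {"Yes": 33, "No": -2},
    "headache": {"Yes": 119, "No": -5},
}


def total_score(answers):
    """Add the points for each answered question to the base score.

    Blank answers ("") contribute 0, matching the app's behavior.
    """
    return BASE_SCORE + sum(
        POINTS[question].get(answer, 0) for question, answer in answers.items()
    )


def risk_band(score):
    """Map a total score to a band using the app's thresholds."""
    if score >= 213:
        return "high risk"
    if score >= 170:
        return "risk"
    if score >= 0:
        return "low risk"
    return "safest"


score = total_score({"cough": "Yes", "fever": "No"})
print(score, risk_band(score))
```

A lookup table keeps each question's points in one place, which makes the values easier to audit against the scorecard than a chain of `if`/`elif` branches.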
import unittest
from parameterized import parameterized
from src import filter_dataset

# inspired from https://github.com/wolever/parameterized
class FilterDatasetTest(unittest.TestCase):
    # For icd codes
    @parameterized.expand(
        [["diabetesIcdCode", "E08.22", "E08",], ["withoutdot", "E08", "E08"],]
    )
    def test_icd(self, name, icdCode, expectedCategory):
        assert filter_dataset.icd(icdCode) == expectedCategory

    # For observation table habits values
    @parameterized.expand(
        [
            ["former_smoker", "Former smoker-HX Tobacco use", "former_smoker",],
            ["No_Substance_use", "None-SHX Substance abuse use", "No_Substance_use"],
            ["No_alcohol", "None-SHX Alcohol use", "No_alcohol"],
            ["irreg_BMI_instance", "BMI-30+", "30.0-34.9"],
            ["irreg_BMI_instance1", "Body mass index (BMI 50.0-59.9), adult-Z68.43", "50-59.9"],
            ["irreg_BMI_instance2", "Body mass index (BMI 20.0_24.9), adult-Z68.24", "20.0_24.9"],
        ]
    )
    def test_habits(self, name, habits, expectedhabits):
        assert filter_dataset.parse_values(habits) == expectedhabits

    # For observation table BMI values
    @parameterized.expand(
        [
            ["age_43", "Body mass index (BMI) 50.0-59.9, adult-Z68.43", "BMI-40.0_or_greater",],
            ["age_38", "Body mass index (BMI) 38.0-38.9, adult-Z68.38", "BMI-25.0_39.9"],
            ["age_24", "Body mass index (BMI) 24.0-24.9, adult-Z68.24", "BMI-20.0_24.9"],
        ]
    )
    def test_BMI(self, name, BMI, expectedBMI):
        assert filter_dataset.weight_bins(BMI) == expectedBMI