Comments on: Ask a Data Scientist: Handling Missing Data
https://insidebigdata.com/2014/10/29/ask-data-scientist-handling-missing-data/

By: Piyapong Khumrin, Mon, 13 Mar 2017 00:59:50 +0000
https://insidebigdata.com/2014/10/29/ask-data-scientist-handling-missing-data/#comment-121952

I read your article on handling missing data (http://insidebigdata.com/2014/10/29/ask-data-scientist-handling-missing-data/), which was very useful in helping me look for a solution to a problem in my PhD research.

In my research, I train a machine learning model on clinical cases to predict diagnoses, and I use the model to guide medical students as they try to solve unknown cases in a game-based learning tool.

I got a good predictive model and reasonable predictions in the learning system, until I discovered that some predictions were quite odd.

I looked for the reason and discovered that some features are 100% missing for one target class.
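
A check like this can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the training cases are a list of dictionaries, missing values are stored as `None`, and the function name `fully_missing_by_class`, the column names, and the sample values are all made up for illustration.

```python
from collections import defaultdict

def fully_missing_by_class(rows, target, missing=None):
    """Map each class label to the features that are missing in every row of that class."""
    observed = defaultdict(lambda: defaultdict(int))  # label -> feature -> observed count
    features, labels = set(), set()
    for row in rows:
        label = row[target]
        labels.add(label)
        for name, value in row.items():
            if name == target:
                continue
            features.add(name)
            if value is not missing:
                observed[label][name] += 1
    result = {}
    for label in labels:
        empty = sorted(f for f in features if observed[label][f] == 0)
        if empty:  # report only classes that have at least one fully missing feature
            result[label] = empty
    return result

# Hypothetical toy data: "lab_test" is never recorded for diagnosis X.
cases = [
    {"diagnosis": "X", "temperature": 37.9, "lab_test": None},
    {"diagnosis": "X", "temperature": 38.4, "lab_test": None},
    {"diagnosis": "Y", "temperature": 36.8, "lab_test": 0.4},
    {"diagnosis": "Y", "temperature": None, "lab_test": 0.9},
]
report = fully_missing_by_class(cases, target="diagnosis")
print(report)
```

Any feature listed for a class has no observed value in that class, so the model has no examples of that class with the feature present.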

I realised that the value is always missing because doctors never investigate it: it does not give any benefit for that diagnosis.

However, when I create a scenario, I have to add a value for that feature, because it should be available if a user wants to know the information.

Therefore, when a user chooses that feature and the unknown instance contains a value for it, I use the machine learning model to predict the unknown case for diagnosis X. In the training data, however, that feature is 100% missing for diagnosis X. So when I ask for a prediction of the unknown case with that feature filled in, the prediction drops to zero, because the model never saw any case of diagnosis X with a value for that feature.
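
One possible mechanism behind a drop to exactly zero can be sketched as follows. This assumes a probabilistic model that estimates, per class, how often the feature is observed and multiplies that likelihood into the class score; the comment does not say which model is used, so this is an illustration rather than a diagnosis. The function `observed_likelihood`, the count of 50 cases, and the smoothing parameter `alpha` are hypothetical.

```python
def observed_likelihood(n_observed, n_total, alpha=0.0, n_states=2):
    """Estimate P(feature is observed | class), with optional additive smoothing."""
    return (n_observed + alpha) / (n_total + alpha * n_states)

# Hypothetical counts: 50 training cases of diagnosis X, none with the feature recorded.
unsmoothed = observed_likelihood(0, 50)           # no case of X ever had the feature
smoothed = observed_likelihood(0, 50, alpha=1.0)  # additive (Laplace) smoothing

# A class score is proportional to prior * likelihood, so a zero likelihood
# forces the score for X to zero whenever the feature is present.
print(unsmoothed, smoothed)
```

With no smoothing the estimate is exactly zero, which would remove diagnosis X as soon as the feature is supplied; with additive smoothing it stays small but non-zero.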

I have read articles on handling missing data, but they deal with partially missing data, not with a feature that is 100% missing.

Would it be possible to ask for your advice on how to handle 100% missing data?
