Comments on: Ask a Data Scientist: Data Leakage
https://insidebigdata.com/2014/11/26/ask-data-scientist-data-leakage/

By: Daniel Gutierrez, Fri, 09 Sep 2022 17:57:55 +0000
https://insidebigdata.com/2014/11/26/ask-data-scientist-data-leakage/#comment-452754

[READER COMMENT] I was reading your 2014 article Ask a Data Scientist: Data Leakage for insideBIGDATA. I know it’s quite an old article, but I was wondering whether you could give me more pointers about this sentence: “There are war stories of algorithms with data leakage running in production systems for years before the bugs in the data creation or training scripts were detected”.

I’m Denis, a PhD student at the ENS in Paris. Lately, I’ve been focusing on formally proving the absence or presence of data leakage in machine learning code, specifically in the data preparation phase. I would love to receive some suggestions or pointers to code snippets where this kind of data leakage happened in production! I don’t have a working tool yet; I’m still in the exploration phase, trying to understand what ML scientists need with regard to data leakage.

By: Henok, Fri, 23 Sep 2016 07:05:03 +0000
https://insidebigdata.com/2014/11/26/ask-data-scientist-data-leakage/#comment-99627

I have a spatial model that uses the response variable itself as a predictor, but it would be useful for making predictions on a new, unseen dataset. So, is there data leakage in my spatial model?
