Finally realized I'd been overcleaning my data for 2 years lmao
I was running all my training data through like 15 cleaning steps, thinking I was being thorough. Then a friend at a startup in SF showed me their pipeline: they barely clean anything, and their model actually performs better than mine. Turns out I'd been stripping out noise that was actually useful. Has anyone else found that less cleaning can sometimes help with AI models?
charlie198 · 7d ago
My roommate spent 3 months cleaning a dataset that turned out to be a corrupted CSV file.
scott.alex · 6d ago
Ugh, that's brutal. I feel for them.
shane170 · 7d ago
lol yeah, I went through the same phase. I thought I was a data janitor who had to scrub out every little stain. Turns out I was throwing the baby out with the bathwater and training my model on nothing but bathwater fumes. My buddy at some no-name startup just dumps his raw data in, runs a basic filter, and his stuff outperforms mine every time. It's kind of embarrassing how many hours I wasted on "cleaning" when the noise was actually doing the heavy lifting. What kind of stuff were you stripping out that you now think was useful?
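For anyone wondering what a "basic filter" might look like in practice, here's a minimal sketch. It assumes a CSV with a fixed number of columns (a toy text/label layout here) and only drops rows that are structurally broken: wrong column count or completely empty. The values themselves, noise included, are passed through untouched. `basic_filter`, `expected_cols`, and the sample data are hypothetical and not taken from anyone's actual pipeline in this thread.

```python
import csv
import io
from typing import Iterable, List


def basic_filter(lines: Iterable[str], expected_cols: int) -> List[List[str]]:
    """Drop only structurally broken rows; leave the values (noise included) alone."""
    kept = []
    for row in csv.reader(lines):
        # Wrong column count usually means a truncated or mangled line.
        if len(row) != expected_cols:
            continue
        # A row where every cell is blank carries no signal at all.
        if all(cell.strip() == "" for cell in row):
            continue
        # Everything else passes through as-is: casing, punctuation,
        # typos and outliers are deliberately NOT "fixed" here.
        kept.append(row)
    return kept


# Hypothetical toy data: one malformed row (3 columns) and one empty row.
raw = io.StringIO("text,label\nGreat product!!!,1\nbad,,\n,\nmeh   ok,0\n")
rows = basic_filter(raw, expected_cols=2)
print(rows)
```

Whether leaving the values alone actually helps is an empirical question for your own data. The safest way to find out is to compare validation scores with and without each cleaning step instead of assuming more cleaning is better.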