Cleaning Data - A Retrospective

When we first started the Phase 3 project, we had to figure out a way to clean and organize the data. Our dataset from Kaggle on H-1B visa applications had over three million rows. We tried loading it into tools such as OpenRefine so we could filter it. However, these tools proved very memory intensive with a file that size.

Our last option was to use a Python script that my group mate, Akash, wrote. It ran very quickly and let us filter our data from over three million rows down to just over 621,000 rows. Looking back on it, picking up Python skills is definitely worthwhile. As a more visually oriented designer, I can appreciate and respect the people who learn to code and become great at it!

Snippet of the Python script that Akash wrote.
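
For anyone curious what this kind of filtering can look like in text form, here is a minimal sketch. It is not Akash's actual script. The function name, the file names, and the column and value used for filtering are all hypothetical, and it assumes the data is a CSV file with a header row. The idea is to read one row at a time and write out only the rows we want, so the whole file never has to sit in memory at once.

```python
import csv


def filter_rows(src_path, dst_path, column, keep_value):
    """Copy rows whose `column` equals `keep_value` into a new CSV file.

    Hypothetical helper for illustration. Returns (rows_read, rows_kept).
    """
    rows_read = 0
    rows_kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)  # uses the first line as column names
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, extrasaction="ignore"
        )
        writer.writeheader()
        # Rows are streamed one by one instead of being loaded all at once.
        for row in reader:
            rows_read += 1
            if row[column] == keep_value:
                writer.writerow(row)
                rows_kept += 1
    return rows_read, rows_kept
```

Calling something like filter_rows("visas.csv", "visas_filtered.csv", "STATUS", "KEEP") would write the smaller file and report how many rows were read and kept. Those file, column, and value names are placeholders, not the ones from our dataset.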

