This research article describes in detail a pre-processing stage that unifies several techniques, applied to real, open public data from Peru covering the years 2016 to 2019. The main objective is to support the study of gender inequality with clean and reliable data. The article shows how to group and clean six data sets by category in order to identify and interpret inequality factors, extract valuable information that can be used in data mining models, and contribute to future decision making. The pre-processing techniques were validated using several prediction algorithms, whose performance was compared using ranking metrics.
Copyright 2019 - Centro de Investigación de la Universidad del Pacífico