Online shopping is popular and widely used; however, comparing prices across platforms is tedious and inefficient for customers. A customer might be interested in a particular brand but have no concrete idea which model to buy. In this work, we tell users which platform offers the cheapest laptops of a given brand, to help them choose. Because of the large number of laptop models on the market, these comparisons are time-consuming. We are therefore interested in the performance improvement that Hadoop can bring when we partition the workload and process the parts in parallel.
To compare prices between two web-shopping platforms, we must get the price of a given model from one platform and then search for the same model on the other. This operation is similar to an equijoin on the model name across the two websites.
However, these websites provide robust search functions, which we can use as an index to speed up the equijoin and thereby reduce the time and resources our service needs.
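The idea of using a site's search function as the index of an equijoin can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not our actual implementation: the names `index_join` and `search_b` are hypothetical, and a dictionary lookup stands in for the second website's search function.

```python
from typing import Callable, List, Optional, Tuple

# (model, price) pairs scraped from the first platform (hypothetical shape).
Listing = Tuple[str, float]


def index_join(
    listings_a: List[Listing],
    search_b: Callable[[str], Optional[float]],
) -> List[Tuple[str, float, float]]:
    """Equijoin on model name, probing platform B once per model from A.

    `search_b` plays the role of the index: given a model name it returns
    the price on platform B, or None when the model is not sold there.
    """
    joined = []
    for model, price_a in listings_a:
        price_b = search_b(model)  # index probe instead of scanning all of B
        if price_b is not None:
            joined.append((model, price_a, price_b))
    return joined


# Made-up sample data; a dict lookup is an assumed stand-in for a site search.
sample_a = [("X1", 900.0), ("X2", 1200.0), ("X3", 700.0)]
sample_b = {"X1": 950.0, "X2": 1100.0}
sample_joined = index_join(sample_a, sample_b.get)
```

Each model from the first platform triggers one lookup on the second, so the second platform's full catalogue never has to be scanned. Because the lookups are independent of one another, the list of models can be split into partitions and processed in parallel.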
The flow of our program is as follows: