Abstract
1-Introduction
2-Background
3-Proposed Solution
4-Conclusion
Acknowledgement
References
Abstract
The rapid development of the Internet, especially the mobile Internet, has made it much easier for people to make social contacts online. People now spend more and more time on social networking services, producing a large number of image files. This poses a challenge for traditional standalone frameworks, which struggle to handle the continually increasing number of image files, so a new approach is needed. Hadoop is a notable, widely used project for distributed storage and computation that offers high efficiency, data integrity, reliability and fault tolerance. The Hadoop Distributed File System (HDFS) and MapReduce are its two primary subprojects, for big data storage and computation respectively. However, Hadoop does not provide any interface for image processing. Worse, both HDFS and MapReduce have trouble processing large numbers of small files, which reduces the efficiency of file access and distributed computation. This prevents us from performing image processing on Hadoop. This paper proposes a method to optimize the storage of small image files on Hadoop and defines a custom input/output format that enables Hadoop to process image files.
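The abstract does not specify the storage layout of the proposed method. As a general illustration of why merging helps with the small-files problem, the following minimal sketch packs many small image files into one container, similar in spirit to storing files as key/value records in a Hadoop SequenceFile, and keeps an offset index so a single image can still be read directly. The record layout and the function names `pack_files`, `read_records` and `read_one` are hypothetical, chosen for illustration only; this is not the paper's method, and it runs locally in plain Python rather than on HDFS.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, Tuple

# Hypothetical record layout (illustration only, not the paper's format):
#   [4-byte name length][UTF-8 name][8-byte payload length][payload bytes]
NAME_LEN = struct.Struct(">I")  # big-endian unsigned 32-bit
DATA_LEN = struct.Struct(">Q")  # big-endian unsigned 64-bit


def pack_files(files: Dict[str, bytes], out: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Append each small file to one container stream.

    Returns an index mapping file name -> (payload offset, payload length),
    so one image can later be read without scanning the whole container.
    """
    index: Dict[str, Tuple[int, int]] = {}
    for name, data in files.items():
        encoded = name.encode("utf-8")
        out.write(NAME_LEN.pack(len(encoded)))
        out.write(encoded)
        out.write(DATA_LEN.pack(len(data)))
        index[name] = (out.tell(), len(data))  # offset where the payload starts
        out.write(data)
    return index


def read_records(container: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (file name, bytes) pairs sequentially, as a record reader would."""
    while True:
        header = container.read(NAME_LEN.size)
        if not header:  # clean end of the container
            return
        (name_len,) = NAME_LEN.unpack(header)
        name = container.read(name_len).decode("utf-8")
        (data_len,) = DATA_LEN.unpack(container.read(DATA_LEN.size))
        yield name, container.read(data_len)


def read_one(container: BinaryIO, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Random access to one packed file through the offset index."""
    offset, length = index[name]
    container.seek(offset)
    return container.read(length)


# Usage: two fake "images" packed into one in-memory container.
images = {"a.jpg": b"\xff\xd8fake-jpeg", "b.png": b"\x89PNGfake-png"}
buf = io.BytesIO()
index = pack_files(images, buf)
buf.seek(0)
records = list(read_records(buf))
print([name for name, _ in records])  # ['a.jpg', 'b.png']
print(read_one(buf, index, "b.png") == images["b.png"])  # True
```

Storing one large container instead of thousands of separate files means the file system tracks metadata for a single file rather than one entry per image, which matters on HDFS because the NameNode keeps file system metadata in memory. The sequential `read_records` loop plays roughly the role that a custom input format's record reader would play in MapReduce, handing each image to the computation as one (name, bytes) record.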
Introduction
At the same time, the rapid development of mobile technology and the widespread availability of the Internet have increased the time people spend on social networking services every day. Whenever they go online, they chat with friends and share their lives through pictures or videos. Take Weibo as an example. According to The Report of Weibo Users Development 2016 [2], by September 2016 the number of monthly active users on Weibo had reached 297 million, while daily active users had reached 132 million. The report also revealed that active Weibo users post mainly in the form of images accompanied by a short text description, and such posts account for 60% of the total. Considering Weibo alone, a simple estimate suggests that about 70 million images are created, uploaded and stored every day. Clearly, there are many other popular network services similar to Weibo that also produce large amounts of data, especially image data. It is therefore hard for traditional standalone frameworks to store and handle such a large number of image files, and a solution to this problem is needed.