This paper introduces us the full-text search engine based on Lucene and full-text retrieval technology, including indexing and system architecture, compares the full-text search of Lucene with the String search retri...This paper introduces us the full-text search engine based on Lucene and full-text retrieval technology, including indexing and system architecture, compares the full-text search of Lucene with the String search retrieval’s response time, the experimental results show that the full text search of Lucene has faster retrieval speed.展开更多
Web crawlers are an important part of modern search engines.With the development of the times,data has exploded and humans have entered a“big data era”.For example,Wikipedia carries the knowledge from all over the w...Web crawlers are an important part of modern search engines.With the development of the times,data has exploded and humans have entered a“big data era”.For example,Wikipedia carries the knowledge from all over the world,records the realtime news that occurs every day,and provides users with a good database of data,but because of the large amount of data,it puts a lot of pressure on users to search.At present,single-threaded crawling data can no longer meet the requirements of text crawling.In order to improve the performance and program versatility of single-threaded crawlers,a high-speed multi-threaded web crawler is designed to crawl the network hyper-scale text database.Multi-threaded crawling uses multiple threads to process web pages in parallel,combining breadth-first and depth-first algorithms to control web crawling.The practice project is based on the Python language to achieve multi-threaded optimization network hyper-large-scale text database-Wikipedia book crawling method,the project is inspired by the article on the Wikipedia article in the Big Data Digest public number.展开更多
In recent years,the growing popularity of social media platforms has led to several interesting natural language processing(NLP)applications.However,these social media-based NLP applications are subject to different t...In recent years,the growing popularity of social media platforms has led to several interesting natural language processing(NLP)applications.However,these social media-based NLP applications are subject to different types of adversarial attacks due to the vulnerabilities of machine learning(ML)and NLP techniques.This work presents a new low-level adversarial attack recipe inspired by textual variations in online social media communication.These variations are generated to convey the message using out-of-vocabulary words based on visual and phonetic similarities of characters and words in the shortest possible form.The intuition of the proposed scheme is to generate adversarial examples influenced by human cognition in text generation on social media platforms while preserving human robustness in text understanding with the fewest possible perturbations.The intentional textual variations introduced by users in online communication motivate us to replicate such trends in attacking text to see the effects of such widely used textual variations on the deep learning classifiers.In this work,the four most commonly used textual variations are chosen to generate adversarial examples.Moreover,this article introduced a word importance ranking-based beam search algorithm as a searching method for the best possible perturbation selection.The effectiveness of the proposed adversarial attacks has been demonstrated on four benchmark datasets in an extensive experimental setup.展开更多
文摘This paper introduces us the full-text search engine based on Lucene and full-text retrieval technology, including indexing and system architecture, compares the full-text search of Lucene with the String search retrieval’s response time, the experimental results show that the full text search of Lucene has faster retrieval speed.
基金This research is funded by the Open Foundation for the University Innovation Platform in the Hunan Province,grant number 16K013Hunan Provincial Natural Science Foundation of China,grant number 2017JJ2016+2 种基金2016 Science Research Project of Hunan Provincial Department of Education,grant number 16C0269.Accurate crawler design and implementation with a data cleaning function,National Students innovation and entrepreneurship of training program,grant number 201811532010.This research work is implemented at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property,Universities of Hunan Province.Open Foundation for the University Innovation Platform in the Hunan Province,grant number 16K013Hunan Provincial Natural Science Foundation of China,grant number 2017JJ20162016 Science Research Project of Hunan Provincial Department of Education,grant number 16C0269.This research work is implemented at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property,Universities of Hunan Province.Open project,grant number 20181901CRP03,20181901CRP04,20181901CRP05.
文摘Web crawlers are an important part of modern search engines.With the development of the times,data has exploded and humans have entered a“big data era”.For example,Wikipedia carries the knowledge from all over the world,records the realtime news that occurs every day,and provides users with a good database of data,but because of the large amount of data,it puts a lot of pressure on users to search.At present,single-threaded crawling data can no longer meet the requirements of text crawling.In order to improve the performance and program versatility of single-threaded crawlers,a high-speed multi-threaded web crawler is designed to crawl the network hyper-scale text database.Multi-threaded crawling uses multiple threads to process web pages in parallel,combining breadth-first and depth-first algorithms to control web crawling.The practice project is based on the Python language to achieve multi-threaded optimization network hyper-large-scale text database-Wikipedia book crawling method,the project is inspired by the article on the Wikipedia article in the Big Data Digest public number.
基金supported by the National Research Foundation of Korea (NRF)grant funded by the Korea government (MSIT) (No.NRF-2022R1A2C1007434)by the BK21 FOUR Program of the NRF of Korea funded by the Ministry of Education (NRF5199991014091).
文摘In recent years,the growing popularity of social media platforms has led to several interesting natural language processing(NLP)applications.However,these social media-based NLP applications are subject to different types of adversarial attacks due to the vulnerabilities of machine learning(ML)and NLP techniques.This work presents a new low-level adversarial attack recipe inspired by textual variations in online social media communication.These variations are generated to convey the message using out-of-vocabulary words based on visual and phonetic similarities of characters and words in the shortest possible form.The intuition of the proposed scheme is to generate adversarial examples influenced by human cognition in text generation on social media platforms while preserving human robustness in text understanding with the fewest possible perturbations.The intentional textual variations introduced by users in online communication motivate us to replicate such trends in attacking text to see the effects of such widely used textual variations on the deep learning classifiers.In this work,the four most commonly used textual variations are chosen to generate adversarial examples.Moreover,this article introduced a word importance ranking-based beam search algorithm as a searching method for the best possible perturbation selection.The effectiveness of the proposed adversarial attacks has been demonstrated on four benchmark datasets in an extensive experimental setup.