Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted vid...Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios.展开更多
Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive te...Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023).展开更多
To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spat...To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spatial Pyramid Pooling(FCN-AC-ASPP).For a picture taken by a smart phone,firstly,the image is transformed into a regular image by the dewarping algorithm.Secondly,the FCN-AC-ASPP is used to classify printed texts and handwritten texts.Lastly,handwritten texts can be removed by a simple algorithm.Experiments show that the classification accuracy of the FCN-AC-ASPP is better than FCN,DeeplabV3+,FCN-AC.For handwritten texts removal effect,the method of combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASP alone.展开更多
Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal ...Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal is to enhance inference speed without sacrificing significant detection accuracy,thereby enabling robust performance on resource-constrained devices like drones,closed-circuit television cameras,and other embedded systems.To achieve this,we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation,including replacing standard convolution with depth-wise convolution,adopting the C2 sequence module in place of C3,employing Spatial Pyramid Pooling Global(SPPG)instead of Spatial Pyramid Pooling Fast(SPPF)and integrating Bi-directional Feature Pyramid Network(BiFPN)into the neck.Experimental results demonstrate a remarkable 26%improvement in inference speed compared to the baseline,with only marginal reductions of 1.6%and 4.2%in mean average precision(mAP)at the intersection over union(IoU)thresholds of 0.5 and 0.5:0.95,respectively.Our work represents a significant advancement in scene text detection,striking a balance between speed and accuracy,making it well-suited for performance-constrained environments.展开更多
Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved throu...Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research.展开更多
To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved a...To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time which are then used to build and train a categorization model. Secondly, the study presents a proof of concept tool that automates the categorization of U-report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof of concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4% demonstrating that the automated method is significantly faster, more scalable, and consistent when compared to the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-report platform and other similar text messages-based platforms.展开更多
传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需...传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需求的文本主题内容的方法,以推动数字人文研究的进一步发展。首先,选取本课题组前期标注的古籍语料数据进行主题类别标注和视图分类;其次,构建融合BERT(bidirectional encoder representation from transformers)预训练模型、改进卷积神经网络、循环神经网络和多头注意力机制的语义挖掘模型;最后,融入“主体-关系-客体”多视图的语义增强模型,构建DJ-TextRCNN(DianJi-recurrent convolutional neural networks for text classification)模型实现对典籍文本更细粒度、更深层次、更多维度的语义挖掘。研究结果发现,DJ-TextRCNN模型在不同视图下的古籍主题推荐任务的准确率均为最优。在“主体-关系-客体”视图下,精确率达到88.54%,初步实现了对古籍文本的精准主题推荐,对中华文化深层次、细粒度的语义挖掘具有一定的指导意义。展开更多
以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人...以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人文研究范式的转型。首先,参照东汉古籍《说文解字》对文字的分析方式,以前期标注的古籍语料数据集为基础,构建全新的“字音(说)-原文(文)-结构(解)-字形(字)”四维特征数据集。其次,设计四维特征向量提取模型(speaking,word,pattern,and font to vector,SWPF2vec),并结合预训练模型实现对古籍文本细粒度的特征表示。再其次,构建融合卷积神经网络、循环神经网络和多头注意力机制的古籍文本主题分类模型(dianji-recurrent convolutional neural networks for text classification,DJ-TextRCNN)。最后,融入四维语义特征,实现对古籍文本多维度、深层次、细粒度的语义挖掘。在古籍文本主题分类任务上,DJ-TextRCNN模型在不同维度特征下的主题分类准确率均为最优,在“说文解字”四维特征下达到76.23%的准确率,初步实现了对古籍文本的精准主题分类。展开更多
This experimental study investigated how text difficulty and different working memory capacity(WMC)affected Chinese EFL learners’reading comprehension and their tendency to engage in task-unrelated thoughts,that is,m...This experimental study investigated how text difficulty and different working memory capacity(WMC)affected Chinese EFL learners’reading comprehension and their tendency to engage in task-unrelated thoughts,that is,mind wandering(MW),in the course of reading.Sixty first-year university non-English majors participated in the study.A two-factor mixed experimental design of 2(text difficulty:difficult and simple)×2(WMC:high/large and low/small)was employed.Results revealed that 1)the main and interaction effects of WMC and text difficulty on voluntary MW were significant,whereas those on involuntary MW were not;2)while reading the easy texts,the involuntary MW of high-WMC individuals was less frequent than that of low-WMC ones,whereas while reading the difficult ones,the direct relationship between WMC and involuntary MW was not found;and that 3)high-WMC individuals had a lower overall rate of MW and better reading performance than low-WMC individuals did,but with increasing text difficulty,their rates of overall MW and voluntary MW were getting higher and higher,and the reading performance was getting lower and lower.These results lend support to WM theory and have pedagogical implications for the instruction of L2 reading.展开更多
The social functionality of places(e.g.school,restaurant)partly determines human behaviors and reflects a region’s functional configuration.Semantic descriptions of places are thus valuable to a range of studies of h...The social functionality of places(e.g.school,restaurant)partly determines human behaviors and reflects a region’s functional configuration.Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces.Assuming their potential impacts on human verbalization behaviors,one possibility is to link the functions of places to verbal representations such as users’postings in location-based social networks(LBSNs).In this study,we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached with check-in places.We investigate Foursquare because its available categorization hierarchy provides rich a-priori semantic knowledge about its check-in places,which enables a reliable verification of the semantic concepts identified fromuser-generated text snippets.A latent semantic analysis is conducted on a large Foursquare check-in dataset.The results confirm that attached text messages can represent semantic concepts by demonstrating their large correspondence to the official Foursquare venue categorization.To further elaborate the representativeness of text messages,this work also performs an investigation on the textual terms to quantify their abilities of representing semantic concepts(i.e.,representativeness),and another investigation on semantic concepts to quantify how well they can be represented by text messages(i.e.,representability).The results shed light on featured terms with strong locational characteristics,as well as on distinctive semantic concepts with potentially strong impacts on human verbalizations.展开更多
Sophisticated systems for the testing of fibres and advanced materials were showcased by key members of the British Textile Machinery Association(BTMA)at the forthcoming Techtextil 2024 exhibition which takes place in...Sophisticated systems for the testing of fibres and advanced materials were showcased by key members of the British Textile Machinery Association(BTMA)at the forthcoming Techtextil 2024 exhibition which takes place in Frankfurt,Germany,from April 23-26.展开更多
Data compression plays a key role in optimizing the use of memory storage space and also reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their perform...Data compression plays a key role in optimizing the use of memory storage space and also reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their performance is exploited with lossy compression techniques for images and videos generally using a mixed approach. To achieve our intended objective, which is to study the performance of lossless compression methods, we first carried out a literature review, a summary of which enabled us to select the most relevant, namely the following: arithmetic coding, LZW, Tunstall’s algorithm, RLE, BWT, Huffman coding and Shannon-Fano. Secondly, we designed a purposive text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the programs (scripts) in Matlab in order to test their performance. Finally, following the tests conducted on relevant data that we constructed according to a deliberate model, the results show that these methods presented in order of performance are very satisfactory:- LZW- Arithmetic coding- Tunstall algorithm- BWT + RLELikewise, it appears that on the one hand, the performance of certain techniques relative to others is strongly linked to the sequencing and/or recurrence of symbols that make up the message, and on the other hand, to the cumulative time of encoding and decoding.展开更多
基金This work is supported in part by the National Natural Science Foundation of China(Grant Number 61971078)which provided domain expertise and computational power that greatly assisted the activity+1 种基金This work was financially supported by Chongqing Municipal Education Commission Grants forMajor Science and Technology Project(KJZD-M202301901)the Science and Technology Research Project of Jiangxi Department of Education(GJJ2201049).
文摘Text perception is crucial for understanding the semantics of outdoor scenes,making it a key requirement for building intelligent systems for driver assistance or autonomous driving.Text information in car-mounted videos can assist drivers in making decisions.However,Car-mounted video text images pose challenges such as complex backgrounds,small fonts,and the need for real-time detection.We proposed a robust Car-mounted Video Text Detector(CVTD).It is a lightweight text detection model based on ResNet18 for feature extraction,capable of detecting text in arbitrary shapes.Our model efficiently extracted global text positions through the Coordinate Attention Threshold Activation(CATA)and enhanced the representation capability through stacking two Feature Pyramid Enhancement Fusion Modules(FPEFM),strengthening feature representation,and integrating text local features and global position information,reinforcing the representation capability of the CVTD model.The enhanced feature maps,when acted upon by Text Activation Maps(TAM),effectively distinguished text foreground from non-text regions.Additionally,we collected and annotated a dataset containing 2200 images of Car-mounted Video Text(CVT)under various road conditions for training and evaluating our model’s performance.We further tested our model on four other challenging public natural scene text detection benchmark datasets,demonstrating its strong generalization ability and real-time detection speed.This model holds potential for practical applications in real-world scenarios.
文摘Large language models(LLMs),such as ChatGPT developed by OpenAI,represent a significant advancement in artificial intelligence(AI),designed to understand,generate,and interpret human language by analyzing extensive text data.Their potential integration into clinical settings offers a promising avenue that could transform clinical diagnosis and decision-making processes in the future(Thirunavukarasu et al.,2023).This article aims to provide an in-depth analysis of LLMs’current and potential impact on clinical practices.Their ability to generate differential diagnosis lists underscores their potential as invaluable tools in medical practice and education(Hirosawa et al.,2023;Koga et al.,2023).
基金Sponsored by the Scientific Research Project of Zhejiang Provincial Department of Education(Grant No.KYY-ZX-20210329).
文摘To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spatial Pyramid Pooling(FCN-AC-ASPP).For a picture taken by a smart phone,firstly,the image is transformed into a regular image by the dewarping algorithm.Secondly,the FCN-AC-ASPP is used to classify printed texts and handwritten texts.Lastly,handwritten texts can be removed by a simple algorithm.Experiments show that the classification accuracy of the FCN-AC-ASPP is better than FCN,DeeplabV3+,FCN-AC.For handwritten texts removal effect,the method of combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASP alone.
基金the National Natural Science Foundation of PRChina(42075130)Nari Technology Co.,Ltd.(4561655965)。
文摘Scene text detection is an important task in computer vision.In this paper,we present YOLOv5 Scene Text(YOLOv5ST),an optimized architecture based on YOLOv5 v6.0 tailored for fast scene text detection.Our primary goal is to enhance inference speed without sacrificing significant detection accuracy,thereby enabling robust performance on resource-constrained devices like drones,closed-circuit television cameras,and other embedded systems.To achieve this,we propose key modifications to the network architecture to lighten the original backbone and improve feature aggregation,including replacing standard convolution with depth-wise convolution,adopting the C2 sequence module in place of C3,employing Spatial Pyramid Pooling Global(SPPG)instead of Spatial Pyramid Pooling Fast(SPPF)and integrating Bi-directional Feature Pyramid Network(BiFPN)into the neck.Experimental results demonstrate a remarkable 26%improvement in inference speed compared to the baseline,with only marginal reductions of 1.6%and 4.2%in mean average precision(mAP)at the intersection over union(IoU)thresholds of 0.5 and 0.5:0.95,respectively.Our work represents a significant advancement in scene text detection,striking a balance between speed and accuracy,making it well-suited for performance-constrained environments.
文摘Text classification,by automatically categorizing texts,is one of the foundational elements of natural language processing applications.This study investigates how text classification performance can be improved through the integration of entity-relation information obtained from the Wikidata(Wikipedia database)database and BERTbased pre-trained Named Entity Recognition(NER)models.Focusing on a significant challenge in the field of natural language processing(NLP),the research evaluates the potential of using entity and relational information to extract deeper meaning from texts.The adopted methodology encompasses a comprehensive approach that includes text preprocessing,entity detection,and the integration of relational information.Experiments conducted on text datasets in both Turkish and English assess the performance of various classification algorithms,such as Support Vector Machine,Logistic Regression,Deep Neural Network,and Convolutional Neural Network.The results indicate that the integration of entity-relation information can significantly enhance algorithmperformance in text classification tasks and offer new perspectives for information extraction and semantic analysis in NLP applications.Contributions of this work include the utilization of distant supervised entity-relation information in Turkish text classification,the development of a Turkish relational text classification approach,and the creation of a relational database.By demonstrating potential performance improvements through the integration of distant supervised entity-relation information into Turkish text classification,this research aims to support the effectiveness of text-based artificial intelligence(AI)tools.Additionally,it makes significant contributions to the development ofmultilingual text classification systems by adding deeper meaning to text content,thereby providing a valuable addition to current NLP studies and setting an important reference point for future research.
文摘To promote behavioral change among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration with UNICEF, developed the Zambia U-Report platform. This platform provides young people with improved access to information on various Sexual Reproductive Health topics through Short Messaging Service (SMS) messages. Over the years, the platform has accumulated millions of incoming and outgoing messages, which need to be categorized into key thematic areas for better tracking of sexual reproductive health knowledge gaps among young people. The current manual categorization process of these text messages is inefficient and time-consuming and this study aims to automate the process for improved analysis using text-mining techniques. Firstly, the study investigates the current text message categorization process and identifies a list of categories adopted by counselors over time which are then used to build and train a categorization model. Secondly, the study presents a proof of concept tool that automates the categorization of U-report messages into key thematic areas using the developed categorization model. Finally, it compares the performance and effectiveness of the developed proof of concept tool against the manual system. The study used a dataset comprising 206,625 text messages. The current process would take roughly 2.82 years to categorise this dataset whereas the trained SVM model would require only 6.4 minutes while achieving an accuracy of 70.4% demonstrating that the automated method is significantly faster, more scalable, and consistent when compared to the current manual categorization. These advantages make the SVM model a more efficient and effective tool for categorizing large unstructured text datasets. These results and the proof-of-concept tool developed demonstrate the potential for enhancing the efficiency and accuracy of message categorization on the Zambia U-report platform and other similar text messages-based platforms.
文摘传统编目分类和规则匹配方法存在工作效能低、过度依赖专家知识、缺乏对古籍文本自身语义的深层次挖掘、编目主题边界模糊、较难实现对古籍文本领域主题的精准推荐等问题。为此,本文结合古籍语料特征探究如何实现精准推荐符合研究者需求的文本主题内容的方法,以推动数字人文研究的进一步发展。首先,选取本课题组前期标注的古籍语料数据进行主题类别标注和视图分类;其次,构建融合BERT(bidirectional encoder representation from transformers)预训练模型、改进卷积神经网络、循环神经网络和多头注意力机制的语义挖掘模型;最后,融入“主体-关系-客体”多视图的语义增强模型,构建DJ-TextRCNN(DianJi-recurrent convolutional neural networks for text classification)模型实现对典籍文本更细粒度、更深层次、更多维度的语义挖掘。研究结果发现,DJ-TextRCNN模型在不同视图下的古籍主题推荐任务的准确率均为最优。在“主体-关系-客体”视图下,精确率达到88.54%,初步实现了对古籍文本的精准主题推荐,对中华文化深层次、细粒度的语义挖掘具有一定的指导意义。
文摘以编目分类和规则匹配为主的古籍文本主题分类方法存在工作效能低、专家知识依赖性强、分类依据单一化、古籍文本主题自动分类难等问题。对此,本文结合古籍文本内容和文字特征,尝试从古籍内容分类得到符合研究者需求的主题,推动数字人文研究范式的转型。首先,参照东汉古籍《说文解字》对文字的分析方式,以前期标注的古籍语料数据集为基础,构建全新的“字音(说)-原文(文)-结构(解)-字形(字)”四维特征数据集。其次,设计四维特征向量提取模型(speaking,word,pattern,and font to vector,SWPF2vec),并结合预训练模型实现对古籍文本细粒度的特征表示。再其次,构建融合卷积神经网络、循环神经网络和多头注意力机制的古籍文本主题分类模型(dianji-recurrent convolutional neural networks for text classification,DJ-TextRCNN)。最后,融入四维语义特征,实现对古籍文本多维度、深层次、细粒度的语义挖掘。在古籍文本主题分类任务上,DJ-TextRCNN模型在不同维度特征下的主题分类准确率均为最优,在“说文解字”四维特征下达到76.23%的准确率,初步实现了对古籍文本的精准主题分类。
文摘This experimental study investigated how text difficulty and different working memory capacity(WMC)affected Chinese EFL learners’reading comprehension and their tendency to engage in task-unrelated thoughts,that is,mind wandering(MW),in the course of reading.Sixty first-year university non-English majors participated in the study.A two-factor mixed experimental design of 2(text difficulty:difficult and simple)×2(WMC:high/large and low/small)was employed.Results revealed that 1)the main and interaction effects of WMC and text difficulty on voluntary MW were significant,whereas those on involuntary MW were not;2)while reading the easy texts,the involuntary MW of high-WMC individuals was less frequent than that of low-WMC ones,whereas while reading the difficult ones,the direct relationship between WMC and involuntary MW was not found;and that 3)high-WMC individuals had a lower overall rate of MW and better reading performance than low-WMC individuals did,but with increasing text difficulty,their rates of overall MW and voluntary MW were getting higher and higher,and the reading performance was getting lower and lower.These results lend support to WM theory and have pedagogical implications for the instruction of L2 reading.
基金supported by the German Research Foundation(DFG)through the priority program“Volunteered Geographic Information:Interpretation,Visualisation and Social Computing”(SPP 1894).
文摘The social functionality of places(e.g.school,restaurant)partly determines human behaviors and reflects a region’s functional configuration.Semantic descriptions of places are thus valuable to a range of studies of humans and geographic spaces.Assuming their potential impacts on human verbalization behaviors,one possibility is to link the functions of places to verbal representations such as users’postings in location-based social networks(LBSNs).In this study,we examine whether the heterogeneous user-generated text snippets found in LBSNs reliably reflect the semantic concepts attached with check-in places.We investigate Foursquare because its available categorization hierarchy provides rich a-priori semantic knowledge about its check-in places,which enables a reliable verification of the semantic concepts identified fromuser-generated text snippets.A latent semantic analysis is conducted on a large Foursquare check-in dataset.The results confirm that attached text messages can represent semantic concepts by demonstrating their large correspondence to the official Foursquare venue categorization.To further elaborate the representativeness of text messages,this work also performs an investigation on the textual terms to quantify their abilities of representing semantic concepts(i.e.,representativeness),and another investigation on semantic concepts to quantify how well they can be represented by text messages(i.e.,representability).The results shed light on featured terms with strong locational characteristics,as well as on distinctive semantic concepts with potentially strong impacts on human verbalizations.
文摘Sophisticated systems for the testing of fibres and advanced materials were showcased by key members of the British Textile Machinery Association(BTMA)at the forthcoming Techtextil 2024 exhibition which takes place in Frankfurt,Germany,from April 23-26.
文摘Data compression plays a key role in optimizing the use of memory storage space and also reducing latency in data transmission. In this paper, we are interested in lossless compression techniques because their performance is exploited with lossy compression techniques for images and videos generally using a mixed approach. To achieve our intended objective, which is to study the performance of lossless compression methods, we first carried out a literature review, a summary of which enabled us to select the most relevant, namely the following: arithmetic coding, LZW, Tunstall’s algorithm, RLE, BWT, Huffman coding and Shannon-Fano. Secondly, we designed a purposive text dataset with a repeating pattern in order to test the behavior and effectiveness of the selected compression techniques. Thirdly, we designed the compression algorithms and developed the programs (scripts) in Matlab in order to test their performance. Finally, following the tests conducted on relevant data that we constructed according to a deliberate model, the results show that these methods presented in order of performance are very satisfactory:- LZW- Arithmetic coding- Tunstall algorithm- BWT + RLELikewise, it appears that on the one hand, the performance of certain techniques relative to others is strongly linked to the sequencing and/or recurrence of symbols that make up the message, and on the other hand, to the cumulative time of encoding and decoding.