A visual sensing system was developed for titanium-alloy electron-beam welding to sense and detect the dynamic behavior of the molten pool. A suite of programs for processing color molten-pool images was written in Matlab; it extracts complete molten-pool edge images. A further Matlab program extracts the molten-pool width. The functional relationship between molten-pool width and penetration under the experimental conditions was obtained by curve fitting, providing a theoretical basis for subsequent penetration control.
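The curve-fitting step mentioned in this abstract can be sketched as follows. This is a minimal illustration in Python rather than the authors' Matlab program; the abstract does not state the form of the fitted function, so the quadratic model, the function name `fit_width_to_penetration`, and the calibration points are all assumptions made for the example.

```python
import numpy as np

def fit_width_to_penetration(widths_mm, penetrations_mm, degree=2):
    """Least-squares polynomial fit of penetration as a function of pool width.

    The polynomial degree is an assumption; the abstract only says that
    a curve-fitting method was used.
    """
    coeffs = np.polyfit(widths_mm, penetrations_mm, degree)
    return np.poly1d(coeffs)

# Hypothetical calibration points (NOT measured data): generated from an
# assumed quadratic so the fit can be checked against a known answer.
widths = np.array([2.0, 2.5, 3.0, 3.5, 4.0])
penetrations = 0.5 * widths**2 - 0.2 * widths + 1.0

model = fit_width_to_penetration(widths, penetrations)
predicted = model(3.2)  # estimated penetration for a 3.2 mm wide pool
```

In a control loop, a fitted model of this kind would let penetration be estimated from the width measured in each image frame, which is the use the abstract points to.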
The design process of the built environment relies on the collaborative effort of all parties involved in the project. During the design phase, owners, end users, and their representatives are expected to make the most critical design and budgetary decisions, shaping the essential traits of the project; hence the need to create and integrate mechanisms that support the decision-making process. Design decisions should not be based on assumptions, past experiences, or imagination. One of the many problems that result from uninformed design decisions is “change orders”, deviations from the original scope of work that increase the overall cost and alter the construction schedule of the project. The long-term aim of this inquiry is to understand user behavior and establish evidence-based control measures: actions and processes that can be implemented in practice to decrease the volume and frequency of change orders. The current study lays a foundation for further examination by proposing potential control measures, such as integrating Virtual Reality (VR), and testing their efficiency. The specific aim was to examine the effect of different visualization methods (i.e., VR vs. construction drawings) on (1) how well the subjects understand the information presented about the future/planned environment; (2) the subjects’ perceived confidence in what the future environment will look like; (3) the likelihood of changing the built environment; (4) design review time; and (5) accuracy in reviewing and understanding the design.
Visual data mining is an important approach among data mining techniques. Most visual data mining methods are based on computer graphics techniques, but few exploit image-processing techniques. This paper proposes an image-processing method named RNAM (resemble neighborhood averaging method) to facilitate visual data mining; it post-processes the data mining result image and helps users discover significant features and useful patterns effectively. Experiments show that the method is intuitive, easy to understand, and effective. It provides a new approach for visual data mining.
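The abstract does not define RNAM beyond its name, so the sketch below shows only the plain neighborhood averaging (mean filter) that the name alludes to, not the authors' method. The function name `neighborhood_average` and the edge-replication padding are assumptions made for the example.

```python
import numpy as np

def neighborhood_average(image, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)^2 neighborhood.

    Borders are handled by replicating edge pixels (an assumed choice).
    """
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img)
    size = 2 * radius + 1
    # Sum the shifted copies of the image, one per neighborhood offset.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)

# A single bright pixel is spread evenly over its 3x3 neighborhood.
spike = np.zeros((3, 3))
spike[1, 1] = 9.0
smoothed = neighborhood_average(spike)
```

Averaging of this kind suppresses isolated pixels in a result image, which may make larger patterns easier to see; how RNAM departs from it is not recoverable from the abstract.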
This paper seeks a synthesis of Bayesian and geostatistical approaches to combining categorical data in the context of remote sensing classification. In experiments with aerial photographs and Landsat TM data, the accuracy of spectral, spatial, and combined classification results was evaluated. First, it was confirmed that incorporating spatial information in spectral classification increases accuracy significantly. Second, tests with a 5-class and a 3-class classification scheme revealed that setting a proper semantic framework for classification is fundamental to any categorical mapping endeavor and is the most important factor affecting accuracy. Lastly, this paper promotes non-parametric methods both for defining class membership profiles based on band-specific histograms of image intensities and for deriving spatial probability via indicator kriging, a non-parametric geostatistical technique.
The rapid evolution of wireless communication technologies has underscored the critical role of antennas in ensuring seamless connectivity. Antenna defects, ranging from manufacturing imperfections to environmental wear, pose significant challenges to the reliability and performance of communication systems. This review paper navigates the landscape of antenna defect detection, emphasizing the need for a nuanced understanding of the various defect types and the associated challenges in visual detection. It serves as a resource for researchers, engineers, and practitioners engaged in the design and maintenance of communication systems. The insights presented here pave the way for enhanced reliability in antenna systems through targeted defect detection measures. The study presents a comprehensive literature analysis of computer vision algorithms employed in end-of-line visual inspection of antenna parts. The review follows the PRISMA principles; its goals are to summarize recent research, identify relevant computer vision techniques, and evaluate how effective these techniques are in discovering defects during inspections. It covers articles from scholarly journals as well as conference papers published up until June 2023. Relevant search phrases were used, and papers were chosen according to inclusion and exclusion criteria. Several computer vision approaches, such as feature extraction and defect classification, are broken down and analyzed, and their applicability and performance are discussed. The review highlights the significance of using a wide variety of datasets and measurement criteria. The findings add to the existing body of knowledge and point researchers toward promising new areas of investigation, such as real-time inspection systems and multispectral imaging. On the whole, this review offers a complete study of computer vision approaches for quality control in antenna parts, providing helpful insights and drawing attention to areas that require additional exploration.
In times of digitalisation, visual assistance systems in assembly are increasingly important. The design of these assembly systems needs to be highly complex to meet the requirements. Due to the increasing number of variants in production processes, as well as shorter innovation and product life cycles, assistance systems should improve quality and reduce complexity of assembly processes. However, many large kitchen manufacturers still assemble kitchen cabinets manually, due to the high variety of components, such as rails and fittings. This paper focuses on the analysis and evaluation of virtual assistance systems to improve quality and usability in individualised kitchen cabinet assembly processes at a large German manufacturer. A solution is identified and detailed.
3D sensing represents the main channel through which humans, or robotic agents, understand and interact with each other and with the real world. As such, many 3D acquisition technologies and devices have been developed and applied in emerging applications, such as autonomous systems, augmented reality, and digital production. A typical 3D visual system takes RGB and/or range images of an object or scene and generates 3D geometry.
Visual information processing is not only an important research direction in fields such as psychology, neuroscience, and artificial intelligence, but also the research basis for the theory and technical realization of biological recognition. Existing approaches to visual information processing, e.g., those oriented toward neural computation, those using object-shape extraction and wavelets under high noise, and those based on artificial neural networks (ANN), are comparatively complex. Using qualitative mapping, this paper describes the specific attributes involved in visual information processing, and the results are briefer and more straightforward. A software implementation of vision recognition is therefore probably easier to realize.
This paper introduces an experimental setup for acquiring coaxial visual images of the molten pool and keyhole in high-power Nd:YAG laser welding. One of the most difficult problems in coaxial imaging is that the imaging signal of the molten pool and keyhole must be separated from the high-power laser beam. This problem was resolved by designing a dichroitic spectroscope. The characteristics of the imaging signal were analyzed, and the coaxial image of the molten pool and keyhole was acquired. A smoothing filter and a homomorphic filter were designed to remove low-frequency noise and to enhance the image according to the characteristics of the imaging signal. Finally, the edges of the molten pool and keyhole were detected and extracted by threshold-based image segmentation.
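The final step in this abstract, threshold segmentation followed by edge extraction, can be sketched as below. This is a minimal illustration under stated assumptions: it takes an already filtered grayscale frame, uses a fixed threshold, and defines an edge pixel as a foreground pixel with at least one 4-connected background neighbor. The function names are hypothetical, and the smoothing and homomorphic filters are not reproduced because the abstract gives no parameters for them.

```python
import numpy as np

def segment_by_threshold(gray, threshold):
    """Binary mask of pixels brighter than the threshold."""
    return np.asarray(gray) > threshold

def extract_edges(mask):
    """Foreground pixels that touch the background (4-connectivity)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior when all four direct neighbors are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

# Synthetic frame (not welding data): a bright 3x3 block on a dark background.
frame = np.zeros((5, 5))
frame[1:4, 1:4] = 200.0

pool_mask = segment_by_threshold(frame, threshold=100.0)
pool_edges = extract_edges(pool_mask)
```

On real frames the threshold would have to be chosen from the image histogram or by calibration; a fixed value is used here only to keep the example self-contained.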
Schlieren interferograms used to be analyzed qualitatively. In this paper, exploiting the computational power and large memory of modern computers, an image-processing method is investigated for digitizing an axisymmetric schlieren interferogram and determining the density field. The method includes 2-D low-pass filtering, thinning of the interferometric fringes, extraction of physical information, and numerical integration of the density field. The image-processing results show that the accuracy of quantitative analysis of the schlieren interferogram can be improved and that much time can be saved in handling optical experimental results. The algorithm is therefore useful and efficient.
To design a heat exchanger suited to an indirect-contact gas hydrate cool storage vessel, the formation/decomposition process of HCFC141b gas hydrate was observed visually in a self-designed small-scale visualization apparatus for gas hydrate cool storage. Based on the photographs taken and the temperatures recorded, the formation/decomposition process of HCFC141b gas hydrate is described, its characteristics are summarized, and suggestions for heat exchanger design are given according to those characteristics.
High-moisture extrusion technology should be considered one of the best choices for producing plant-based meat substitutes with the rich fibrous structure offered by real animal meat products. Unfortunately, the extrusion process has been seen as a “black box” with limited information about what occurs inside, which seriously hinders the development of meat substitutes. This study designed a high-moisture extrusion process and developed 10 new plant-based meat substitutes with a fibrous structure comparable to that of real animal meat. The study used the Feature-Augmented Principal Component Analysis (FA-PCA) method to visualize and understand the whole extrusion process systematically and accurately in three ways. It established six sets of mathematical models of the high-moisture extrusion process based on 8000 pieces of data, including five types of parameters. The FA-PCA method improved the R^2 values significantly compared with the PCA method. Way 3 predicted product quality (Z) best, demonstrating that the gradual molecular conformational changes (Y^(n')) were critical in controlling the final quality of the plant-based meat substitutes. Moreover, the first visualization platform software for the high-moisture extrusion process has been established; by incorporating virtual simulation technology, it clearly shows the inside of the “black box”. Through the software, practical tasks such as equipment installation, parameter adjustment, equipment disassembly, and data prediction can be carried out easily.
This thesis focuses on the analysis of Harold Pinter’s use of verbal and visual means in The Caretaker. The verbal means are: the ambiguity of the meaning of language; the evasiveness of language; silence. The functions of visual means are: end-connection; theme-revelation; atmosphere-creation. A meticulous study of Pinter’s dramatic technique can enhance the understanding of the theme of the play.
Background: To solve the problem of visualizing assembly process information in augmented reality (AR), we report here on our study of the composition of AR assembly process information. Methods: We classified the visual elements of assembly processes into six categories. After examining how visual elements are expressed in assembly process information in the AR environment, we identified standard assembly process elements and studied layout principles for visual elements. Conclusion: Typical visualization elements are presented using an AR-based assembly instruction system.
The existing dataset for visual dialog comprises multiple rounds of questions and a diverse range of image contents. However, it faces challenges in overcoming visual semantic limitations, particularly in obtaining sufficient context from the visual and textual aspects of images. This paper proposes a new visual dialog dataset called Diverse History-Dialog (DS-Dialog) to address the visual semantic limitations of the existing dataset. DS-Dialog groups relevant histories by their respective Microsoft Common Objects in Context (MSCOCO) image categories and consolidates them for each image. Specifically, each MSCOCO image category consists of the top relevant histories, extracted according to the semantic relationship between the original image caption and the historical context. DS-Dialog thus enhances the current dataset by adding new context-aware relevant history that provides more visual semantic context for each image. The new dataset is generated in several stages, including image semantic feature extraction, keyphrase extraction, relevant question extraction, and relevant history dialog generation. The DS-Dialog dataset contains about 2.6 million question-answer pairs: 1.3 million correspond to the existing VisDial question-answer pairs, and the remaining 1.3 million include a maximum of 5 image features for each VisDial image, with each image comprising 10 rounds of relevant question-answer pairs. Moreover, a novel adaptive relevant history selection is proposed to resolve missing visual semantic information for each image. DS-Dialog is used to benchmark previous visual dialog models and achieves better performance than they do. Specifically, the proposed DS-Dialog model achieves an 8% higher mean reciprocal rank (MRR), 11% higher R@1, 6% higher R@5, 5% higher R@10, and 8% higher normalized discounted cumulative gain (NDCG) compared to LF. DS-Dialog also achieves an improvement of approximately 1 point on R@k, mean, MRR, and NDCG compared to the original RVA, and of 2 points compared to LF and DualVD. These results demonstrate the importance of relevant semantic historical context in enhancing the visual semantic relationship between the textual and visual representations of the images and questions.
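The retrieval metrics reported in this abstract (MRR, R@k, NDCG) can be computed as in the sketch below. It uses the standard textbook definitions; the benchmark's own evaluation script may differ in details, for example by truncating NDCG to the number of relevant candidates, so this is an illustration rather than a reproduction of the reported scores. The function names and sample ranks are assumptions.

```python
import numpy as np

def mean_reciprocal_rank(ranks):
    """Mean of 1/rank, where rank is the 1-based position of the true answer."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

def recall_at_k(ranks, k):
    """Fraction of questions whose true answer is ranked within the top k."""
    return float(np.mean(np.asarray(ranks) <= k))

def ndcg(relevances_in_ranked_order):
    """Normalized discounted cumulative gain over the full ranked list."""
    rel = np.asarray(relevances_in_ranked_order, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    ideal = float(np.sum(np.sort(rel)[::-1] * discounts))
    return dcg / ideal if ideal > 0 else 0.0

# Hypothetical ranks of the ground-truth answer for three questions.
ranks = [1, 2, 4]
mrr = mean_reciprocal_rank(ranks)   # (1 + 1/2 + 1/4) / 3
r_at_1 = recall_at_k(ranks, 1)      # one of three answers is ranked first
r_at_5 = recall_at_k(ranks, 5)      # all three answers are in the top 5
```

MRR and R@k depend only on where the single ground-truth answer is ranked, while NDCG also rewards placing other relevant candidates high, which is why the abstract reports them separately.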
Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information from lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio-visual wake word spotting models are suitable only for simple single-speaker scenarios and have high computational complexity. Further development is hindered by complex multi-person scenarios and by computational limitations in mobile environments. In this paper, a novel audio-visual model is proposed for on-device multi-person wake word spotting. First, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive the active speaker representation. Second, knowledge distillation is introduced to transfer knowledge from a large model to the on-device model, keeping the model small. Moreover, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word spotting. Experimental results on the PKU-KWS dataset show that this approach outperforms previous state-of-the-art methods.
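Knowledge distillation, as mentioned in this abstract, is commonly implemented with a temperature-softened KL-divergence loss between teacher and student outputs. The sketch below shows that common formulation only; the abstract does not specify the loss actually used, and the function names, temperature, and logits are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * temperature**2)

# Hypothetical logits for a batch of two samples and three classes.
teacher = np.array([[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]])
student = np.array([[2.0, 1.5, 0.5], [1.0, 1.0, 1.0]])

loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
```

In training, a term like this is usually added to the ordinary supervised loss so that the small model learns from both the labels and the large model's output distribution.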
Visual question answering (VQA) has attracted growing attention in computer vision and natural language processing. Researchers are studying how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and an attention mechanism is a sensible way to address this problem. However, using a single attention mechanism may leave some features insufficiently attended. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause loss of the original features, a small portion of the image features is added back as compensation. For text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
The original intention of visual question answering (VQA) models is to infer the answer from the information in the image that is relevant to the question text, but many VQA models yield answers that are biased by prior knowledge, especially language priors. This paper proposes a mitigation model called language priors mitigation-VQA (LPM-VQA) for the language priors problem in VQA models; it divides language priors into positive and negative language priors. Different network branches capture and process the different priors in order to mitigate them. A dynamically changing language prior feedback objective function is designed using the intermediate results of some modules in the VQA model. The weight of the loss value for each answer is set dynamically according to the strength of its language priors, balancing its proportion in the total VQA loss and further mitigating the language priors. The model does not depend on the baseline VQA architecture and can be configured like a plug-in to improve performance over most existing VQA models. The experimental results show that the proposed model is general and effective, achieving state-of-the-art accuracy on the VQA-CP v2 dataset.
Funding (visual data mining with RNAM): Supported by the National Natural Science Foundation of China (60173051), the Teaching and Research Award Program for Outstanding Young Teachers in Higher Education Institutions of the Ministry of Education of China, and the Liaoning Province Higher Education Research Foundation (20040206).
Funding: Supported by the National Natural Science Foundation of China (No. 50176051, No. 59836230) and the State Key Development Program for Basic Research of China (No. 2000026306).
Abstract: To design a heat exchanger suited to an indirect-contact gas hydrate cool storage vessel, the formation and decomposition of HCFC141b gas hydrate were observed visually in a self-designed small-scale visualization apparatus for gas hydrate cool storage. Based on the photographs taken and the temperatures recorded, the formation and decomposition processes of HCFC141b gas hydrate are described, their characteristics are summarized, and suggestions for heat exchanger design are given according to these characteristics.
Funding: Supported by the National Natural Science Foundation of China (31901608), the National Key Research and Development Plan of China (2021YFC2101402), and the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2022-IFST).
Abstract: High-moisture extrusion technology should be considered one of the best choices for producing plant-based meat substitutes with the rich fibrous structure of real animal meat products. Unfortunately, the extrusion process has been seen as a “black box” with limited information about what occurs inside, which seriously hinders the development of meat substitutes. This study designed a high-moisture extrusion process and developed 10 new plant-based meat substitutes with a fibrous structure comparable to that of real animal meat. The study used the Feature-Augmented Principal Component Analysis (FA-PCA) method to visualize and understand the whole extrusion process systematically and accurately in three ways. It established six sets of mathematical models of the high-moisture extrusion process based on 8000 pieces of data covering five types of parameters. The FA-PCA method improved the R^2 values significantly compared with the PCA method. Way 3 was the best at predicting product quality (Z), demonstrating that the gradual molecular conformational changes (Y^(n')) were critical in controlling the final quality of the plant-based meat substitutes. Moreover, the first visualization platform software for the high-moisture extrusion process has been established, which combines virtual simulation technology to expose the “black box” clearly. Through the software, practical tasks such as equipment installation, parameter adjustment, equipment disassembly, and data prediction can be carried out easily.
Abstract: This thesis analyzes Harold Pinter’s use of verbal and visual means in The Caretaker. The verbal means are the ambiguity of meaning in language, the evasiveness of language, and silence. The functions of the visual means are end-connection, theme-revelation, and atmosphere-creation. A meticulous study of Pinter’s dramatic technique can enhance the understanding of the theme of the play.
Funding: Industrial Technology Development Program (JCKY2016204A502).
Abstract: Background: To address the problem of visualizing assembly process information in augmented reality (AR), we report on our study of the composition of AR assembly process information. Methods: We classified the visual elements of assembly processes into six categories. After examining how visual elements are expressed in assembly process information in the AR environment, we identified standard assembly process elements and studied layout principles for visual elements. Conclusion: Typical visualization elements are presented using an AR-based assembly instruction system.
Abstract: The existing dataset for visual dialog comprises multiple rounds of questions and a diverse range of image contents. However, it faces challenges in overcoming visual semantic limitations, particularly in obtaining sufficient context from the visual and textual aspects of images. This paper proposes a new visual dialog dataset called Diverse History-Dialog (DS-Dialog) to address the visual semantic limitations of the existing dataset. DS-Dialog groups relevant histories based on their respective Microsoft Common Objects in Context (MSCOCO) image categories and consolidates them for each image. Specifically, each MSCOCO image category consists of the top relevant histories, extracted according to the semantic relationship between the original image caption and the historical context. These relevant histories are consolidated for each image, and DS-Dialog enhances the current dataset by adding new context-aware relevant history to provide more visual semantic context for each image. The new dataset is generated in several stages, including image semantic feature extraction, keyphrase extraction, relevant question extraction, and relevant history dialog generation. The DS-Dialog dataset contains about 2.6 million question-answer pairs, of which 1.3 million correspond to existing VisDial’s question-answer pairs, and the remaining 1.3 million include a maximum of 5 image features for each VisDial image, with each image comprising 10-round relevant question-answer pairs. Moreover, a novel adaptive relevant history selection is proposed to resolve missing visual semantic information for each image. DS-Dialog is used to benchmark the performance of previous visual dialog models and achieves better performance than previous models. Specifically, the proposed DS-Dialog model achieves an 8% higher mean reciprocal rank (MRR), 11% higher R@1, 6% higher R@5, 5% higher R@10, and 8% higher normalized discounted cumulative gain (NDCG) compared to LF. DS-Dialog also achieves an improvement of approximately 1 point on R@k, mean, MRR, and NDCG compared to the original RVA, and of 2 points compared to LF and DualVD. These results demonstrate the importance of relevant semantic historical context in enhancing the visual semantic relationship between the textual and visual representations of the images and questions.
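For readers unfamiliar with the retrieval metrics quoted in this abstract, the following minimal sketch computes MRR and R@k from the 1-based rank that a model assigns to the correct answer of each question. The function names and the sample ranks are illustrative assumptions, not data or code from the paper, and NDCG is not covered.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all questions (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks, k):
    """Fraction of questions whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the ground-truth answer for four questions.
sample_ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(sample_ranks)
r_at_1 = recall_at_k(sample_ranks, 1)
r_at_5 = recall_at_k(sample_ranks, 5)
r_at_10 = recall_at_k(sample_ranks, 10)
```

Higher values are better for both metrics, whereas the "mean" figure reported alongside them is the average rank, for which lower is better.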
Funding: Supported by the National Key R&D Program of China (No. 2020AAA0108904) and the Science and Technology Plan of Shenzhen (No. JCYJ20200109140410340).
Abstract: Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information from lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio-visual wake word spotting models are suitable only for simple single-speaker scenarios and have high computational complexity. Further development is hindered by complex multi-person scenarios and by the computational limitations of mobile environments. In this paper, a novel audio-visual model is proposed for on-device multi-person wake word spotting. First, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix of audio and visual representations to derive the active speaker representation. Second, knowledge distillation is introduced to transfer knowledge from the large model to the on-device model and thereby control the model size. Moreover, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word spotting. Experimental results on the PKU-KWS dataset show that this approach outperforms the previous state-of-the-art methods.
Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted increasing attention in computer vision and natural language processing. Researchers study how to better integrate image features and text features to achieve better results in VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden, and an attention mechanism is a sensible way to address this problem. However, using a single attention mechanism may leave some features insufficiently attended. This paper improves the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because the attention mechanism can cause loss of the original features, a small portion of the image features is added back as compensation. For text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
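The combination of channel attention, spatial attention, and feature compensation described in this abstract can be sketched as follows. This is only an illustration under strong assumptions: the gates are computed from plain averages with no learned weights, and the function name `hybrid_attention` and the compensation factor `alpha` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_attention(feat, alpha=0.1):
    """feat: list of C channels, each an H x W list of floats.

    Returns a feature map of the same shape, gated by channel and spatial
    attention, plus a fraction alpha of the original features as compensation.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Channel attention: one gate per channel, from its global average.
    ch_gate = [sigmoid(sum(map(sum, ch)) / (H * W)) for ch in feat]
    # Spatial attention: one gate per location, from the mean over channels.
    sp_gate = [[sigmoid(sum(feat[c][i][j] for c in range(C)) / C)
                for j in range(W)] for i in range(H)]
    # Apply both gates, then add back alpha times the original feature.
    return [[[feat[c][i][j] * ch_gate[c] * sp_gate[i][j] + alpha * feat[c][i][j]
              for j in range(W)] for i in range(H)] for c in range(C)]


# Hypothetical 2-channel, 2x2 feature map.
features = [[[1.0, 0.0], [0.0, 2.0]],
            [[0.5, 0.5], [0.5, 0.5]]]
attended = hybrid_attention(features, alpha=0.1)
```

In a trained network the two gates would come from learned layers rather than raw averages; the sketch only shows where the compensation term enters.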
Abstract: Visual question answering (VQA) models are intended to infer the answer from the information in the image that is relevant to the question text, but many VQA models yield answers that are biased by prior knowledge, especially language priors. This paper proposes a mitigation model called language priors mitigation-VQA (LPM-VQA) for the language priors problem in VQA models, which divides language priors into positive and negative language priors. Different network branches capture and process the different priors in order to mitigate them. A dynamically changing language prior feedback objective function is designed using the intermediate results of some modules in the VQA model. The weight of the loss value for each answer is set dynamically according to the strength of its language priors, balancing its proportion in the total VQA loss to further mitigate the language priors. The model does not depend on the baseline VQA architecture and can be configured like a plug-in to improve the performance of most existing VQA models. The experimental results show that the proposed model is general and effective, achieving state-of-the-art accuracy on the VQA-CP v2 dataset.