Abstract: Visual Question Answering (VQA) has attracted widespread interest as a crucial task at the intersection of vision and language. VQA methods primarily use attention mechanisms to associate the relevant visual regions with the input question so that it can be answered effectively. Detection-based features extracted by an object detection network yield a visual attention distribution over predetermined detection boxes and provide object-level information, which helps answer questions about foreground objects. However, such features cannot answer questions about background content that has no detection box, because they lack fine-grained details; supplying those details is the advantage of grid-based features. In this paper, we propose a Dual-Level Feature Embedding (DLFE) network, which integrates grid-based and detection-based image features in a unified architecture so that the two feature types complement each other. Specifically, in DLFE, a novel Dual-Level Self-Attention (DLSA) module is first proposed to mine the intrinsic properties of the two features, where Positional Relation Attention (PRA) is designed to model position information. Then, we propose Feature Fusion Attention (FFA) to address the semantic noise caused by fusing the two features, and construct an alignment graph to enhance and align the grid and detection features. Finally, we use co-attention to learn the interactive features of the image and the question, which allows questions to be answered more accurately. Compared with the baseline, our method increases accuracy from 66.01% to 70.63% on the VQA 1.0 test-std split and from 66.24% to 70.91% on the VQA 2.0 test-std split.
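The attention step described above can be illustrated with standard scaled dot-product attention. The sketch below is a minimal illustration under stated assumptions and is not the DLFE network. It naively concatenates toy grid-level and region-level feature vectors (all values hypothetical) and lets a single question vector attend over them. DLSA, PRA, FFA, the alignment graph, and the co-attention layers are not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one query over a set of key/value vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    # Attended feature = attention-weighted sum of the value vectors.
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, attended


# Hypothetical toy features (dimension 2): two grid cells and one detected region.
grid_feats = [[1.0, 0.0], [0.0, 1.0]]
region_feats = [[2.0, 0.0]]
# Naive concatenation of both feature levels; DLFE handles fusion with dedicated modules.
image_feats = grid_feats + region_feats
question = [1.0, 0.0]  # hypothetical question embedding

weights, attended = attend(question, image_feats, image_feats)
print([round(w, 3) for w in weights])
```

Here the region vector that best matches the question receives the largest weight. A grid cell can still contribute, which hints at why mixing both feature levels can help with content that has no detection box.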
Funding: Project (51875491) supported by the National Natural Science Foundation of China; Project (2021T3069) supported by the Fujian Science and Technology Plan STS Project, China.
Abstract: Laser cleaning is a highly nonlinear physical process, and single-modal (e.g., acoustic or visual) detection of it suffers from poor performance and low utilization of the information shared between modalities. In this study, a multi-modal feature fusion network model was constructed based on a laser paint removal experiment. Heterogeneous data from the different modalities were aligned by combining piecewise aggregate approximation and the Gramian angular field. Moreover, an attention mechanism was introduced to optimize the dual-path network and the densely connected network, enabling the sample features to be extracted and integrated. Consequently, multi-modal discriminative detection of laser paint removal was realized. According to the experimental results, the verification accuracy of the constructed model on the experimental dataset was 99.17%, which is 5.77% higher than the best single-modal detection result for laser paint removal. Optimizing the feature extraction network with the attention mechanism increased the model accuracy by 3.3%. The results verify the improved classification performance of the constructed multi-modal feature fusion model for laser paint removal detection, its effective integration of acoustic data and visual image data, and its accurate detection of laser paint removal.
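Piecewise aggregate approximation (PAA) followed by a Gramian angular field is a common way to turn a 1-D signal, such as an acoustic recording, into a 2-D image. A CNN can then process that image alongside visual data. The sketch below makes three assumptions: it uses the summation variant (GASF), the signal length divides evenly into segments, and the input is a toy ramp signal. The abstract does not state which GAF variant, segment count, or preprocessing the authors used.

```python
import math
from typing import List


def paa(series: List[float], n_segments: int) -> List[float]:
    """Piecewise aggregate approximation: mean of each of n equal-length segments.

    Assumes len(series) is divisible by n_segments to keep the sketch short.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(n_segments)]


def rescale(series: List[float]) -> List[float]:
    """Min-max rescale to [-1, 1] so every value has a valid arccos."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(2 * x - hi - lo) / (hi - lo) for x in series]


def gasf(series: List[float]) -> List[List[float]]:
    """Gramian angular summation field: G[i][j] = cos(phi_i + phi_j)."""
    # Clamp to [-1, 1] to guard against floating-point overshoot before arccos.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in rescale(series)]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Toy 1-D signal standing in for an acoustic recording (hypothetical values).
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
reduced = paa(signal, 4)
image = gasf(reduced)  # 4x4 matrix that could be fed to an image network
for row in image:
    print([round(v, 3) for v in row])
```

The number of PAA segments sets the image size independently of the raw signal length. This is what lets signals of different lengths be brought to a common input shape.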
Abstract: The random oscillation of many longitudinal modes is inevitable in both class-A and class-B lasers because of their broadened atomic bandwidths. The destructive superposition of electric field components oscillating incoherently at the different longitudinal modes can be converted into a constructive one by the mode-locking technique. Here, the Maxwell–Bloch equations of motion are solved for a three-mode class-B laser under mode-locking conditions. The results indicate that the oscillating cavity modes shift as the laser pumping rate changes. In addition, the frequency components of the cavity electric field simultaneously form various bifurcations, which also satisfy the well-known mode-locking conditions. The atomic population inversion forms only one bifurcation, which is responsible for shaping the bifurcations of the cavity electric field.
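The statement that mode locking turns a destructive superposition into a constructive one can be checked with a toy sum of equally spaced modes. This sketch does not solve the Maxwell–Bloch equations. It assumes three unit-amplitude modes with an arbitrary mode spacing. It then compares equal (locked) phases with a fixed set of arbitrary phases that stand in for random ones.

```python
import cmath
import math
from typing import List


def field(t: float, phases: List[float], spacing: float) -> complex:
    """Sum of unit-amplitude modes at frequencies n * spacing (common carrier removed)."""
    return sum(cmath.exp(1j * (n * spacing * t + p)) for n, p in enumerate(phases))


def intensity(t: float, phases: List[float], spacing: float) -> float:
    return abs(field(t, phases, spacing)) ** 2


N = 3
spacing = 1.0  # longitudinal mode spacing in arbitrary units (assumption)
locked = [0.0] * N  # equal phases: the locking condition in this toy model
unlocked = [0.0, 2.1, 4.4]  # fixed arbitrary phases standing in for random ones

# Sample one full round-trip period at evenly spaced times.
period = 2 * math.pi / spacing
samples = [k * period / 64 for k in range(64)]
avg_locked = sum(intensity(t, locked, spacing) for t in samples) / len(samples)
avg_unlocked = sum(intensity(t, unlocked, spacing) for t in samples) / len(samples)
peak_locked = intensity(0.0, locked, spacing)
peak_unlocked = max(intensity(t, unlocked, spacing) for t in samples)
print(round(peak_locked, 3), round(peak_unlocked, 3), round(avg_locked, 3), round(avg_unlocked, 3))
```

In this model, locking does not add energy. The average intensity is N = 3 in both cases, but with equal phases the energy is concentrated into peaks of N² = 9.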
Funding: German Research Foundation (DFG), project number 442146713.
Abstract: In recent years, large language models have achieved breakthroughs on a wide range of natural language processing benchmarks, and their performance continues to improve. These advances have raised interest beyond the natural language processing community and could have a large impact on daily life. In this paper, we pose the question: How will large language models and other foundation models shape the future product development process? We give the reader an overview of the subject by summarizing both recent advances in natural language processing and the use of information technology in the engineering design process. We argue that discourse should be regarded as the core of engineering design processes and should therefore be represented in a digital artifact. On this basis, we describe how foundation models such as large language models could contribute to the design discourse by automating those parts of it that involve creativity and reasoning and were previously reserved for humans. We describe how simulations, experiments, topology optimizations, and other process steps can be integrated into a machine-actionable, discourse-centric design process. As an example, we present a design discourse on the optimization of wind turbine blades. Finally, we outline the future research needed to implement the conceptualized framework.
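The idea of representing design discourse as a machine-actionable digital artifact can be pictured as a small graph of contributions. The schema below is a hypothetical sketch loosely modeled on the IBIS (issue, position, argument) notation. It is not the framework proposed in the paper, and all names, authors, and file names are placeholders. It shows how a human, a language model, or a simulation tool could each append contributions, and how open issues could be queried for automation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contribution:
    """One move in a design discourse (hypothetical schema, IBIS-like)."""
    node_id: str
    kind: str  # assumed kinds: "issue", "position", "argument"
    text: str
    author: str  # e.g. "human", "llm", or a tool name; all placeholders
    replies_to: Optional[str] = None
    artifacts: List[str] = field(default_factory=list)  # e.g. simulation result files


@dataclass
class Discourse:
    items: List[Contribution] = field(default_factory=list)

    def add(self, item: Contribution) -> Contribution:
        # Reject replies to contributions that do not exist yet.
        if item.replies_to is not None and not any(c.node_id == item.replies_to for c in self.items):
            raise ValueError(f"unknown parent: {item.replies_to}")
        self.items.append(item)
        return item

    def replies(self, parent_id: str) -> List[Contribution]:
        return [c for c in self.items if c.replies_to == parent_id]

    def open_issues(self) -> List[Contribution]:
        """Issues without any position yet: candidates for a human or an LLM to address."""
        return [
            c for c in self.items
            if c.kind == "issue" and not any(r.kind == "position" for r in self.replies(c.node_id))
        ]


# Placeholder discourse loosely themed on wind turbine blade design.
d = Discourse()
d.add(Contribution("I1", "issue", "How can the blade mass be reduced?", "human"))
d.add(Contribution("P1", "position", "Apply topology optimization to the spar.", "llm", replies_to="I1"))
d.add(Contribution("A1", "argument", "Supporting simulation result (placeholder).", "simulation_tool",
                   replies_to="P1", artifacts=["result_placeholder.csv"]))
d.add(Contribution("I2", "issue", "Which material should the shell use?", "human"))

print([c.node_id for c in d.open_issues()])
```

Keeping every step as an explicit node with a parent link is one way to make the discourse queryable by software as well as readable by people.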
Abstract: In fault detection and classification, the operating mode usually drifts over time, which poses great challenges to the algorithms. Traditional machine-learning-based fault classifiers cannot dynamically update the trained model according to the probability distribution of the test dataset. As a result, their accuracy usually drops significantly under covariate shift. In this paper, an importance-weighted transfer learning method is proposed for fault classification in nonlinear multi-mode industrial processes. It effectively compensates for the drift between the training and test datasets. First, mutual information is used to perform feature selection on the original data, and the characteristic parameters associated with fault classification are selected according to their mutual information. Then, the importance-weighted least-squares probabilistic classifier (IWLSPC) is used for binary fault detection and multi-fault classification under covariate shift. Finally, experiments on the Tennessee Eastman (TE) benchmark confirm the effectiveness of the proposed method. The experimental results show that covariate shift adaptation based on importance-weighted sampling outperforms traditional machine learning fault classification algorithms. Moreover, IWLSPC can be used not only for binary fault classification but also for multi-class classification in fault diagnosis.
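The mutual-information feature selection step can be sketched with a plug-in estimator on discrete data. The abstract does not specify how mutual information is estimated for the continuous TE process variables. This sketch therefore assumes the features have already been discretized and uses toy columns and labels. It covers only the ranking step, not IWLSPC.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X;Y) in nats for two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def select_features(columns: List[Sequence], labels: Sequence, k: int) -> Tuple[List[int], List[float]]:
    """Return the indices of the k columns with the highest MI with the labels, plus all scores."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores


# Toy discretized data: 3 hypothetical fault classes, 6 samples, 3 candidate features.
labels = [0, 0, 1, 1, 2, 2]
columns = [
    [0, 0, 1, 1, 2, 2],  # fully determined by the class
    [5, 5, 5, 5, 5, 5],  # constant, carries no information
    [0, 0, 0, 0, 1, 1],  # separates class 2 from the rest
]
selected, scores = select_features(columns, labels, k=2)
print(selected, [round(s, 3) for s in scores])
```

For continuous sensor data, the columns would first need binning or a different MI estimator; the ranking logic stays the same.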