Funding: Supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: With the development of Internet technology, the explosive growth of online information has made it difficult to filter out useful information. Finding a high-accuracy text classification model has become a critical problem for text filtering, especially for Chinese texts. This paper uses manually calibrated comment data from the Douban movie website. First, a text filtering model based on the BP neural network is built. Second, the text word frequency vector and the text semantic vector are obtained from the Term Frequency-Inverse Document Frequency (TF-IDF) vector space model and the doc2vec method, respectively, and the text word frequency vector is linearly reduced in dimension by Principal Component Analysis (PCA). Third, the reduced text word frequency vector and the text semantic vector are combined, the text value degree is added, and the text synthesis vector is constructed. Experiments show that the model combining the reduced text word frequency vector, the text semantic vector, and the text value degree reaches the highest accuracy, 84.67%.
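The vector construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the comments are already tokenized (Chinese text would first need word segmentation), uses one common TF-IDF variant (relative term frequency times `log(N / df)`), and treats the PCA-reduced vector and the doc2vec vector as given inputs. The function names `tfidf_vectors` and `synthesis_vector` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of non-empty token lists.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[tok] / len(doc)) * math.log(n_docs / df[tok]) for tok in vocab]
        )
    return vocab, vectors


def synthesis_vector(reduced_tfidf, semantic, value_degree):
    """Concatenate the reduced word frequency vector, the semantic vector,
    and the scalar text value degree into one feature vector."""
    return list(reduced_tfidf) + list(semantic) + [value_degree]


if __name__ == "__main__":
    docs = [["good", "movie"], ["bad", "movie"]]
    vocab, vecs = tfidf_vectors(docs)
    print(vocab, vecs[0])
    # Stand-in values for the PCA output and the doc2vec output.
    print(synthesis_vector([0.1, 0.2], [0.3], 1.0))
```

A token that appears in every document gets weight zero under this variant, which is why "movie" contributes nothing in the toy example. The concatenated vector would then be the input to the classifier.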
Funding: Supported by the National Natural Science Foundation of China (11404406, 11374072).
Abstract: A combined description of pressure and horizontal particle velocity in the very-low-frequency acoustic field of shallow water, integrated with the concept of the effective depth of a Pekeris waveguide, is proposed. It focuses on the active component of the pressure and horizontal particle velocity cross-spectrum, also called the horizontal complex cross acoustic intensity, when only two normal modes are trapped in the waveguide. Both the approximate theoretical analysis and the numerical results show that the sign of the active component of the horizontal complex cross acoustic intensity is independent of range when two vertically deployed receiving sensors are placed at appropriate depths whose sum equals the waveguide effective depth. The sign can therefore be used to tell whether the sound source is near the surface or underwater, while the range rate is expected to be measurable from the sign distribution of the reactive component. Further robustness analysis of the depth classification algorithm shows that shear waves in the semi-infinite basement and changes in the sound velocity profile have little effect on the method. Seabed attenuation limits the detection range, but the algorithm remains robust within the valid detection range.
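As a rough illustration of the quantities named above, the sketch below splits a pressure and horizontal particle velocity cross-spectrum into its active (real) and reactive (imaginary) parts at a single frequency bin. It assumes the complex spectra are already available from two sensors and uses the convention P times the conjugate of V; the opposite convention flips the sign of the reactive part. The function names are hypothetical, and which sign corresponds to a surface or a submerged source is not stated in the abstract, so the code returns only the sign.

```python
def complex_cross_intensity(pressure, velocity):
    """Return (active, reactive) parts of P * conj(V) for one frequency bin."""
    cross = pressure * velocity.conjugate()
    return cross.real, cross.imag


def mean_active_sign(pressures, velocities):
    """Sign (-1, 0, or +1) of the active component summed over snapshots."""
    total = sum(
        complex_cross_intensity(p, v)[0] for p, v in zip(pressures, velocities)
    )
    return (total > 0) - (total < 0)


if __name__ == "__main__":
    print(complex_cross_intensity(1 + 1j, 1 - 1j))
    print(mean_active_sign([2 + 0j, 1 + 1j], [-1 + 0j, 1 - 1j]))
```

Summing over several snapshots before taking the sign is a design choice made here to reduce sensitivity to noise in any single snapshot.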
Abstract: Observational data such as those obtained from the magnetosheath downstream of Earth's bow shock have waveforms that differ from sinusoidal signals. In practice, they are not even aggregates of sinusoidal signals. Frequency decomposition of such data therefore requires a technique that accounts for their time-varying features, so that physical meaning can be deduced from the observations. The combination of empirical mode decomposition (EMD) and the Hilbert transform has been used to extract the contributing oscillatory modes (EMD) and to determine the instantaneous frequency (Hilbert transform) of every physically meaningful mode, called an intrinsic mode function (IMF). The resulting instantaneous frequencies are used to determine instantaneous wave vectors. Together, the instantaneous frequencies and wave vectors help identify wave modes from the characteristics of the waves. The results show that EMD-Hilbert can be more reliable than the Hilbert transform alone.
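The Hilbert step can be sketched for a single mode as follows. This is an illustrative sketch only: it assumes one IMF has already been extracted (the EMD sifting itself is not shown), builds the analytic signal with a plain O(n²) discrete Fourier transform to stay dependency-free, and estimates the instantaneous frequency from the phase advance between consecutive samples. The function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via the DFT: keep DC (and Nyquist
    for even length), double positive frequencies, zero negative ones."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        last_positive = n // 2 - 1
    else:
        last_positive = (n - 1) // 2
    for k in range(1, last_positive + 1):
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_frequency(x, sample_rate):
    """Instantaneous frequency (Hz) from the phase step between samples."""
    z = analytic_signal(x)
    return [
        cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2 * math.pi)
        for t in range(len(z) - 1)
    ]


if __name__ == "__main__":
    rate, n = 64, 64
    tone = [math.cos(2 * math.pi * 5 * t / rate) for t in range(n)]
    print(instantaneous_frequency(tone, rate)[:3])
```

For a pure tone with a whole number of cycles in the window, the estimate is constant. For a real IMF it varies from sample to sample, which is the time-varying information the abstract refers to.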
Abstract: String similarity join (SSJ) is essential for many applications in which near-duplicate objects need to be found. This paper targets SSJ with edit distance constraints. Existing algorithms usually adopt the filter-and-refine framework. They cannot capture the dissimilarity between string subsets and do not fully exploit statistics such as character frequencies. We develop a partition-based algorithm that uses such statistics. Frequency vectors are used to partition datasets into data chunks so that the dissimilarity between chunks is easy to capture. A novel algorithm is designed to accelerate SSJ via the partitioned data. A new filter is proposed that leverages the statistics to avoid computing edit distances for a noticeable proportion of the candidate pairs that survive the existing filters. Our algorithm notably outperforms alternative methods on real datasets.
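To illustrate how character frequencies can prune candidate pairs, the sketch below uses a standard frequency-distance lower bound on edit distance inside a naive filter-and-refine join. It is not the partition-based algorithm or the new filter from the paper, whose details the abstract does not give. The function names and the all-pairs loop are assumptions made for illustration.

```python
from collections import Counter


def frequency_lower_bound(s, t):
    """Lower bound on edit distance from character counts.

    Each edit operation removes at most one surplus character and supplies
    at most one missing character, so the distance is at least the larger
    of the two totals.
    """
    fs, ft = Counter(s), Counter(t)
    surplus = sum((fs - ft).values())   # characters s has in excess
    missing = sum((ft - fs).values())   # characters t has in excess
    return max(surplus, missing)


def edit_distance(s, t):
    """Levenshtein distance with a rolling one-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def similarity_join(strings, tau):
    """Return all pairs within edit distance tau, filtering before verifying."""
    results = []
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            a, b = strings[i], strings[j]
            if frequency_lower_bound(a, b) > tau:
                continue  # pruned without computing the edit distance
            if edit_distance(a, b) <= tau:
                results.append((a, b))
    return results


if __name__ == "__main__":
    print(similarity_join(["kitten", "sitting", "mitten", "abc"], 1))
```

The filter never discards a true match because the bound cannot exceed the real edit distance. It only saves work on pairs whose character counts already differ by more than the threshold.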