Funding: Supported by the National Key R&D Program of China under Grant 2019YFB1803400, the National Natural Science Foundation of China under Grant 62071114, and the Fundamental Research Funds for the Central Universities of China under Grants 3204002004A2 and 2242022k30005.
Abstract: This paper investigates wireless communication with a novel antenna array architecture, termed the modular extremely large-scale array (XL-array), in which array elements of an extremely large number/size are regularly mounted on a shared platform with both horizontally and vertically interlaced modules. Each module consists of a moderate/flexible number of array elements with an inter-element distance typically on the order of the signal wavelength, while different modules are separated by a relatively large inter-module distance for convenience of practical deployment. By accurately modelling the signal amplitudes and phases, as well as the projected apertures across all modular elements, we analyse the near-field signal-to-noise ratio (SNR) performance of modular XL-array communications. Based on non-uniform spherical wave (NUSW) modelling, a closed-form SNR expression is derived in terms of key system parameters, such as the overall modular array size, the distances between adjacent modules along all dimensions, and the user's three-dimensional (3D) location. In addition, the asymptotic SNR scaling laws are revealed as the number of modules in different dimensions increases without bound. Furthermore, we show that the proposed near-field modelling and performance analysis include the results for existing array architectures/modelling as special cases, e.g., the collocated XL-array architecture, uniform plane wave (UPW) based far-field modelling, and the one-dimensional modular extremely large-scale uniform linear array (XL-ULA). Extensive simulation results are presented to validate our findings.
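To make the contrast between NUSW and UPW modelling concrete, the following is a minimal sketch, not the paper's closed-form result. It assumes a one-dimensional modular array along the y-axis, isotropic elements, free-space path loss, and maximum-ratio combining, and it deliberately omits the projected-aperture term that the abstract says the paper models. All function and parameter names (element_positions, module_pitch, ref_snr, and so on) are hypothetical.

```python
import math

def element_positions(num_modules, elems_per_module, elem_spacing, module_pitch):
    """Return y-coordinates (m) of every element of a 1D modular array centred at 0.

    module_pitch is the assumed centre-to-centre distance between adjacent modules.
    """
    positions = []
    for n in range(num_modules):
        module_centre = (n - (num_modules - 1) / 2) * module_pitch
        for m in range(elems_per_module):
            offset = (m - (elems_per_module - 1) / 2) * elem_spacing
            positions.append(module_centre + offset)
    return positions

def snr_nusw(positions, user_xy, ref_snr=1.0):
    """SNR with per-element distances (non-uniform spherical wave amplitudes).

    ref_snr is the assumed single-element SNR at a 1 m reference distance.
    """
    ux, uy = user_xy
    return ref_snr * sum(1.0 / (ux ** 2 + (uy - y) ** 2) for y in positions)

def snr_upw(positions, user_xy, ref_snr=1.0):
    """SNR when every element is assigned the distance to the array centre (UPW)."""
    ux, uy = user_xy
    return ref_snr * len(positions) / (ux ** 2 + uy ** 2)

if __name__ == "__main__":
    pos = element_positions(num_modules=8, elems_per_module=4,
                            elem_spacing=0.0625, module_pitch=1.0)
    for distance in (2.0, 20.0, 1000.0):
        user = (distance, 0.0)
        ratio = snr_nusw(pos, user) / snr_upw(pos, user)
        print(f"distance {distance:7.1f} m: NUSW/UPW SNR ratio = {ratio:.4f}")
```

Under these assumptions the two models agree when the user is far from the array, while close to the array the UPW model overestimates the SNR because the outer modules are noticeably farther away than the array centre.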
Abstract: Models for the design of assembly processes are considered. Various models for the voice control of an industrial robot are considered: a logical model, semantic networks, a frame model and Petri nets. It is shown that this set of models makes it possible to describe the design of the technological process for an industrial robot. The logical model of the technological process allows logical relationships to be defined. A model based on semantic networks describes the relationship between assembly units in a detail. This makes it possible to determine the order and method of registration, as well as the mutual orientation of assembly units in the product. The frame model provides the ability to streamline the execution of the build process. A model based on Petri nets describes the type and sequence of technological transitions. Based on the proposed models, a method of voice control for an industrial robot is developed. The basic principles of voice control for an industrial robot are considered.
Funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 62301148, 62341107, and 62261160576, by the Natural Science Foundation of Jiangsu Province under Grant No. BK20230824, and in part by the Key Technologies R&D Program of Jiangsu (Prospective and Key Technologies for Industry) under Grant Nos. BE2023022 and BE2023022-1.
Abstract: The extremely large-scale hybrid reconfigurable intelligence surface (XL-HRIS), an improved version of the RIS, can receive the incident signal and enhance communication performance. However, as the RIS size increases, the phase variations of the received signal across the whole array are non-negligible in the near-field region, and the channel model mismatch, which decreases the estimation accuracy, must be considered. In this paper, the lower bound (LB) of the estimated parameter is studied, and the impacts of the distance and the signal-to-noise ratio (SNR) on the LB are evaluated. Moreover, the impacts of the array scale on the LB and the spectral efficiency (SE) are also studied. Simulation results verify that even in extremely large-scale array systems with infinite SNR, channel model mismatch can still limit estimation accuracy. However, this impact decreases with increasing distance.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, No. 2006AA010102).
Abstract: A voice conversion algorithm aims to provide a high level of similarity to the target voice with an acceptable level of quality. The main objective of this paper was to build a nonlinear relationship between the parameters of the acoustical features of the source and target speakers using Non-Linear Canonical Correlation Analysis (NLCCA) based on a joint Gaussian mixture model. Speaker individuality transformation was achieved mainly by altering vocal tract characteristics represented by Line Spectral Frequencies (LSF). To obtain transformed speech that sounded more like the target voice, prosody modification was performed through residual prediction. Both objective and subjective evaluations were conducted. The experimental results demonstrated that the proposed algorithm was effective and outperformed the conventional conversion method based on Minimum Mean Square Error (MMSE) estimation.
Abstract: This study examines vishing, a form of social engineering scam using voice communication to deceive individuals into revealing sensitive information or losing money. With the rise of smartphone usage, people are more susceptible to vishing attacks. The proposed Emoti-Shing model analyzes potential victims’ emotions using Hidden Markov Models to track vishing scams by examining the emotional content of phone call audio conversations. This approach aims to detect vishing scams using biological features of humans, specifically emotions, which cannot be easily masked or spoofed. Experimental results on 30 generated emotions indicate the potential for increased vishing scam detection through this approach.
Funding: Supported by China National S&T Major Project (2012ZX03001034MCM 201240113).
Abstract: A quality of experience (QoE) evaluation model for voice over IP (VoIP) service is studied in order to analyze the impact of network parameters on voice quality and to let operators monitor voice quality in real time. Firstly, the influence of several network parameters on VoIP voice quality is investigated. Then, a simulation platform for VoIP transmission is built to collect voice data under different network environments. From the simulation results, a new mapping model between these parameters and VoIP voice quality is deduced. Finally, the accuracy of this voice quality evaluation model is examined, and the results demonstrate that it has high reliability and feasibility.
Funding: Supported by the National Natural Science Foundation of China (project 41804088) and the Special Fund of the Institute of Geophysics, China Earthquake Administration (project DQJB19B08).
Abstract: On September 8, 2018, an M_S 5.9 earthquake struck Mojiang, a county in Yunnan Province, China. We collect near-field seismic recordings (epicentral distances less than 200 km) to relocate the mainshock and the aftershocks within the first 60 hours, to determine the focal mechanism solutions of the mainshock and some of the aftershocks, and to invert for the finite-fault model of the mainshock. The focal mechanism solution of the mainshock and the relocation results of the aftershocks constrain the mainshock to a nearly vertical fault plane striking northeast and dipping to the southeast. The inversion of the finite-fault model reveals only a single slip asperity on the fault plane. The major slip is distributed above the initiation point, ~14 km wide along the down-dip direction and ~14 km long along the strike direction, with a maximal slip of ~22 cm at a depth of ~6 km. The focal mechanism solutions of the aftershocks show that most of the aftershocks are of the strike-slip type, a number of them are of the normal-slip type, and only a few are of the thrust-slip type. On average, strike-slip is dominant on the fault plane of the mainshock, as the focal mechanism solution of the mainshock suggests, but when examined in detail, slight thrust-slip appears on the southwest of the fault plane while a distinct area of normal-slip appears on the northeast, which is consistent with what the focal mechanism solutions of the aftershocks display. The multiple types of aftershock focal mechanism solutions and the slip details of the mainshock both suggest a complex tectonic setting, stress setting, or both. The predicted intensity contours exhibit a longer axis trending from northeast to southwest and a maximal intensity of Ⅷ around the epicenter and in the northwest.
Funding: Supported by the Key Laboratory of Dynamic Cognitive System of Electromagnetic Spectrum Space (KF20202109), the National Natural Science Foundation of China (82004259), and the Young Talent Training Project of Guangzhou University of Chinese Medicine (QNYC20190110).
Abstract: Most near-field source localization methods are developed with the approximated signal model, because the phases of the received near-field signal are highly non-linear. Nevertheless, methods based on the approximated signal model suffer from model mismatch and performance degradation, while estimation methods based on the exact signal model usually involve parameter searching or multiple decomposition procedures. In this paper, a search-free near-field source localization method is proposed with the exact signal model. Firstly, approximate estimates of the direction of arrival (DOA) and range are obtained with the approximated-signal-model-based method through parameter separation and polynomial rooting operations. Then, the approximate estimates are corrected with the exact signal model according to the exact expressions of the phase difference in near-field observations. The proposed method avoids spectral searching and parameter pairing and has enhanced estimation performance. Numerical simulations are provided to demonstrate the effectiveness of the proposed method.
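The model mismatch mentioned in this abstract can be illustrated with a short sketch. It assumes a uniform linear array with element index m, spacing d, and a source at range r and angle theta measured from broadside relative to the reference element. The exact phase uses the true spherical-wave distance, while the approximated phase uses the widely used second-order (Fresnel) expansion. The function names and the chosen array geometry are hypothetical, and this is not the localization method of the paper.

```python
import math

def exact_phase(m, d, wavelength, r, theta):
    """Phase (rad) at element m relative to the reference element, exact distance."""
    pos = m * d
    r_m = math.sqrt(r * r + pos * pos - 2.0 * r * pos * math.sin(theta))
    return 2.0 * math.pi / wavelength * (r_m - r)

def fresnel_phase(m, d, wavelength, r, theta):
    """Second-order Taylor (Fresnel) approximation: omega * m + phi * m**2."""
    omega = -2.0 * math.pi * d * math.sin(theta) / wavelength
    phi = math.pi * d * d * math.cos(theta) ** 2 / (wavelength * r)
    return omega * m + phi * m * m

def max_phase_error(r, theta, d=0.25, wavelength=1.0, half_size=4):
    """Largest |exact - approximated| phase over elements -half_size..half_size."""
    return max(
        abs(exact_phase(m, d, wavelength, r, theta)
            - fresnel_phase(m, d, wavelength, r, theta))
        for m in range(-half_size, half_size + 1)
    )

if __name__ == "__main__":
    for r in (2.0, 5.0, 50.0):
        err = max_phase_error(r, math.radians(30.0))
        print(f"range {r:5.1f} wavelengths: max phase mismatch = {err:.5f} rad")
```

The mismatch shrinks as the source moves away from the array, which is why the approximated model is acceptable at larger ranges but degrades estimation for sources close to the aperture.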
Funding: Supported by the Second Stage of Brain Korea 21 Projects.
Abstract: In conventional source-filter models, voiced and unvoiced components are considered independently. In practice, however, it is difficult to separate the source into two parts. An actual source consists of a mixture of the two sources, and the ratio varies according to the content or the intention of the speaker. This study investigated the separation of the voiced and unvoiced components for different source models. Source signals were modeled based on the residual signal measured from inverse filtering. Three different source models were assumed. The parameters of each model were optimized for the original speech signal using a genetic algorithm. The resulting parameters were compared in terms of the mel-cepstral distance to the original signal, the spectrogram, and the spectral envelope of the synthesized signal. The optimization method achieved an improvement of 15% for the Klatt model, but there was little improvement in the modified residual case.
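For readers unfamiliar with the mel-cepstral distance used as a comparison metric here, a minimal sketch of the commonly used per-frame definition follows. The abstract does not state which variant the authors used, so the scaling constant and the exclusion of the zeroth (energy) coefficient are assumptions, and the function name is hypothetical.

```python
import math

def mel_cepstral_distance(reference, synthesized):
    """Per-frame mel-cepstral distance in dB between two coefficient vectors.

    Assumes index 0 holds the energy term, which is skipped (a common convention).
    """
    if len(reference) != len(synthesized):
        raise ValueError("coefficient vectors must have the same length")
    squared_diff = sum(
        (a - b) ** 2 for a, b in zip(reference[1:], synthesized[1:])
    )
    return 10.0 / math.log(10.0) * math.sqrt(2.0 * squared_diff)

if __name__ == "__main__":
    original = [1.0, 0.50, -0.20, 0.10]
    resynth = [0.9, 0.45, -0.25, 0.12]
    print(f"MCD = {mel_cepstral_distance(original, resynth):.3f} dB")
```

A smaller value means the synthesized frame is spectrally closer to the original; averaging over all frames gives a single score per utterance.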
Funding: Supported by the National Natural Science Foundation of China (No. 50778058 and No. 90715038), the National Key Technology Research and Development Program of China (No. 2006BAC13B02), and the Major State Basic Research Development Program of China ("973" Program, No. 2008CB425802).
Abstract: The hybrid slip model used to generate a finite fault model for near-field ground motion estimation and seismic hazard assessment was improved to express the uncertainty of the source form of a future earthquake. In this process, source parameters were treated as normal random variables, and the Fortran code of the hybrid slip model was modified by adding a random number generator so that the code could generate many finite fault models with different dimensions and slip distributions for a given magnitude. Furth...
Funding: Supported by the National Natural Science Foundation of China (No. 60872105), the Program for Science & Technology Innovative Research Team of Qing Lan Project in Higher Educational Institutions of Jiangsu, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: This paper presents an improved voice conversion method based on Gaussian mixture models (GMMs) that changes the time-scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectrum features, and the GMMs are trained to generate the conversion function. The spectrum features of the source speech are converted by the conversion function. The time-scale of speech is changed by extracting the converted features and adding them to the spectrum. The converted voice was evaluated by subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker, but is also more natural and more intelligible.
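The abstract does not give the conversion function itself. As an illustration only, the sketch below implements the standard joint-density GMM regression that is widely used for voice conversion, reduced to scalar features for readability: each mixture component contributes a linear regression from source to target feature, weighted by the posterior probability of that component given the source feature. The dictionary keys and function names are hypothetical, and the component parameters are assumed to come from a GMM trained elsewhere.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convert(x, components):
    """Map a scalar source feature x to the target space.

    Each component is a dict with keys:
      w      mixture weight
      mu_x   source mean,      var_x   source variance
      mu_y   target mean,      cov_yx  target/source covariance
    """
    likelihoods = [c["w"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likelihoods)
    converted = 0.0
    for likelihood, c in zip(likelihoods, components):
        posterior = likelihood / total  # P(component | x)
        regression = c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"])
        converted += posterior * regression
    return converted

if __name__ == "__main__":
    gmm = [
        {"w": 0.5, "mu_x": -1.0, "var_x": 1.0, "mu_y": 0.0, "cov_yx": 0.8},
        {"w": 0.5, "mu_x": 2.0, "var_x": 0.5, "mu_y": 3.0, "cov_yx": 0.3},
    ]
    for x in (-1.0, 0.5, 2.0):
        print(f"source {x:+.1f} -> converted {convert(x, gmm):+.3f}")
```

In a real system x would be a vector of spectral features per frame and the variances would be covariance matrices, but the posterior-weighted regression has the same structure.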
Abstract: A method of robust speech endpoint detection against an airplane cockpit voice background is presented. Based on an analysis of the background noise characteristics, a complex Laplacian distribution model aimed directly at noisy speech is established. Then a likelihood ratio test based on a binary hypothesis test is carried out. The conventional maximum a posteriori decision criterion, incorporating the inter-frame correlation, leads to two separate thresholds. The speech endpoint detection decision is finally made depending on the previous frame and the observed spectrum, and the speech endpoint is searched for based on this decision. Compared with typical algorithms, the proposed method operates robustly against the airplane cockpit voice background.
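The decision structure described above (a likelihood ratio test whose threshold depends on the previous frame's decision) can be sketched as follows. This is a simplified illustration under loud assumptions: it uses a real-valued Laplacian density on spectral coefficients instead of the complex Laplacian model of the paper, and the two thresholds are supplied by hand instead of being derived from the maximum a posteriori criterion. All names and numeric values are hypothetical.

```python
import math

def frame_log_likelihood_ratio(frame, noise_scale, speech_scale):
    """Mean log-likelihood ratio of 'speech present' vs 'noise only' for one frame.

    Both hypotheses use a zero-mean Laplacian density p(x) = exp(-|x|/b) / (2b),
    with scale b = noise_scale under H0 and b = speech_scale under H1.
    """
    per_coefficient = [
        math.log(noise_scale / speech_scale)
        + abs(x) * (1.0 / noise_scale - 1.0 / speech_scale)
        for x in frame
    ]
    return sum(per_coefficient) / len(frame)

def detect_speech(frames, noise_scale, speech_scale, enter_threshold, stay_threshold):
    """Label each frame as speech (True) or not, using two thresholds.

    enter_threshold applies when the previous frame was non-speech,
    stay_threshold (lower) applies when the previous frame was speech.
    """
    decisions = []
    previous_is_speech = False
    for frame in frames:
        llr = frame_log_likelihood_ratio(frame, noise_scale, speech_scale)
        threshold = stay_threshold if previous_is_speech else enter_threshold
        previous_is_speech = llr > threshold
        decisions.append(previous_is_speech)
    return decisions

if __name__ == "__main__":
    # Toy frames: constant-magnitude coefficients stand in for spectra.
    frames = [[0.1] * 8, [0.5] * 8, [2.0] * 8, [0.5] * 8, [0.1] * 8]
    print(detect_speech(frames, noise_scale=0.2, speech_scale=1.0,
                        enter_threshold=1.0, stay_threshold=-0.5))
```

With these toy values the ambiguous frame (magnitude 0.5) is labelled non-speech before the loud frame and speech after it, which shows how the previous decision shifts the effective threshold; endpoints are then the positions where the label changes.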