13 articles found
Spatio-Temporal Context-Guided Algorithm for Lossless Point Cloud Geometry Compression
1
Authors: ZHANG Huiran, DONG Zhen, WANG Mingsheng. ZTE Communications, 2023, No. 4, pp. 17-28 (12 pages)
Point cloud compression is critical to deploying 3D representations of the physical world in applications such as 3D immersive telepresence, autonomous driving, and cultural heritage preservation. However, point cloud data are distributed irregularly and discontinuously in the spatial and temporal domains, where redundant unoccupied voxels and weak correlations in 3D space make efficient compression a challenging problem. In this paper, we propose a spatio-temporal context-guided algorithm for lossless point cloud geometry compression. The proposed scheme starts by dividing the point cloud into sliced layers of unit thickness along the longest axis. Then, it introduces a prediction method for cases where both intra-frame and inter-frame point clouds are available, by determining correspondences between adjacent layers and estimating the shortest path using the travelling salesman algorithm. Finally, the small prediction residuals are efficiently compressed with optimal context-guided and adaptive fast-mode arithmetic coding techniques. Experiments show that the proposed method can effectively achieve low-bit-rate lossless compression of point cloud geometric information and is suitable for 3D point cloud compression across various types of scenes.
Keywords: point cloud geometry compression; single-frame point clouds; multi-frame point clouds; predictive coding; arithmetic coding
Learning a Deep Predictive Coding Network for a Semi-Supervised 3D-Hand Pose Estimation (cited by: 2)
2
Authors: Jamal Banzi, Isack Bulugu, Zhongfu Ye. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2020, No. 5, pp. 1371-1379 (9 pages)
In this paper we present a CNN-based approach for real-time 3D hand pose estimation from a depth sequence. Prior discriminative approaches have achieved remarkable success but face two main challenges. Firstly, the methods are fully supervised and hence require large amounts of annotated training data to extract the dynamic information from a hand representation. Secondly, they rely on hand detectors that are based on strong assumptions, or on weak detectors, which often fail in situations such as complex environments and multiple hands. In contrast to these methods, this paper presents an approach that can be considered semi-supervised: it performs predictive coding of image sequences of hand poses in order to capture the latent features underlying a given image without supervision. The hand is modelled using a novel latent tree dependency model (LDTM), which transforms internal joint locations into an explicit representation. The modelled hand topology is then integrated with the pose estimator using a data-dependent method to jointly learn the latent variables of the posterior pose appearance and the pose configuration, respectively. Finally, an unsupervised error term, which is part of the recurrent architecture, ensures smooth estimates of the final pose. Experiments on three challenging public datasets, ICVL, MSRA, and NYU, demonstrate that the proposed method performs comparably to or better than state-of-the-art approaches.
Keywords: Convolutional neural networks; deep learning; hand pose estimation; human-machine interaction; predictive coding; recurrent neural networks; unsupervised learning
Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions (cited by: 7)
3
Authors: Veton Z. Kepuska, Hussien A. Elharati. Journal of Computer and Communications, 2015, No. 6, pp. 1-9 (9 pages)
In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. More robust feature extraction methods are needed to achieve better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms, namely Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), perceptual linear production (PLP), and RASTA-PLP, in noisy conditions using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated using the TIDIGIT human voice corpus, recorded from 208 different adult speakers, in both the training and testing processes. The theoretical basis for the speech processing and classifier procedures is presented, and the recognition results are reported in terms of word recognition rate.
Keywords: Speech Recognition; Noisy Conditions; Feature Extraction; Mel-Frequency Cepstral Coefficients; Linear Predictive Coding Coefficients; Perceptual Linear Production; RASTA-PLP; Isolated Speech; Hidden Markov Model
Fast determination of meso-level mechanical parameters of PFC models (cited by: 4)
4
Authors: Guo Jianwei, Xu Guoan, Jing Hongwen, Kuang Tiejun. International Journal of Mining Science and Technology (SCIE, EI), 2013, No. 1, pp. 157-162 (6 pages)
To address the blindness and inefficiency of determining the meso-level mechanical parameters of particle flow code (PFC) models, we first designed and numerically carried out orthogonal tests on rock samples to investigate the correlations between the macro- and meso-level mechanical parameters of rock-like bonded granular materials. Then, based on artificial intelligence technology, intelligent prediction systems for nine meso-level mechanical parameters of PFC models were obtained by creating, training, and testing prediction models with the data obtained from the orthogonal tests. Lastly, the prediction systems were used to predict the meso-level mechanical parameters of one kind of sandy mudstone, and the macroscopic properties of the rock were obtained from numerical tests based on the predicted results. The maximum relative error between the numerical test results and the real rock properties is 3.28%, which satisfies the precision requirement in engineering. This shows that the paper provides a fast and accurate method for determining the meso-level mechanical parameters of PFC models.
Keywords: Particle flow code; Meso-level mechanical parameter; Macroscopic property; Orthogonal test; Intelligent prediction
Complete Focal Plane Compression Based on CMOS Image Sensor Using Predictive Coding
5
Authors: 姚素英, 于潇, 高静, 徐江涛. Transactions of Tianjin University (EI, CAS), 2015, No. 1, pp. 83-89 (7 pages)
In this paper, a CMOS image sensor (CIS) is proposed that can accomplish both the decorrelation and the entropy coding steps of image compression directly on the focal plane. The design is based on predictive coding for image decorrelation. The predictions are performed in the analog domain by 2×2 pixel units. Both the prediction residuals and the original pixel values are quantized and encoded in parallel. Since the residuals have a peaked distribution around zero, the output codewords can be replaced by the valid part of the residuals' binary mode. The compressed bit stream is accessible directly at the output of the CIS without extra processing. Simulation results show that the proposed approach achieves a compression rate of 2.2 and a PSNR of 51 on different test images.
Keywords: CMOS image sensor; focal plane compression; predictive coding; entropy coder
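The decorrelation idea behind predictive coding in the entry above can be illustrated with a minimal sketch. It is a generic one-dimensional DPCM (previous-pixel predictor) written for illustration only, not the 2×2 analog prediction scheme of the paper; the function names and the initial predictor value of 0 are assumptions.

```python
def dpcm_encode(row):
    """Return prediction residuals for one row of integer pixel values.

    Each pixel is predicted by the previous pixel (hypothetical predictor;
    the predictor state before the first pixel is assumed to be 0).
    """
    residuals = []
    previous = 0
    for pixel in row:
        residuals.append(pixel - previous)  # residual = actual - predicted
        previous = pixel
    return residuals


def dpcm_decode(residuals):
    """Rebuild the original row by accumulating the residuals."""
    row = []
    previous = 0
    for residual in residuals:
        previous += residual
        row.append(previous)
    return row


row = [100, 102, 101, 101, 105, 110]
residuals = dpcm_encode(row)
restored = dpcm_decode(residuals)
```

After the first sample the residuals of a smooth row cluster around zero, which is the property an entropy coder exploits.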
Web Voice Browser Based on an ISLPC Text-to-Speech Algorithm
6
Authors: LIAO Rikun, JI Yuefeng, LI Hui. Wuhan University Journal of Natural Sciences (CAS), 2006, No. 5, pp. 1157-1160 (4 pages)
A Web voice browser based on an improved synchronous linear predictive coding (ISLPC) Text-to-Speech (TTS) algorithm and Internet applications is proposed. The paper analyzes the features of a TTS system with ISLPC speech synthesis and discusses the design and implementation of an ISLPC TTS-based Web voice browser. The browser integrates Web technology, Chinese information processing, artificial intelligence, and the key technology of Chinese ISLPC speech synthesis. It is a visual and audible Web browser that can improve information precision for network users. The evaluation results show that the ISLPC-based TTS model performs better than other browsers in voice quality and in the ability to identify Chinese characters.
Keywords: improved synchronous linear predictive coding (ISLPC); Text-to-Speech (TTS); Web voice browser; voice quality
Comparison of Khasi Speech Representations with Different Spectral Features and Hidden Markov States
7
Authors: Bronson Syiem, Sushanta Kabir Dutta, Juwesh Binong, Lairenlakpam Joyprakash Singh. Journal of Electronic Science and Technology (CAS, CSCD), 2021, No. 2, pp. 155-162 (8 pages)
In this paper, we present a comparison of Khasi speech representations with four different spectral features and a novel extension towards the development of Khasi speech corpora. The four features are linear predictive coding (LPC), linear prediction cepstrum coefficient (LPCC), perceptual linear prediction (PLP), and Mel frequency cepstral coefficient (MFCC). Ten hours of speech data were used for training and three hours for testing. For each spectral feature, different hidden Markov model (HMM) based recognizers were built with variations in the number of HMM states and with different Gaussian mixture models (GMMs). The performance was evaluated using the word error rate (WER). The experimental results show that MFCC provides a better representation of Khasi speech than the other three spectral features.
Keywords: Acoustic model (AM); Gaussian mixture model (GMM); hidden Markov model (HMM); language model (LM); linear predictive coding (LPC); linear prediction cepstral coefficient (LPCC); Mel frequency cepstral coefficient (MFCC); perceptual linear prediction (PLP)
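Linear predictive coding (LPC), one of the spectral features compared in the entry above, models each sample as a weighted sum of the preceding samples. The sketch below estimates the weights with the autocorrelation method and the Levinson-Durbin recursion. It is a textbook illustration under assumed names (`autocorrelation`, `lpc`) and an assumed toy signal, not the feature pipeline used in any of the listed papers; real front ends also apply framing and windowing, which are omitted here.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no windowing)."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(signal, order):
    """Estimate a[1..order] so that x[n] is approximated by sum_j a[j] * x[n-j].

    Uses the Levinson-Durbin recursion; returns (coefficients, residual_energy).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # signal is all zeros or already perfectly predicted
        # Reflection coefficient for model order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Toy signal (assumption): a decaying exponential, x[n] = 0.9 * x[n-1]
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(signal, 2)
```

For this toy signal the first coefficient comes out close to 0.9 and the second close to zero, since one past sample already predicts the next almost exactly.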
An Approach to Hide Secret Speech Information
8
Authors: 吴志军, 段海新, 李星. Journal of Shanghai Jiaotong University (Science) (EI), 2006, No. 2, pp. 134-139 (6 pages)
This paper presents an approach to hiding secret speech information in a code excited linear prediction (CELP)-based speech coding scheme by adopting an analysis-by-synthesis (ABS)-based algorithm for speech information hiding and extraction, for the purpose of secure speech communication. The secret speech is coded with 2.4 Kb/s mixed excitation linear prediction (MELP) and embedded in CELP-type public speech. The ABS algorithm uses the speech synthesizer in the speech coder. Speech embedding and coding are synchronous, i.e., the public and secret speech information data are fused. An experiment embedding 2.4 Kb/s MELP secret speech in public speech coded with the G.728 scheme and transmitted over the public switched telephone network (PSTN) shows that the proposed approach satisfies the requirements of information hiding, meets the speech quality constraints of secure communication, and achieves a high hiding capacity of 3.2 Kb/s on average with excellent speech quality, while complicating speaker recognition.
Keywords: information hiding; analysis-by-synthesis (ABS); code excited linear prediction (CELP); embed; extract
Wake-Up-Word Feature Extraction on FPGA
9
Authors: Veton Z. Kepuska, Mohamed M. Eljhani, Brian H. Hight. World Journal of Engineering and Technology, 2014, No. 1, pp. 1-12 (12 pages)
The Wake-Up-Word Speech Recognition (WUW-SR) task is computationally very demanding, particularly the feature extraction stage, whose output is decoded with the corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPC), and Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC). In "Front-end of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel architecture for a real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. In this paper, we present the details of converting the three sets of spectrograms, 1) MFCC, 2) LPC, and 3) ENH_MFCC, to their equivalent features. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW-SR is shown in Figure 1. The three sets of speech features are extracted at the front-end. These extracted features are then compressed and transmitted to the server via a dedicated channel, where they are subsequently decoded.
Keywords: Speech Recognition System; Feature Extraction; Mel-Frequency Cepstral Coefficients; Linear Predictive Coding Coefficients; Enhanced Mel-Frequency Cepstral Coefficients; Hidden Markov Models; Field-Programmable Gate Arrays
Energy-information trade-off induces continuous and discontinuous phase transitions in lateral predictive coding
10
Authors: Zhen-Ye Huang, Ruyi Zhou, Miao Huang, Hai-Jun Zhou. Science China (Physics, Mechanics & Astronomy) (SCIE, EI, CAS, CSCD), 2024, No. 6, pp. 79-85 (7 pages)
Lateral predictive coding is a recurrent neural network that creates energy-efficient internal representations by exploiting statistical regularity in sensory inputs. Here, we analytically investigate the trade-off between information robustness and energy in a linear model of lateral predictive coding and numerically minimize a free energy quantity. We observed several phase transitions in the synaptic weight matrix, particularly a continuous transition that breaks reciprocity and permutation symmetry and builds cyclic dominance, and a discontinuous transition with the associated sudden emergence of tight balance between excitatory and inhibitory interactions. The optimal network follows an ideal gas law over an extended temperature range and saturates the efficiency upper bound of energy use. These results provide theoretical insights into the emergence and evolution of complex internal models in predictive processing systems.
Keywords: predictive coding; recurrent neural network; phase transition; internal model; free energy
Dynamic Brain Responses Modulated by Precise Timing Prediction in an Opposing Process (cited by: 1)
11
Authors: Minpeng Xu, Jiayuan Meng, Haiqing Yu, Tzyy-Ping Jung, Dong Ming. Neuroscience Bulletin (SCIE, CAS, CSCD), 2021, No. 1, pp. 70-80 (11 pages)
The brain function of prediction is fundamental for human beings to shape perceptions efficiently and successively. Through decades of effort, a valuable brain activation map has been obtained for prediction. However, much less is known about how the brain manages the prediction process over time using traditional neuropsychological paradigms. Here, we implemented an innovative paradigm for timing prediction to precisely study the temporal dynamics of neural oscillations. In an experiment with 45 participants, expectation suppression was found for the overall electroencephalographic activity, consistent with previous hemodynamic studies. Notably, we found that N1 was positively associated with predictability, while N2 showed a reversed relation to predictability. Furthermore, the matching prediction had a profile similar to that of no timing prediction, both showing an almost saturated N1 and an absence of N2. The results indicate that the N1 process showed a 'sharpening' effect for predictable inputs, while the N2 process showed a 'dampening' effect. Therefore, these two paradoxical neural effects of prediction, which have provoked wide confusion in accounting for expectation suppression, actually co-exist in the procedure of timing prediction but work in separate time windows. These findings strongly support a recently proposed opposing process theory.
Keywords: Expectation suppression; Predictive coding; Event-related potentials; Timing prediction
Lateral predictive coding revisited: internal model, symmetry breaking, and response time
12
Authors: Zhen-Ye Huang, Xin-Yi Fan, Jianwen Zhou, Hai-Jun Zhou. Communications in Theoretical Physics (SCIE, CAS, CSCD), 2022, No. 9, pp. 158-169 (12 pages)
Predictive coding is a promising theoretical framework in neuroscience for understanding information transmission and perception. It posits that the brain perceives the external world through internal models and updates these models under the guidance of prediction errors. Previous studies on predictive coding emphasized top-down feedback interactions in hierarchical multilayered networks but largely ignored lateral recurrent interactions. In this work, we perform analytical and numerical investigations of the effects of single-layer lateral interactions. We consider simple predictive response dynamics and run them on the MNIST dataset of hand-written digits. We find that learning will generally break the interaction symmetry between peer neurons, and that high input correlation between two neurons does not necessarily bring strong direct interactions between them. The optimized network responds to familiar input signals much faster than to novel or random inputs, and it significantly reduces the correlations between the output states of pairs of neurons.
Keywords: neural network; response dynamics; predictive coding; similarity; symmetry breaking
Hot topic: Review of the current and future technologies for video compression
13
Authors: Lu YU, Jian-peng WANG. Journal of Zhejiang University-Science C (Computers and Electronics) (SCIE, EI), 2010, No. 1, pp. 1-13 (13 pages)
Many important developments in video compression technologies have occurred during the past two decades. The hybrid coding scheme that combines the block-based discrete cosine transform with motion compensation has been employed by most available video coding standards, notably the ITU-T H.26x and ISO/IEC MPEG-x families and the video part of the China audio video coding standard (AVS). The objective of this paper is to review the development of the four basic building blocks of the hybrid coding scheme, namely predictive coding, transform coding, quantization, and entropy coding, and to give theoretical analyses and summaries of the technological advancements. We further analyze the development trends and perspectives of video compression, highlighting problems and research directions.
Keywords: Video compression; Predictive coding; Transform coding; Quantization; Entropy coding; Theoretical analysis