Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, serving as ion channels, transporters, and receptors. Because it is difficult to determine a membrane protein's structure by wet-lab experiments, accurate and fast amino acid sequence-based computational methods are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue-residue contacts, and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% ATMH, 87.1% AP, 3.2 ± 3.0 N-score, and 3.1 ± 2.8 C-score. MemBrain-Contact obtains 62%/64.1% prediction accuracy on the training and independent datasets for top L/5 contact prediction, respectively, and MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 with a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
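As an illustration of what sequence-based transmembrane-helix prediction involves, the sketch below scores sliding windows with the classical Kyte-Doolittle hydrophobicity scale. This is not MemBrain's machine-learning method (which the abstract does not detail); the window length and threshold are illustrative assumptions.

```python
# Classical baseline: flag sequence windows whose mean Kyte-Doolittle
# hydrophobicity is high enough to suggest a transmembrane helix.
# Window length and threshold are illustrative, not MemBrain's values.
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}

def tm_helix_windows(seq, window=19, threshold=1.6):
    """Return (start, end) spans whose mean hydrophobicity exceeds threshold."""
    spans = []
    for i in range(len(seq) - window + 1):
        score = sum(KD[a] for a in seq[i:i + window]) / window
        if score > threshold:
            spans.append((i, i + window))
    return spans
```

A real predictor such as MemBrain learns far richer features, but the sliding-window framing is the same.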
Environment matting and compositing is a technique for extracting a foreground object, including its color, opacity, reflective, and refractive properties, from a real-world scene, and synthesizing new images by placing it into new environments. The description of the captured object is called an environment matte. Recent matting and compositing techniques can produce quite realistic images for objects with complex optical properties. This paper presents an approximate method to transform the matte by simulating variation of the foreground object's refractive index. Our algorithms can deal with achromatous, transparent objects, and the experimental results are visually acceptable. Our idea and method can be applied to produce special video effects, which could be very useful in film making, given the extreme difficulty of physically changing an object's refractive index.
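The matte transformation above simulates a change of refractive index; the bending of light at such an interface is governed by Snell's law, sketched below as a generic helper (not the paper's algorithm):

```python
import math

def refract(incident_deg, n1, n2):
    """Angle of refraction in degrees via Snell's law (n1 sin i = n2 sin t).

    Returns None when total internal reflection occurs (no refracted ray).
    """
    s = n1 / n2 * math.sin(math.radians(incident_deg))
    if abs(s) > 1.0:
        return None  # total internal reflection
    return math.degrees(math.asin(s))
```

Re-evaluating such a relation with a new index n2 is, in spirit, what "simulating variation of the refractive index" requires per refracted ray.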
Background: With the development of information technology, there has been a significant increase in the number of network traffic logs mixed with various types of cyberattacks. Traditional intrusion detection systems (IDSs) are limited in detecting new, inconstant patterns and in identifying malicious traffic traces in real time. Therefore, there is an urgent need for more effective intrusion detection technologies to protect computer security. Methods: In this study, we designed a hybrid IDS by combining our incremental learning model (KAN-SOINN) with active learning to learn new log patterns and detect various network anomalies in real time. Conclusions: Experimental results on the NSL-KDD dataset showed that KAN-SOINN can be continuously improved and can effectively detect malicious logs. Comparative experiments also showed that using a hybrid query strategy in active learning improves the model's learning efficiency.
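The abstract does not spell out the hybrid query strategy; a minimal sketch of the uncertainty-sampling half of such a strategy, with a hypothetical anomaly-score interface, might look like:

```python
def uncertainty_query(scores, k):
    """Active-learning query step: select the k samples whose predicted
    anomaly probability is closest to 0.5 (i.e., the model is least sure),
    so a human analyst labels the most informative logs first.

    scores: list of anomaly probabilities in [0, 1], one per log entry.
    Returns the indices of the k most uncertain entries.
    """
    return sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))[:k]
```

A hybrid strategy would combine this with a second criterion (e.g., diversity or density of the selected samples).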
While databases are widely used in commercial user-facing services that have stringent quality-of-service (QoS) requirements, it is crucial to ensure their good performance while minimizing hardware usage. Our investigation shows that the optimal DBMS (database management system) software configuration varies across user request patterns (i.e., workloads) and hardware configurations. Identifying the optimal software and hardware configurations for a database workload is challenging, because DBMSs have hundreds of tunable knobs, the effect of tuning one knob depends on the others, and the dependency relationships change under different hardware configurations. In this paper, we propose SHA, a software and hardware auto-tuning system for DBMSs. SHA comprises a scaling-based performance predictor, a reinforcement learning (RL) based software tuner, and a QoS-aware resource reallocator. The performance predictor predicts a workload's optimal performance under different hardware configurations and identifies the minimum amount of resources needed to satisfy its performance requirement. The software tuner fine-tunes the DBMS software knobs to optimize the performance of the workload. The resource reallocator assigns the saved resources to other applications to improve resource utilization without incurring QoS violations for the database workload. Experimental results show that SHA improves the performance of database workloads by 9.9% on average compared with a state-of-the-art solution when the hardware configuration is fixed, and improves resource utilization by 43.2% while ensuring QoS.
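SHA's software tuner is RL-based; as a much simpler stand-in that conveys the knob-search problem, here is a greedy one-knob-at-a-time tuner (the knob names and the benchmark callback are hypothetical):

```python
import random

def tune_knobs(knobs, candidates, benchmark, iters=200, seed=0):
    """Greedy knob search: repeatedly mutate a single knob and keep the
    change only if the benchmarked score (e.g., throughput) improves.

    knobs:      starting configuration, e.g. {'buffer_mb': 128, 'threads': 2}
    candidates: allowed values per knob
    benchmark:  callable(config) -> score, higher is better
    """
    rng = random.Random(seed)
    best = dict(knobs)
    best_score = benchmark(best)
    for _ in range(iters):
        knob = rng.choice(list(candidates))
        trial = dict(best)
        trial[knob] = rng.choice(candidates[knob])
        score = benchmark(trial)
        if score > best_score:
            best, best_score = trial, score
    return best, best_score
```

This greedy loop cannot capture inter-knob dependencies the way an RL tuner can, which is exactly why SHA uses RL; it only illustrates the search setting.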
A lambda system with algebraic operators, the lambda-plus system, is introduced. After giving the definitions of the system, we present a sufficient condition for formulating a model of the system. Finally, a model of such a system is constructed.
The chi calculus is a model of concurrent and mobile systems that emphasizes that communications are information exchanges. In this paper, two constructions widely used in applications, asymmetric communication and the mismatch condition, are incorporated into the framework of the chi calculus. Since barbed bisimilarity has proved its generality and gained popularity as an effective approach to generating a reasonable observational equivalence, we study both the operational and algebraic properties of barbed bisimilarity in this enriched calculus. The investigation supports an improved understanding of the bisimulation behaviors of the model. It also gives a general picture of how the two constructions affect the observational theory.
Many practical applications have observed knowledge evolution, i.e., the continuous birth of new knowledge whose formation is influenced by the structure of historical knowledge. This observation gives rise to evolving knowledge graphs, whose structure grows over time. However, both the model characterization and the algorithmic treatment of evolving knowledge graphs remain unexplored. To this end, we propose EvolveKG, a general framework that enables algorithms for static knowledge graphs to learn evolving ones. EvolveKG quantifies the influence of a historical fact on a current one, called the effectiveness of the fact, and makes knowledge predictions by leveraging all cross-time knowledge interactions. The novelty of EvolveKG lies in the Derivative Graph, a weighted snapshot of evolution at a certain time. In particular, each weight quantifies knowledge effectiveness through a temporally decaying function of consistency and attenuation, two proposed factors depicting whether the effectiveness of a fact fades away with time. Moreover, considering both knowledge creation and loss, we obtain higher prediction accuracy when the effectiveness of all the facts increases with time or remains unchanged. On four real datasets, the superiority of EvolveKG in prediction accuracy is confirmed.
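The abstract does not give the exact temporally decaying function; the following is one plausible form, treating consistency as a flag that suspends attenuation. Both the functional form and the parameter names are assumptions made purely for illustration:

```python
import math

def effectiveness(age, consistent, attenuation_rate):
    """Hypothetical effectiveness weight for a fact observed `age` time
    steps ago: facts that remain consistent with current knowledge keep
    full weight; otherwise the weight decays exponentially with age.
    """
    if consistent:  # fact still holds -> no attenuation
        return 1.0
    return math.exp(-attenuation_rate * age)
```

In EvolveKG these weights label the edges of the Derivative Graph, letting a static-graph algorithm see cross-time interactions as a single weighted snapshot.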
The white-box attack is a new attack context in which it is assumed that cryptographic software is implemented on an untrusted platform and all implementation details are controlled by the attackers. So far, almost all white-box solutions have been broken. In this study, we propose a white-box encryption scheme that is not a variant of obfuscating an existing cipher but a completely new solution. The new scheme is based on the unbalanced Feistel network as well as the ASASASA (where "A" means affine and "S" means substitution) structure. It has an optional input block size and is space-efficient compared with other solutions, because its space requirement grows only linearly with the block size. Moreover, our scheme not only has huge white-box diversity and white-box ambiguity but also has a particular construction that bypasses public white-box cryptanalysis techniques, including attacks aimed at white-box variants of existing ciphers and attacks specific to the ASASASA structure. More precisely, we present a definition of white-box security with regard to equivalent keys, and prove that our scheme satisfies this security requirement.
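To make the ASASASA pattern concrete, here is a toy byte-wise instance with public, fixed layers; real white-box constructions use secret, randomly generated affine layers and S-boxes, so this only illustrates the layer alternation, not the scheme's security:

```python
def affine(state, a=5, b=7):
    """Toy affine layer over Z_256 (a is odd, so the map is invertible)."""
    return [(a * x + b) % 256 for x in state]

def substitute(state, sbox):
    """S-box layer: a fixed byte-to-byte lookup table."""
    return [sbox[x] for x in state]

def asasasa(state, sbox):
    """One pass of the ASASASA pattern: affine and substitution layers
    alternate (A S A S A S A), beginning and ending with an affine layer."""
    for _ in range(3):
        state = substitute(affine(state), sbox)
    return affine(state)
```

In a white-box scheme each layer would be secretly parameterized and the composition published only as merged lookup tables, which is what makes key extraction hard.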
Symbolic execution is widely used in many code analysis, testing, and verification tools. Because symbolic execution exhaustively explores all feasible paths, it is quite time consuming. To address this problem, researchers have parallelized existing symbolic execution tools (e.g., KLEE). In particular, Cloud9 is a widely used parallel symbolic execution tool, and researchers have used it to analyze real code. However, critics point out that tools such as Cloud9 still cannot analyze large-scale code. In this paper, we conduct a field study on Cloud9, in which we use KLEE and Cloud9 to analyze benchmarks in C. Our results confirm the criticism. Based on the results, we identify three bottlenecks that hinder the performance of Cloud9: the communication time gap, the job transfer policy, and the cache management of solved constraints. To address these problems, we tune the communication time gap with better parameters, modify the job transfer policy, and implement an approach for cache management of solved constraints. We conduct two evaluations on our benchmarks and a real application to assess our improvements. Our results show that our tuned Cloud9 reduces the execution time significantly, both on our benchmarks and on the real application. Furthermore, our evaluation results show that our tuning techniques improve effectiveness on all the devices, and the improvement can be up to five times, depending on the tuning value of our approach and the behavior of the program under test.
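One of the three fixes is cache management for solved constraints; a minimal order-insensitive memoization cache over a pluggable solver callback (a sketch of the idea, not Cloud9's implementation) could look like:

```python
class ConstraintCache:
    """Memoize solver results keyed by a canonical form of the constraint
    set, so that re-ordered but logically identical path conditions hit
    the cache instead of invoking the (expensive) solver again."""

    def __init__(self, solver):
        self.solver = solver  # callable(list_of_constraints) -> result
        self.table = {}
        self.hits = 0

    def solve(self, constraints):
        key = frozenset(constraints)  # order-insensitive canonical key
        if key in self.table:
            self.hits += 1
            return self.table[key]
        result = self.solver(constraints)
        self.table[key] = result
        return result
```

Real symbolic executors add subset/superset matching and eviction policies on top of this basic memoization.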
In this paper, we propose to detect a special group of microblog users: the "marionette" users, who are created or employed by backstage "puppeteers", either through programs or manually. Unlike normal users who access microblogs for information sharing or social communication, marionette users perform specific tasks to earn financial profit. For example, they follow certain users to increase their "statistical popularity", or retweet certain tweets to amplify their "statistical impact". The fabricated follower or retweet counts not only mislead normal users, but also seriously impair microblog-based applications such as hot-tweet selection and expert finding. In this paper, we study the important problem of detecting marionette users on microblog platforms. The problem is challenging because puppeteers employ complicated strategies to generate marionette users that exhibit behaviors similar to those of normal users. To tackle this challenge, we propose to take into account two types of discriminative information: 1) individual user tweeting behavior and 2) the social interactions among users. By integrating both kinds of information into a semi-supervised probabilistic model, we can effectively distinguish marionette users from normal ones. By applying the proposed model to one of the most popular microblog platforms in China (Sina Weibo), we find that the model can detect marionette users with an F-measure close to 0.9. In addition, we apply the proposed model to calculate the marionette ratio of the top 200 most followed microbloggers and the top 50 most retweeted posts on Sina Weibo. To accelerate detection and reduce the cost of feature generation, we further propose a lightweight model that uses fewer features to identify marionettes among retweeters.
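The social-interaction signal in the semi-supervised model can be illustrated with simple label propagation over a follow graph; this toy version (not the paper's probabilistic model) fixes a few labeled seed users and repeatedly averages neighbor scores:

```python
def propagate_labels(adj, seed_labels, iters=10):
    """Toy label propagation: each unlabeled user's marionette score is
    the mean score of its neighbors; labeled seeds stay fixed.

    adj:         {user: [neighbor users]} undirected follow graph
    seed_labels: {user: score} with 1.0 = known marionette, 0.0 = normal
    """
    scores = {u: seed_labels.get(u, 0.5) for u in adj}
    for _ in range(iters):
        new = {}
        for u, nbrs in adj.items():
            if u in seed_labels:
                new[u] = seed_labels[u]        # seeds are ground truth
            elif nbrs:
                new[u] = sum(scores[v] for v in nbrs) / len(nbrs)
            else:
                new[u] = scores[u]             # isolated user: unchanged
        scores = new
    return scores
```

The intuition matches the paper's: a user heavily connected to known marionettes inherits suspicion even if its own tweeting behavior looks normal.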
In this paper, we give an up-to-date survey of physically-based fluid animation research. As one of the most popular approaches to simulating realistic fluid effects, physically-based fluid animation has spurred a large number of new results in recent years. We classify and discuss the existing methods within three categories: Lagrangian methods, Eulerian methods, and Lattice-Boltzmann methods. We then introduce techniques for seven different kinds of special fluid effects. Finally, we review the latest hot research areas and point out future research trends, including surface tracking, fluid control, hybrid methods, and model reduction.
Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading when judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model from search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and queries into predefined categories. Third, we extract features from the processed data. Finally, we apply a state-of-the-art algorithm, the Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results show that the sole use of click-based or session-based features performs worse than previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% accuracy. It significantly improves on the click-based method by 5.6% and the session-based method by 4.6%.
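A typical click-based ambiguity feature is the entropy of the category distribution of clicked documents; a sketch follows (the paper's exact feature set is not specified in the abstract):

```python
import math

def click_entropy(category_counts):
    """Entropy in bits of the category distribution of documents clicked
    for a query. Clicks concentrated in one category give entropy 0;
    clicks spread evenly across categories give high entropy, which
    suggests the query may be ambiguous."""
    total = sum(category_counts.values())
    ent = 0.0
    for c in category_counts.values():
        if c:
            p = c / total
            ent -= p * math.log2(p)
    return ent
```

Session-based features would analogously measure the spread of categories among follow-up queries in the same session.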
In software reuse research, feature models have been widely adopted to capture, organize, and reuse the requirements of a set of similar applications in a software domain. However, the construction, and especially the refinement, of feature models is a labor-intensive process, and there is no effective way to aid domain engineers in refining feature models. In this paper, we propose a new approach to support interactive refinement of feature models based on the view updating technique. The basic idea of our approach is to first extract features and relationships of interest from a possibly large and complicated feature model, then organize them into a comprehensible view, and finally refine the feature model through modifications on the view. The main characteristics of this approach are twofold: a set of powerful rules (as the slicing criterion) to slice the feature model into a view automatically, and a novel use of a bidirectional transformation language to make the view updatable. We have developed a tool, and a nontrivial case study shows the feasibility of this approach.
Protein phosphorylation/dephosphorylation is the central mechanism of post-translational modification, regulating cellular responses and phenotypes. Because of the efficiency and resource constraints of in vivo methods for identifying phosphorylation sites, there is strong motivation to computationally predict potential phosphorylation sites. In this work, we propose a unique set of features to represent the peptides surrounding the amino acid sites of interest, and use a feature-selection support vector machine to predict whether serine/threonine sites are potentially phosphorylatable, as well as to select important features that may lead to phosphorylation. Experimental results indicate that the new features and the prediction method predict protein phosphorylation sites more effectively than existing state-of-the-art methods. The features selected by our prediction model provide biological insights into in vivo phosphorylation.
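Extracting peptide windows around candidate serine/threonine sites is the usual first step for such predictors; a minimal sketch with an assumed window half-width and padding character (the paper's actual feature encoding is richer):

```python
def candidate_windows(seq, k=7, pad='X'):
    """Yield (position, peptide) for every serine/threonine site in seq,
    where peptide is the (2k+1)-residue window centred on the site,
    padded with `pad` characters at the sequence ends."""
    padded = pad * k + seq + pad * k
    out = []
    for i, aa in enumerate(seq):
        if aa in ('S', 'T'):
            # site i in seq sits at index i+k in padded, so the window
            # centred there is padded[i : i + 2k + 1]
            out.append((i, padded[i:i + 2 * k + 1]))
    return out
```

Each window would then be encoded as a feature vector (e.g., residue identities and physicochemical properties) and fed to the SVM.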
In reinforcement learning, policy evaluation aims to predict the long-term value of a state under a certain policy. Since high-dimensional representations are increasingly common in reinforcement learning, reducing the computational cost of policy evaluation has become a significant problem. Many recent works adopt matrix sketching methods to accelerate least-squares temporal difference (TD) algorithms and quasi-Newton temporal difference algorithms. Among these sketching methods, truncated incremental SVD shows better performance because it is stable and efficient. However, the convergence properties of incremental SVD remain open. In this paper, we first show that conventional incremental SVD algorithms can have enormous approximation errors in the worst case. We then propose a variant of incremental SVD with better theoretical guarantees that shrinks the singular values periodically. Moreover, we employ our improved incremental SVD to accelerate least-squares TD and quasi-Newton TD algorithms. Experimental results verify the correctness and effectiveness of our methods.
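The proposed variant shrinks the retained singular values periodically; one standard form of such a shrinkage step is soft-thresholding, sketched here in isolation (the abstract does not give the exact rule, and the full algorithm also updates the singular vector matrices U and V):

```python
def shrink_singular_values(sigmas, mu):
    """Soft-threshold step applied periodically to the retained singular
    values of an incremental SVD: subtract mu from each value and drop
    any that become non-positive, bounding the accumulated error from
    directions that were truncated in earlier updates."""
    return [s - mu for s in sigmas if s - mu > 0]
```

Dropping the smallest directions this way is what keeps the truncated factorization from silently accumulating worst-case approximation error.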
Robust regression plays an important role in many machine learning problems. A primal approach relies on the Huber loss and an iteratively reweighted ℓ2 method. However, because the Huber loss is not smooth and its corresponding distribution cannot be represented as a Gaussian scale mixture, such an approach is extremely difficult to handle within a probabilistic framework. To address these limitations, this paper proposes two novel losses and their corresponding probability functions. One is called Soft Huber, which is well suited for modeling non-Gaussian noise. The other is Nonconvex Huber, which can produce much sparser results when imposed as a prior on the regression vector. They can represent any ℓq loss (1/2 ≤ q < 2) with tuning parameters, which makes the regression model more robust. We also show that both distributions have an elegant form: a Gaussian scale mixture with a generalized inverse Gaussian mixing density. This enables us to devise an expectation-maximization (EM) algorithm for solving the regression model. Through EM we obtain an adaptive weight, which is very useful for removing noisy data or irrelevant features in regression problems. We apply our model to the face recognition problem and show that it not only reduces the impact of noisy pixels but also removes more irrelevant face images. Our experiments demonstrate promising results on two datasets.
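For reference, the classical primal quantities the paper builds on, the Huber loss and its iteratively-reweighted-least-squares weight, are:

```python
def huber_loss(r, delta=1.0):
    """Classical Huber loss on a residual r: quadratic for |r| <= delta,
    linear beyond, so large outliers are penalized less than under
    squared error."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def irls_weight(r, delta=1.0, eps=1e-12):
    """Weight assigned to a residual in one iteratively reweighted l2
    step for the Huber loss: inliers get weight 1, outliers get a weight
    that shrinks as their residual grows."""
    a = abs(r)
    return 1.0 if a <= delta else delta / max(a, eps)
```

The non-smoothness at |r| = delta is exactly what the proposed Soft Huber smooths away so that a Gaussian scale mixture (and hence EM) applies.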
Growing interest in exploring the correlation between personality traits and real-life individual characteristics has been driven by the rising popularity of the Myers-Briggs Type Indicator (MBTI) on social media platforms. To investigate this correlation, we analyze an MBTI-demographic dataset and present MBTIviz, a visualization system that enables researchers to conduct a comprehensive and accessible analysis of the correlation between personality and demographic variables such as occupation and nationality. While the humanities and computing disciplines provide valuable insights into small-group behavior and data analysis, analyzing demographic data with personality information poses challenges due to the complexity of big data. Additionally, the correlation analysis tables commonly used in the humanities do not offer an intuitive representation of the relationships between variables. To address these issues, our system provides an integrated view of statistical data that presents all demographic information in a single visual format, along with a more informative and visually appealing approach to presenting correlation data, facilitating further exploration of the links between personality traits and real-life individual characteristics. It also includes machine learning predictive views that help non-expert users understand their personality traits and obtain career predictions based on demographic data. In this paper, we use the MBTIviz system to analyze the MBTI-demographic dataset, calculating age, gender, and occupation percentages for each MBTI type and studying the correlation between MBTI, occupation, and nationality.
Funding (MemBrain): supported by the National Natural Science Foundation of China (Nos. 61671288, 91530321, 61603161) and the Science and Technology Commission of Shanghai Municipality (Nos. 16JC1404300, 17JC1403500, 16ZR1448700).
Funding (environment matting): Project supported by the National Natural Science Foundation of China (No. 60403044) and Microsoft Research Asia (PROJECT-2004-IMAGE-01).
Funding (hybrid IDS): Supported by the SJTU-HUAWEI TECH Cybersecurity Innovation Lab.
Funding (SHA): sponsored by the National Natural Science Foundation of China under Grant Nos. 62022057, 61832006, 61632017, and 61872240.
Funding (lambda-plus system): Supported by the Chinese Natural Science Foundation.
Funding (chi calculus): Supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2003CB317005, the National Natural Science Foundation of China under Grant No. 60473006, and the National Research Foundation for the Doctoral Program of Education of China under Grant No. 20010248033.
Funding (EvolveKG): supported in part by the National Key R&D Program of China (No. 2021ZD0113305), the National Natural Science Foundation of China (Grant Nos. 61960206008, 62002292, 42050105, 62020106005, 62061146002, 61960206002), the National Science Fund for Distinguished Young Scholars (No. 61725205), and the Shanghai Pilot Program for Basic Research, Shanghai Jiao Tong University.
Funding (white-box encryption): This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272440, 61472251, and U1536101, and the China Postdoctoral Science Foundation under Grant Nos. 2013M531174 and 2014T70417.
Abstract: Symbolic execution is widely used in many code analysis, testing, and verification tools. As symbolic execution exhaustively explores all feasible paths, it is quite time consuming. To handle the problem, researchers have parallelized existing symbolic execution tools (e.g., KLEE). In particular, Cloud9 is a widely used parallelized symbolic execution tool, and researchers have used the tool to analyze real code. However, researchers criticize that tools such as Cloud9 still cannot analyze large-scale code. In this paper, we conduct a field study on Cloud9, in which we use KLEE and Cloud9 to analyze benchmarks in C. Our results confirm the criticism. Based on the results, we identify three bottlenecks that hinder the performance of Cloud9: the communication time gap, the job transfer policy, and the cache management of the solved constraints. To handle these problems, we tune the communication time gap with better parameters, modify the job transfer policy, and implement an approach for cache management of solved constraints. We conduct two evaluations on our benchmarks and a real application to understand our improvements. Our results show that our tuned Cloud9 reduces the execution time significantly, both on our benchmarks and the real application. Furthermore, our evaluation results show that our tuning techniques improve the effectiveness on all the devices, and the improvement can be up to five times, depending upon the tuning value of our approach and the behaviour of the program under test.
Abstract: In this paper, we propose to detect a special group of microblog users: the "marionette" users, who are created or employed by backstage "puppeteers", either through programs or manually. Unlike normal users that access microblogs for information sharing or social communication, marionette users perform specific tasks to earn financial profits. For example, they follow certain users to increase their "statistical popularity", or retweet some tweets to amplify their "statistical impact". The fabricated follower or retweet counts not only mislead normal users with wrong information, but also seriously impair microblog-based applications, such as hot tweet selection and expert finding. In this paper, we study the important problem of detecting marionette users on microblog platforms. This problem is challenging because puppeteers employ complicated strategies to generate marionette users that present behaviors similar to those of normal users. To tackle this challenge, we propose to take into account two types of discriminative information: 1) individual user tweeting behavior and 2) the social interactions among users. By integrating both kinds of information into a semi-supervised probabilistic model, we can effectively distinguish marionette users from normal ones. By applying the proposed model to one of the most popular microblog platforms in China (Sina Weibo), we find that the model can detect marionette users with an F-measure close to 0.9. In addition, we apply the proposed model to calculate the marionette ratio of the top 200 most followed microbloggers and the top 50 most retweeted posts in Sina Weibo. To accelerate the detection speed and reduce the feature generation cost, we further propose a light-weight model which utilizes fewer features to identify marionettes among retweeters.
Funding: Supported partially by the National Basic Research Program of China (Grant No. 2009CB320804) and the National High-Tech Research & Development Program of China (Grant No. 2006AA01Z307).
Abstract: In this paper, we give an up-to-date survey of physically-based fluid animation research. As one of the most popular approaches to simulating realistic fluid effects, physically-based fluid animation has spurred a large number of new results in recent years. We classify and discuss the existing methods within three categories: the Lagrangian method, the Eulerian method, and the Lattice-Boltzmann method. We then introduce techniques for seven different kinds of special fluid effects. Finally, we review the latest hot research areas and point out some future research trends, including surface tracking, fluid control, hybrid methods, model reduction, etc.
Abstract: Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading in judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, the Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click-based features or session-based features performs worse than previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves the click-based method by 5.6% and the session-based method by 4.6%.
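To make the feature-combination idea concrete, here is a hypothetical sketch: clicked documents and session queries are each mapped to categories, and simple distribution statistics (entropy, category count) from both sources are concatenated into the vector an SVM would consume. The specific features below are illustrative, not the paper's actual feature set:

```python
import math

def category_entropy(category_counts):
    """Entropy of a category distribution; higher entropy means the
    query's activity spreads over many topics, hinting at ambiguity."""
    total = sum(category_counts.values())
    probs = [c / total for c in category_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def feature_vector(clicked_doc_cats, session_query_cats):
    """Concatenate click-based and session-based features, mirroring the
    idea that the two feature sets are complementary."""
    return [
        category_entropy(clicked_doc_cats),    # click-based ambiguity signal
        len(clicked_doc_cats),                 # distinct click categories
        category_entropy(session_query_cats),  # session-based ambiguity signal
        len(session_query_cats),               # distinct session categories
    ]

# "jaguar": clicks spread over cars and animals; follow-ups also mixed.
fv = feature_vector({"autos": 4, "animals": 3}, {"autos": 2, "animals": 2})
```

An unambiguous query would concentrate both its clicks and its follow-up queries in a single category, driving both entropy features toward zero.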
Abstract: In research on software reuse, feature models have been widely adopted to capture, organize, and reuse the requirements of a set of similar applications in a software domain. However, the construction, and especially the refinement, of feature models is a labor-intensive process, and there is no effective way to aid domain engineers in refining feature models. In this paper, we propose a new approach to support interactive refinement of feature models based on the view updating technique. The basic idea of our approach is to first extract features and relationships of interest from a possibly large and complicated feature model, then organize them into a comprehensible view, and finally refine the feature model through modifications on the view. The main characteristics of this approach are twofold: a set of powerful rules (as the slicing criterion) to slice the feature model into a view automatically, and a novel use of a bidirectional transformation language to make the view updatable. We have successfully developed a tool, and a nontrivial case study shows the feasibility of this approach.
Abstract: Protein phosphorylation/dephosphorylation is the central mechanism of post-translational modification which regulates cellular responses and phenotypes. Due to the efficiency and resource constraints of the in vivo methods for identifying phosphorylation sites, there is a strong motivation to computationally predict potential phosphorylation sites. In this work, we propose to use a unique set of features to represent the peptides surrounding the amino acid sites of interest, and use a feature-selection support vector machine to predict whether the serine/threonine sites are potentially phosphorylatable, as well as to select important features that may lead to phosphorylation. Experimental results indicate that the new features and the prediction method can predict protein phosphorylation sites more effectively than existing state-of-the-art methods. The features selected by our prediction model provide biological insights into in vivo phosphorylation.
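The abstract does not enumerate its feature set, but peptide-window encodings are a common starting point for this task; the sketch below one-hot encodes the residues around a candidate serine/threonine site, with the window width chosen arbitrarily for illustration:

```python
# Standard 20 amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_features(sequence, pos, half_width=4):
    """One-hot encode the peptide window around a candidate S/T site.

    Positions falling outside the sequence (or non-standard residues)
    are encoded as all-zero vectors, i.e., zero padding.
    """
    feats = []
    for i in range(pos - half_width, pos + half_width + 1):
        onehot = [0] * len(AMINO_ACIDS)
        if 0 <= i < len(sequence) and sequence[i] in AMINO_ACIDS:
            onehot[AMINO_ACIDS.index(sequence[i])] = 1
        feats.extend(onehot)
    return feats

# Feature vector for the serine at index 2 of a short example peptide.
fv = window_features("MKSTPAAGH", 2)
```

Vectors of this form would then be fed to the SVM, and feature selection would rank which window positions and residue identities matter most.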
Funding: The corresponding author, Weinan Zhang, was supported by the "New Generation of AI 2030" Major Project (2018AAA0100900) and the National Natural Science Foundation of China (Grant Nos. 62076161, 61772333, and 61632017).
Abstract: In reinforcement learning, policy evaluation aims to predict the long-term value of a state under a certain policy. Since high-dimensional representations are becoming more and more common in reinforcement learning, reducing the computational cost is a significant problem for policy evaluation. Many recent works focus on adopting matrix sketching methods to accelerate least-squares temporal difference (TD) algorithms and quasi-Newton temporal difference algorithms. Among these sketching methods, truncated incremental SVD shows better performance because it is stable and efficient. However, the convergence properties of incremental SVD are still open. In this paper, we first show that conventional incremental SVD algorithms can have enormous approximation errors in the worst case. Then we propose a variant of incremental SVD with better theoretical guarantees, obtained by shrinking the singular values periodically. Moreover, we employ our improved incremental SVD to accelerate least-squares TD and quasi-Newton TD algorithms. The experimental results verify the correctness and effectiveness of our methods.
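The paper's exact shrinkage rule cannot be reconstructed from the abstract alone; the sketch below, assuming numpy, uses a Frequent-Directions-style shrink (subtract the smallest retained singular value in the squared spectrum) as one concrete instance of "shrinking the singular values periodically":

```python
import numpy as np

def incremental_step(S, Vt, new_row, k):
    """Fold one new row into a rank-k sketch (S, Vt) of the data matrix.
    For clarity this recomputes a small SVD instead of a rank-one update."""
    B = np.vstack([np.diag(S) @ Vt, new_row])
    _, S2, Vt2 = np.linalg.svd(B, full_matrices=False)
    return S2[:k], Vt2[:k]

def shrink_singular_values(S):
    """Periodic shrinkage: subtracting the smallest retained value in the
    squared spectrum bounds how much any discarded direction can
    contribute to the approximation error."""
    return np.sqrt(np.maximum(S ** 2 - S[-1] ** 2, 0.0))

# Stream four rows through a rank-2 sketch, shrinking every 2 rows.
rows = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
k = 2
S, Vt = np.zeros(k), np.zeros((k, 3))
for i, row in enumerate(rows, 1):
    S, Vt = incremental_step(S, Vt, row, k)
    if i % 2 == 0:
        S = shrink_singular_values(S)
```

In the TD setting, the sketched matrix would be the feature second-moment matrix accumulated over transitions, and the sketch keeps per-step cost low in high-dimensional feature spaces.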
Funding: We thank our anonymous reviewers for their feedback and suggestions. This work was partially sponsored by the National Basic Research 973 Program of China (2015CB352403) and the National Natural Science Foundation of China (NSFC) (Grant Nos. 61702328, 61602301, and 61632017).
Abstract: Robust regression plays an important role in many machine learning problems. A primal approach relies on the use of the Huber loss and an iteratively reweighted ℓ2 method. However, because the Huber loss is not smooth and its corresponding distribution cannot be represented as a Gaussian scale mixture, such an approach is extremely difficult to handle within a probabilistic framework. To address those limitations, this paper proposes two novel losses and the corresponding probability functions. One is called Soft Huber, which is well suited for modeling non-Gaussian noise. The other is Nonconvex Huber, which can help produce much sparser results when imposed as a prior on the regression vector. They can represent any ℓq loss (1/2 ≤ q < 2) with tuning parameters, which makes the regression model more robust. We also show that both distributions have an elegant form, namely a Gaussian scale mixture with a generalized inverse Gaussian mixing density. This enables us to devise an expectation maximization (EM) algorithm for solving the regression model. We can obtain an adaptive weight through EM, which is very useful for removing noisy data or irrelevant features in regression problems. We apply our model to the face recognition problem and show that it not only reduces the impact of noise pixels but also removes more irrelevant face images. Our experiments demonstrate promising results on two datasets.
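The Soft and Nonconvex Huber variants are the paper's contribution and are not reconstructed here; for reference, this sketch shows the classical Huber loss and the weight its iteratively reweighted ℓ2 (IRLS) treatment assigns, which is the primal approach the abstract contrasts against:

```python
def huber_loss(r, delta=1.0):
    """Classical Huber loss: quadratic near zero, linear in the tails,
    which is what makes it robust to large-residual outliers."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def irls_weight(r, delta=1.0, eps=1e-12):
    """Weight used by the iteratively reweighted ℓ2 (IRLS) approach:
    minimizing sum_i w_i * r_i^2 with these weights reproduces the Huber
    objective, so large residuals are down-weighted."""
    a = max(abs(r), eps)
    return 1.0 if a <= delta else delta / a

# A small residual keeps full weight; an outlier is strongly discounted.
w_inlier, w_outlier = irls_weight(0.5), irls_weight(10.0)
```

The paper's EM-derived adaptive weights play the same role as `irls_weight`, but arise from a proper probabilistic model (a Gaussian scale mixture), which the plain Huber loss does not admit.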
Funding: The paper is supported by the National Natural Science Foundation of China (Grant No. 61100053) and a research grant from Intel Asia-Pacific Research and Development Co., Ltd.
Abstract: The increasing interest in exploring the correlation between personality traits and real-life individual characteristics has been driven by the growing popularity of the Myers–Briggs Type Indicator (MBTI) on social media platforms. To investigate this correlation, we conduct an analysis on an MBTI-demographic dataset and present MBTIviz, a visualization system that enables researchers to conduct a comprehensive and accessible analysis of the correlation between personality and demographic variables such as occupation and nationality. While the humanities and computer disciplines provide valuable insights into the behavior of small groups and data analysis, analysing demographic data with personality information poses challenges due to the complexity of big data. Additionally, the correlation analysis table commonly used in the humanities does not offer an intuitive representation when examining the relationship between variables. To address these issues, our system provides an integrated view of statistical data that presents all demographic information in a single visual format, together with a more informative and visually appealing approach to presenting correlation data, facilitating further exploration of the linkages between personality traits and real-life individual characteristics. It also includes machine learning predictive views that help nonexpert users understand their personality traits and provide career predictions based on demographic data. In this paper, we utilize the MBTIviz system to analyse the MBTI-demographic dataset, calculating age, gender, and occupation percentages for each MBTI type and studying the correlation between MBTI, occupation, and nationality.