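Abstract: Traditional methods for predicting cell-penetrating peptides rely heavily on tedious feature extraction and feature reconstruction steps, use complex algorithms, and achieve limited accuracy. To address these problems, this paper proposes a method that predicts cell-penetrating peptides by combining character embedding from natural language processing with a combined CNN-LSTM machine learning framework. The method uses character embedding to map the characters representing amino acids, through network learning, into a compact vector space, so that each amino acid character corresponds to one compact vector. Each peptide sequence is then converted, via the trained embedding vectors, into a numerical matrix that serves as the input to the CNN-LSTM model; the model extracts features on its own and automatically predicts the cell-penetrating ability of the input sequence. Experimental results show that, on the same dataset, the proposed method reaches an AUC (area under the ROC curve) of 0.97 and a Youden index of 0.846 on the test set, outperforming other methods, which indicates that the method can predict cell-penetrating peptides simply and efficiently.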
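The embedding step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 20-letter alphabet, the padding index, the embedding dimension, the example sequence, and all function names are assumptions made for illustration. The table is filled with random values as a stand-in for vectors that would normally be learned during training.

```python
import random

# Assumption: the 20 standard amino acid letters; index 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_INDEX = 0
CHAR_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def build_embedding_table(dim, seed=0):
    """Return one vector per amino acid (random placeholders, not trained values)."""
    rng = random.Random(seed)
    table = [[0.0] * dim]  # row 0 is the all-zero padding vector
    for _ in AMINO_ACIDS:
        table.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return table


def encode_peptide(sequence, max_len):
    """Map each residue to its integer index, then truncate or pad to max_len."""
    indices = [CHAR_TO_INDEX[aa] for aa in sequence.upper()[:max_len]]
    return indices + [PAD_INDEX] * (max_len - len(indices))


def embed_peptide(sequence, table, max_len):
    """Look up one embedding row per position, giving a max_len x dim matrix."""
    return [list(table[i]) for i in encode_peptide(sequence, max_len)]


if __name__ == "__main__":
    table = build_embedding_table(dim=8)
    # Hypothetical input sequence, used only to show the shapes involved.
    matrix = embed_peptide("GRKKRRQRRR", table, max_len=12)
    print(len(matrix), len(matrix[0]))
```

In a trained model, the rows of the table would be parameters updated together with the CNN-LSTM weights, and the resulting matrix would be passed to the convolutional layers.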
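Abstract: Objective: To design a new phishing website detection technique that improves detection precision. Methods: A method is proposed that uses BERT (Bidirectional Encoder Representations from Transformers) to extract HTML string embedding features, converting HTML documents into word embedding vectors. A Stacking ensemble learning model combining four classifiers is also proposed, which uses the HTML string embedding features together with selected URL features to detect phishing websites. Results: On a dataset on the order of 100,000 samples, precision reaches 98.52% and the F1 score reaches 98.81%. Compared with using URL features alone, adding the HTML string embedding features raises phishing detection precision by nearly two percentage points. Conclusion: The HTML string embedding features extracted with BERT proposed in this paper significantly improve phishing website detection.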
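For readers less familiar with the reported metrics, the sketch below shows how precision and the F1 score are conventionally computed from confusion-matrix counts. The counts used are hypothetical and are not taken from the paper; the function name is also an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, an F1 score above the precision, as in the reported figures, implies that recall is higher than precision.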
Abstract: Sign Writing, a writing system for sign language, is becoming a useful and convenient communication aid for people who are deaf. People who are deaf often find it difficult to communicate with the hearing community; with recent technological advances, they communicate among themselves and with the hearing community via text messaging on mobile phones. Existing messaging functions are limited to writing based on the Roman alphabet or on pictographic languages such as Mandarin, and support for writing in signs is deemed deficient in a mobile context. Hence, the aim of this paper is to examine the feasibility of writing and reading text messages in signs as an alternative means of communication besides the Short Messaging Service (SMS). Initial experimental results indicate that sign writing is well accepted and is preferred among the hearing-impaired community for communicating among themselves and with the hearing community.