Abstract: The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST, also called the Guo Shou Jing Telescope) is a special reflecting Schmidt telescope. Its design provides both a large aperture (effective aperture of 3.6 m–4.9 m) and a wide field of view (FOV) of 5°. It uses an innovative active reflecting Schmidt configuration in which the mirror surface is adjusted continuously during observation, combining thin deformable mirror active optics with segmented active optics. The primary mirror (6.67 m × 6.05 m) and the active Schmidt mirror (5.74 m × 4.40 m) are both segmented, composed of 37 and 24 hexagonal sub-mirrors respectively. Using a parallel controllable fiber positioning technique, the focal surface, 1.75 m in diameter, can accommodate 4000 optical fibers. LAMOST also has 16 spectrographs with 32 CCD cameras. LAMOST will be the telescope with the highest rate of spectral acquisition. As a national large scientific project, LAMOST was formally proposed in 1996 and approved by the Chinese government in 1997. Construction started in 2001, was completed in 2008, and passed official acceptance in June 2009. The LAMOST pilot survey started in October 2011, and the spectroscopic survey will launch in September 2012. Up to now, LAMOST has released more than 480 000 spectra of objects. LAMOST will make an important contribution to the study of the large-scale structure of the Universe, the structure and evolution of the Galaxy, and the cross-identification of multi-waveband properties of celestial objects.
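The instrument figures quoted in the abstract can be combined into a few derived quantities. The short calculation below is only a back-of-the-envelope sketch: it assumes that the 5° field of view is the diameter of a circular field and that fibers and cameras are divided evenly among the spectrographs, neither of which the abstract states explicitly.

```python
import math

# Figures taken from the abstract.
FIBERS = 4000
SPECTROGRAPHS = 16
CCD_CAMERAS = 32
FOV_DEG = 5.0  # assumption: diameter of a circular field of view

# Assumption: fibers and cameras are shared evenly among spectrographs.
fibers_per_spectrograph = FIBERS / SPECTROGRAPHS
cameras_per_spectrograph = CCD_CAMERAS / SPECTROGRAPHS

# Area of a circular field with the assumed diameter, in square degrees.
field_area_deg2 = math.pi * (FOV_DEG / 2) ** 2
fibers_per_deg2 = FIBERS / field_area_deg2

print(f"fibers per spectrograph:  {fibers_per_spectrograph:.0f}")
print(f"cameras per spectrograph: {cameras_per_spectrograph:.0f}")
print(f"field area:               {field_area_deg2:.2f} deg^2")
print(f"fiber density:            {fibers_per_deg2:.1f} per deg^2")
```

Under these assumptions each spectrograph serves 250 fibers with 2 cameras, and the field holds roughly 200 fibers per square degree.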
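Abstract: Data stream classification is an important research task in data stream mining; its goal is to capture the evolving class structure in massive, continuously changing data. At present, almost no framework can simultaneously handle the multi-class imbalance, concept drift, outliers, and high labeling cost that are common in data streams. To address this, an online active learning method for imbalanced data streams (OALM-IDS) is proposed. AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 introduces the confidence of the weak classifiers; such methods are commonly used on static data. A training-sample importance measure based on the imbalance ratio and an adaptive forgetting factor is defined, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of ensemble classifiers on them. An adaptive adjustment method for the margin threshold matrix is proposed to optimize the label request strategy. The degree of concept drift is incorporated into model construction, and an adaptive forgetting factor based on a concept drift index is defined, enabling model reconstruction after drift. Comparative experiments on 6 synthetic data streams and 4 real data streams show that the proposed method achieves better classification performance than 5 other learning methods for imbalanced data streams.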
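The abstract does not give the formulas behind the sample importance measure or the label request strategy, so the following is only an illustrative sketch of the general ideas, not the OALM-IDS method itself. The functions `sample_importance` and `should_request_label`, the inverse-frequency weighting, the fixed `forgetting` value, and the single scalar `threshold` (in place of the paper's margin threshold matrix) are all hypothetical choices made for this example.

```python
def sample_importance(label, class_counts, age, forgetting=0.9):
    """Hypothetical weight: rarer classes and newer samples count more.

    class_counts maps each class label to the number of samples seen so far;
    age is the number of time steps since the sample arrived.
    """
    total = sum(class_counts.values())
    # Inverse class frequency: a class holding 10% of the data gets weight 10.
    imbalance_weight = total / class_counts[label]
    # Exponential forgetting: the weight shrinks by `forgetting` per time step.
    return imbalance_weight * forgetting ** age


def should_request_label(class_scores, threshold):
    """Request the true label when the classifier is unsure.

    Uncertainty is measured by the margin between the two highest class
    scores; a small margin means the top two classes are hard to separate.
    """
    top, second = sorted(class_scores, reverse=True)[:2]
    return (top - second) < threshold


counts = {"majority": 90, "minority": 10}
print(sample_importance("minority", counts, age=0))
print(sample_importance("minority", counts, age=2, forgetting=0.5))
print(should_request_label([0.50, 0.45, 0.05], threshold=0.1))
print(should_request_label([0.90, 0.05, 0.05], threshold=0.1))
```

In this sketch a minority-class sample outweighs a majority-class one by the inverse of their frequencies, and labels are requested only for samples near a decision boundary, which is one common way to limit labeling cost.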
Funding: Project supported by the National Key R&D Program of China (No. 2016YFB1000101), the National Natural Science Foundation of China (Nos. 61379052 and 61502513), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province, China (No. 14JJ1026), and the Specialized Research Fund for the Doctoral Program of Higher Education, China (No. 20124307110015)
Abstract: Recently, sequence anomaly detection has been widely used in many fields. Sequence data in these fields are usually multi-dimensional and arrive as a data stream. It is a challenge to design an anomaly detection method for multi-dimensional sequences over a data stream that satisfies the requirements of both accuracy and high speed. This is because: (1) redundant dimensions in sequence data and a large state space lead to poor sequence modeling; (2) anomaly detection cannot adapt to the high-speed nature of the data stream, especially when concept drift occurs, which reduces the detection rate. On the one hand, most existing sequence anomaly detection methods focus on single-dimensional sequences. On the other hand, studies of multi-dimensional sequences concentrate mainly on static databases rather than data streams. To improve the performance of anomaly detection for multi-dimensional sequences over a data stream, we propose a novel unsupervised fast and accurate anomaly detection (FAAD) method, which includes three algorithms. First, a method called "information calculation and minimum spanning tree cluster" is adopted to reduce redundant dimensions. Second, to speed up model construction and ensure the detection rate for sequences over the data stream, we propose a method called "random sampling and subsequence partitioning based on the index probabilistic suffix tree." Last, a method called "anomaly buffer based on model dynamic adjustment" dramatically reduces the effects of concept drift in the data stream. FAAD is implemented on the streaming platform Storm to detect anomalies in multi-dimensional log audit data. Compared with existing anomaly detection methods, FAAD performs well in detection rate and speed without being affected by concept drift.
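To make the idea of scoring a sequence against a learned model concrete, here is a minimal sketch of likelihood-based sequence anomaly scoring. It is not the FAAD algorithm: instead of the index probabilistic suffix tree named in the abstract, it swaps in a much simpler fixed-order context model, and the class name `ContextModel`, its parameters, and the add-one smoothing are assumptions made for this example.

```python
import math
from collections import defaultdict


class ContextModel:
    """Fixed-order context model: P(symbol | previous `order` symbols)."""

    def __init__(self, order=2, alphabet_size=4, smoothing=1.0):
        self.order = order
        self.alphabet_size = alphabet_size  # number of distinct symbols expected
        self.smoothing = smoothing          # additive (Laplace) smoothing constant
        self.counts = defaultdict(lambda: defaultdict(int))

    def _context(self, seq, i):
        # Up to `order` symbols preceding position i (shorter at the start).
        return tuple(seq[max(0, i - self.order):i])

    def fit(self, sequences):
        """Count how often each symbol follows each context in normal data."""
        for seq in sequences:
            for i, symbol in enumerate(seq):
                self.counts[self._context(seq, i)][symbol] += 1
        return self

    def score(self, seq):
        """Average negative log-probability per symbol; higher is more anomalous."""
        total = 0.0
        for i, symbol in enumerate(seq):
            seen = self.counts.get(self._context(seq, i), {})
            numerator = seen.get(symbol, 0) + self.smoothing
            denominator = sum(seen.values()) + self.smoothing * self.alphabet_size
            total -= math.log(numerator / denominator)
        return total / max(len(seq), 1)


# Train on "normal" sequences, then compare a familiar and an unfamiliar one.
model = ContextModel(order=2, alphabet_size=4).fit(["abababab"] * 5)
normal_score = model.score("abababab")
odd_score = model.score("abdcabdc")
print(normal_score < odd_score)
```

A sequence would be flagged when its score exceeds a chosen threshold. A probabilistic suffix tree generalizes this by keeping contexts of varying length, and a streaming detector would also need to update the counts over time to cope with concept drift.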