Cloud applications are implemented on top of different distributed systems to provide online services. A service request is decomposed into multiple sub-tasks, which are dispatched to different distributed system components. For cloud providers, monitoring the execution of a service request is crucial to promptly find problems that may compromise cloud availability. In this paper, we present AgamottoEye, a tool that automatically constructs request flows from existing logs. AgamottoEye addresses the challenges of analyzing interleaved log instances, and can successfully extract request flows spread across multiple distributed systems. Our experiments with Hadoop2/YARN show that AgamottoEye can analyze 25,050 log instances in 57.4 s, and the extracted request flow information is helpful for error detection and diagnosis.
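To illustrate the core idea of stitching interleaved log instances into per-request flows, here is a minimal Python sketch. The log format and the req-<id> tag are hypothetical; AgamottoEye's actual identifier extraction and cross-system correlation are more involved.

```python
import re
from collections import defaultdict

# Minimal sketch: stitch interleaved log lines into per-request flows.
# The "<timestamp> <component> ... req-<id>" format is assumed for
# illustration only.
LOG_PATTERN = re.compile(r"^(?P<ts>\d+)\s+(?P<comp>\S+)\s+.*?(?P<req>req-\d+)")

def build_request_flows(lines):
    flows = defaultdict(list)
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:
            flows[m.group("req")].append((int(m.group("ts")), m.group("comp"), line))
    # Order each request's events by timestamp to recover the flow.
    return {req: sorted(evts) for req, evts in flows.items()}

logs = [
    "1001 ResourceManager allocating container for req-42",
    "1002 NodeManager launching container for req-43",
    "1003 NodeManager launching container for req-42",
]
for req, flow in build_request_flows(logs).items():
    print(req, "->", [comp for _, comp, _ in flow])
```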
System logs, serving as a pivotal data source for performance monitoring and anomaly detection, play an indispensable role in assuring service stability and reliability. Despite this, the majority of existing log-based anomaly detection methodologies predominantly depend on the sequence or quantity attributes of logs, utilizing solely a single Recurrent Neural Network (RNN) or one of its variant sequence models for detection. These approaches have not thoroughly exploited the semantic information embedded in logs, exhibit limited adaptability to novel logs, and a single model struggles to fully unearth the potential features within a log sequence. Addressing these challenges, this article proposes LogCEM, a hybrid architecture based on a multi-scale convolutional neural network, efficient channel attention, and Mogrifier gated recurrent unit networks, which amalgamates multiple neural network technologies. Capitalizing on the superior performance of the robustly optimized BERT approach (RoBERTa) in natural language processing, we employ RoBERTa to extract the original word vectors from each word in the log template. In conjunction with an enhanced Smooth Inverse Frequency (SIF) algorithm, we generate more precise log sentence vectors, thereby achieving an in-depth representation of log semantics. Subsequently, these log vector sequences are fed into a hybrid neural network, which fuses a 1D Multi-Scale Convolutional Neural Network (MSCNN), an Efficient Channel Attention (ECA) mechanism, and a Mogrifier Gated Recurrent Unit (GRU). This amalgamation enables the model to concurrently capture the local and global dependencies of the log sequence and autonomously learn the significance of different log sequences, thereby markedly enhancing the efficacy of log anomaly detection. To validate the effectiveness of the LogCEM model, we conducted evaluations on two authoritative open-source datasets. The experimental results demonstrate that LogCEM not only exhibits excellent accuracy and robustness, but also outperforms the current mainstream log anomaly detection methods. Funding: supported by the Science and Technology Program of State Grid Corporation of China (Grant SGSXDK00DJJS2250061).
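As one concrete piece of the pipeline, the sketch below implements the standard SIF sentence embedding (weighted word-vector average followed by removal of the first principal component). LogCEM uses an enhanced SIF variant, so treat this as the baseline idea only; the toy vocabulary and random vectors stand in for RoBERTa word vectors.

```python
import numpy as np

def sif_embeddings(sentences, word_vecs, word_freq, a=1e-3):
    """Standard SIF sentence embedding (Arora et al., 2017).
    sentences: list of token lists; word_vecs: token -> np.array;
    word_freq: token -> unigram probability."""
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, sent in enumerate(sentences):
        toks = [t for t in sent if t in word_vecs]
        if toks:
            # Weighted average: rare words get higher weight a / (a + p(w)).
            weights = np.array([a / (a + word_freq.get(t, 1e-5)) for t in toks])
            emb[i] = (weights[:, None] * np.array([word_vecs[t] for t in toks])).mean(axis=0)
    # Remove the projection onto the first principal component.
    u, _, _ = np.linalg.svd(emb.T @ emb)
    pc = u[:, :1]
    return emb - emb @ pc @ pc.T

rng = np.random.default_rng(0)
vocab = ["failed", "to", "open", "file", "connection"]
vecs = {w: rng.normal(size=8) for w in vocab}
freq = {w: 0.1 for w in vocab}
print(sif_embeddings([["failed", "to", "open"], ["connection", "failed"]], vecs, freq).shape)
```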
System logs are essential for detecting anomalies, querying faults, and tracing attacks. Because manual system troubleshooting and anomaly detection are time-consuming and labor-intensive, they cannot meet actual needs, so automated log anomaly detection is a topic that demands urgent research. However, prior work on processing log data is mainly one-dimensional and cannot profoundly learn the complex associations in log data. Meanwhile, prior work pays little attention to the utilization of log labels and usually relies on a large number of labels for detection. This paper proposes a novel and practical detection model named LCC-HGLog, the core of which is the conversion of log anomaly detection into a graph classification problem. Semantic temporal graphs (STG) are constructed by extracting the raw logs' execution sequences and template semantics. Then a unique graph classifier is used to better comprehend each STG's semantic, sequential, and structural features. The classification model is trained jointly by a graph classification loss and a label contrastive loss. While achieving discriminability at the class level, it increases fine-grained identification at the instance level, thus achieving good detection performance even with a small amount of labeled data. We have conducted numerous experiments on real log datasets, showing that the proposed model outperforms the baseline methods and obtains the best all-around performance. Moreover, the detection performance degrades by less than 1% when only 10% of the labeled data is used. With 200 labeled samples, we can achieve the same or better detection results than the baseline methods. Funding: supported by the National Natural Science Foundation of China (U20B2045).
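The joint objective combines a graph classification loss with a label contrastive term. The NumPy sketch below shows one common form of a supervised (label) contrastive loss over graph embeddings; the paper's exact formulation may differ, and the random embeddings are placeholders.

```python
import numpy as np

def label_contrastive_loss(z, labels, tau=0.5):
    """Supervised contrastive loss over embeddings z of shape (n, d):
    pull same-label graph embeddings together, push others apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue
        others = [a for a in range(n) if a != i]
        denom = np.log(np.sum(np.exp(sim[i, others])))
        total += -np.mean([sim[i, p] - denom for p in pos])
        count += 1
    return total / max(count, 1)

rng = np.random.default_rng(1)
z = rng.normal(size=(6, 16))
y = np.array([0, 0, 1, 1, 0, 1])
# Joint objective: classification loss + lambda * contrastive term.
print("contrastive term:", label_contrastive_loss(z, y))
```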
In order to obtain information or discover knowledge from system logs, the first step is to perform log parsing, whereby unstructured raw logs can be transformed into a sequence of structured events. Although comprehensive studies on log parsing have been conducted in recent years, most assume that one event object corresponds to a single-line message. However, in a growing number of scenarios, one event object spans multiple lines in the log, and parsing methods designed for single-line events are not applicable. To address this problem, this paper proposes an automated log parsing method for multiline events (LPME). LPME finds multiline event objects via iterative scanning, driven by a set of heuristic rules derived from practice. The advantage of LPME is that it proposes a cohesion-based evaluation method for multiline events and a bottom-up search approach that eliminates the need to enumerate all combinations. We analyze the algorithmic complexity of LPME and validate it on four datasets from different backgrounds. Evaluations show that the actual time complexity of LPME parsing for multiline events is close to constant time, which enables it to handle large-scale sample inputs. On the experimental datasets, LPME achieves a recall of 1.0, and its precision is generally higher than 0.9, which demonstrates the effectiveness of the proposed LPME.
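A minimal sketch of the multiline grouping idea follows: one heuristic rule (a timestamp opens a new event; untimestamped lines are continuations) stands in for LPME's full rule set and cohesion-based evaluation.

```python
import re

# Toy multiline-event grouping. A line that starts with a timestamp opens a
# new event; lines without one (e.g. stack-trace frames) are continuations.
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2} ")

def group_multiline_events(lines):
    events, current = [], []
    for line in lines:
        if HEADER.match(line) and current:
            events.append(current)
            current = []
        current.append(line)
    if current:
        events.append(current)
    return events

sample = [
    "2023-05-01 12:00:01 ERROR worker failed",
    "    java.io.IOException: disk full",
    "        at Writer.flush(Writer.java:88)",
    "2023-05-01 12:00:02 INFO retrying task",
]
for ev in group_multiline_events(sample):
    print(len(ev), "line(s):", ev[0].strip())
```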
Workflow logs that record the execution of business processes offer a very valuable data resource for real-time enterprise performance measurement. In this paper, a novel scheme that uses data warehouse and OLAP technology to explore workflow logs and create complex analysis reports for enterprise performance measurement is proposed. Three key points of this scheme are studied: 1) the measure set; 2) the open and flexible architecture of the workflow log analysis system; 3) the data models in the WFMS and the data warehouse. A case study that shows the validity of the scheme is also provided.
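As a flavor of the OLAP-style aggregation such a warehouse would serve, here is a small pandas sketch rolling up activity durations from a workflow log; all column names and values are hypothetical.

```python
import pandas as pd

# Toy workflow log: one row per completed activity instance.
log = pd.DataFrame({
    "process":    ["order", "order", "order", "claim", "claim"],
    "activity":   ["approve", "ship", "approve", "review", "review"],
    "duration_h": [2.0, 5.5, 1.5, 8.0, 6.0],
})

# OLAP-style rollup: mean duration and instance count per process/activity.
cube = log.pivot_table(index="process", columns="activity",
                       values="duration_h", aggfunc=["mean", "count"])
print(cube)
```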
Logging facies analysis is a significant aspect of reservoir description. In particular, as a commonly used method for logging facies identification, Multi-Resolution Graph-based Clustering (MRGC) can perform depth analysis on multidimensional logging curves to predict logging facies. However, this method is very time-consuming and highly dependent on the initial parameters in the propagation process, which limits its practical application. In this paper, an Adaptive Multi-Resolution Graph-based Clustering (AMRGC) is proposed, which is capable of both improving the efficiency of the calculation process and achieving a stable propagation result. More specifically, the proposed method 1) presents a light kernel representative index (LKRI) algorithm, which is proved to need fewer calculation resources than the kernel selection methods in the literature by exclusively considering "free attractor" points; and 2) builds a Multi-Layer Perceptron (MLP) network trained with the back-propagation (BP) algorithm, so as to avoid the uncertain results caused by the uncertain parameter initializations that often occur when only the K-nearest-neighbors (KNN) method is used. Compared with the clustering methods often used in image-based sedimentary facies analysis, such as Self-Organizing Map (SOM), Dynamic Clustering (DYN) and Ascendant Hierarchical Clustering (AHC), AMRGC performs much better without prior knowledge of the data structure. Eventually, the experimental results illustrate that the proposed method also outperforms the original MRGC method on the task of clustering and propagation prediction, with higher efficiency and stability. Funding: sponsored by the Science and Technology Project of CNPC (No. 2018D-5010-16 and 2019D-3808).
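The propagation step can be sketched as follows: instead of KNN with sensitive initial parameters, a backprop-trained MLP is fit on the clustered samples and used to propagate facies labels. Synthetic data stands in for real logging curves; this illustrates the idea, not the paper's implementation.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-ins for multidimensional logging curves (e.g. GR, RHOB,
# NPHI, DT) with toy facies labels on the clustered "attractor" samples.
rng = np.random.default_rng(2)
X_attractors = rng.normal(size=(120, 4))
y_facies = (X_attractors[:, 0] > 0).astype(int)
X_unlabeled = rng.normal(size=(30, 4))

# Backprop-trained MLP replaces parameter-sensitive KNN propagation.
mlp = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
mlp.fit(X_attractors, y_facies)
print("propagated facies:", mlp.predict(X_unlabeled)[:10])
```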
The illegal use of compromised email accounts by adversaries can have severe consequences for enterprises and society. Detecting compromised email accounts is more challenging than in the social network field, since email accounts have only a few kinds of interaction events (sending and receiving). To address the issue of insufficient features, we propose a novel approach to detecting compromised accounts that combines time zone differences and alternate logins to identify abnormal behavior. Based on this approach, we propose a compromised email account detection framework that relies on widely available and less sensitive login logs and does not require labels. Our framework characterizes login behaviors to identify logins that do not belong to the account owner and outputs a list of account-subnet pairs ranked by their likelihood of having abnormal login relationships. This approach reduces the number of account-subnet pairs that need to be investigated and provides a reference for investigation priority. Our evaluation demonstrates that our method can detect most email accounts that have been accessed by disclosed malicious IP addresses and outperforms similar research. Additionally, our framework has the capability to uncover undisclosed malicious IP addresses. Funding: supported by the Youth Innovation Promotion Association CAS (No. 2019163), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDC02040100), the Key Laboratory of Network Assessment Technology at the Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology.
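A toy version of the ranking idea: score each (account, subnet) pair by how far its login hours deviate from the account's habitual hours, one simple proxy for the time-zone-difference signal. The data and scoring rule are illustrative only.

```python
from collections import defaultdict

# Records are (account, subnet, login_hour_utc); values are made up.
logins = [("alice", "10.0.1.0/24", h) for h in [9, 10, 11, 9, 14, 10]] + \
         [("alice", "203.0.113.0/24", h) for h in [2, 3, 2]]  # off-hours subnet

hours_by_account = defaultdict(list)
hours_by_pair = defaultdict(list)
for acct, subnet, hour in logins:
    hours_by_account[acct].append(hour)
    hours_by_pair[(acct, subnet)].append(hour)

def score(pair):
    acct, _ = pair
    typical = sum(hours_by_account[acct]) / len(hours_by_account[acct])
    ours = hours_by_pair[pair]
    # Mean wrap-around distance of this subnet's login hours from the norm.
    return sum(min(abs(h - typical), 24 - abs(h - typical)) for h in ours) / len(ours)

# Rank account-subnet pairs by abnormality for investigation priority.
for pair in sorted(hours_by_pair, key=score, reverse=True):
    print(pair, round(score(pair), 2))
```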
Three-dimensional (3D) static modelling techniques are applied to the characterization of the Qishn Formation (Fm.) in the Sharyoof oil field, located within the Masila basin, southeastern Yemen. The present study was initiated by seismic structural interpretation, followed by building a 3D structural framework and analysing well log data, from which 3D facies and petrophysical models are constructed. In the Sharyoof oil field, the Qishn Fm. exhibits depth values within the range of 400-780 m below sea level, with a general increase towards the SSE. A set of high-dip-angle normal faults with a general ENE-WSW trend dissects the rocks. The strata are also folded into a main anticline with an axis parallel to the fault trend, formed as a result of basement uplift. According to the facies models, the Qishn Fm. comprises 43.83% limestone, 21.53% shale, 21.26% sandstone, 13.21% siltstone and 0.17% dolomite. The Qishn Carbonates Member has low porosity values, making it a potential seal for the underlying reservoirs, whereas the Upper Qishn Clastics S1A and S1C have good reservoir quality and S1B has fair reservoir quality. The Upper Qishn Clastics S2 and S3 also have fair reservoir quality, while the Lower Qishn Clastics zone has good reservoir quality. The water saturation decreases towards the west and east and increases towards the north and south. The total original oil in place (OOIP) of the Upper Qishn Clastics is 106 million STB within the S1A, S1C and S2 zones. Drilling of development wells is recommended in the eastern study area, where a good trapping configuration is exhibited, in addition to the presence of a potential seal (Upper Qishn Carbonates Member) and a reservoir (Qishn Clastics Member) with high porosity and low water saturation. Funding: supported by the RUDN Strategic Academic Leadership Program.
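For context, OOIP figures of this kind are conventionally estimated with the volumetric formula OOIP = 7758 · A · h · φ · (1 − Sw) / Bo in field units. The sketch below uses hypothetical parameter values; the paper's 106 million STB figure comes from its 3D static model, not from this hand calculation.

```python
# Standard volumetric estimate of original oil in place (field units):
#   OOIP [STB] = 7758 * A [acres] * h [ft] * phi * (1 - Sw) / Bo
# All parameter values below are hypothetical.
def ooip_stb(area_acres, thickness_ft, porosity, water_saturation, bo):
    return 7758 * area_acres * thickness_ft * porosity * (1 - water_saturation) / bo

print(f"{ooip_stb(5000, 50, 0.20, 0.35, 1.2):,.0f} STB")
```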
1 Introduction. Nowadays in China, there are more than six hundred million netizens [1]. On April 11, 2015, the number of simultaneous online users of the Chinese instant messaging application QQ reached two hundred million [2]. The fast growth of the Internet pushes the rapid development of information technology (IT) and communication technology (CT). Many traditional IT service and CT equipment providers are facing the fusion of IT and CT in the age of digital transformation, and heading toward ICT enterprises. Large global ICT enterprises, such as Apple, Google, Microsoft, Amazon, Verizon, and AT&T, have been contributing to the performance improvement of IT services and CT equipment. Funding: supported in part by a Ministry of Education/China Mobile joint research grant under Project No. 5-10, and by Nanjing University of Posts and Telecommunications under Grants No. NY214135 and NY215045.
Nowadays, in almost every computer system, log files are used to keep records of occurring events. Those log files are then used for analyzing and debugging system failures. Due to this important utility, researchers have worked on finding fast and efficient ways to detect anomalies in a computer system by analyzing its log records. Research in log-based anomaly detection can be divided into two main categories: batch log-based anomaly detection and streaming log-based anomaly detection. Batch log-based anomaly detection is computationally heavy and does not allow us to detect anomalies instantaneously. On the other hand, streaming anomaly detection allows for immediate alerts. However, current streaming approaches are mainly supervised. In this work, we propose a fully unsupervised framework which can detect anomalies in real time. We test our framework on HDFS log files and successfully detect anomalies with an F1 score of 83%.
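The streaming, label-free shape of such a detector can be sketched with running statistics: flag a window whose feature value deviates strongly from the history seen so far, then fold it into the statistics. This toy detector only illustrates the paradigm, not the paper's framework.

```python
import math

class StreamingDetector:
    """Toy unsupervised streaming detector: maintain the running mean and
    variance of a per-window feature (e.g. error-event count) via Welford's
    algorithm and flag windows whose z-score exceeds a threshold."""
    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2, self.threshold = 0, 0.0, 0.0, threshold

    def observe(self, x):
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) / std > self.threshold
        # Welford update with the new observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = StreamingDetector()
for i, count in enumerate([3, 4, 2, 3, 4, 3, 2, 4, 3, 40]):
    if det.observe(count):
        print(f"window {i}: anomalous error count {count}")
```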
Most business operations today still depend heavily on software applications. Whenever a software application encounters an error that causes downtime in the production environment, the root cause of the error can lie either within the software application layer or in some factor outside it. Accurately identifying the root cause is difficult whenever more than one log file is required for the root cause analysis activity. Such complexity prolongs the entire root cause analysis activity, which increases the total time taken to restore the software application service to its users. In order to identify the root cause of software application errors more accurately, and to shorten the duration of root cause analysis conducted on software application errors, a Prescriptive Analytical Logic Model incorporating the Analytic Hierarchy Process (AHP) is proposed. The proposed Logic Model, along with its algorithm, contributes new knowledge in the area of log file analysis to shorten the total time spent on root cause analysis.
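The AHP component can be illustrated concretely: derive weights for competing root-cause evidence sources from a pairwise comparison matrix via its principal eigenvector, and check the consistency ratio. The 3x3 matrix below (comparing hypothetical sources such as application, database and network logs) is illustrative only.

```python
import numpy as np

# Pairwise comparison matrix on Saaty's 1-9 scale (hypothetical judgments).
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Priority weights = normalized principal eigenvector.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()

# Consistency check: CR = CI / RI, acceptable when CR < 0.1.
n = A.shape[0]
ci = (eigvals[k].real - n) / (n - 1)       # consistency index
ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]        # Saaty's random index
print("weights:", weights.round(3), "CR:", round(ci / ri, 3))
```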
Logs contain runtime information for both systems and users. As many of them use natural language, a typical log-based analysis needs to parse logs into a structured format first. Existing parsing approaches often take two steps. The first step is to find similar words (tokens) or sentences. Second, parsers extract log templates by replacing different tokens with variable placeholders. However, we observe that most parsers concentrate on precisely grouping similar tokens or logs, but do not have a well-designed template extraction process, which leads to inconsistent accuracy on particular datasets. The root cause is the ambiguous definition of variable placeholders and similar templates. The consequences include abuse of variable placeholders, incorrectly divided templates, and an excessive number of templates over time. In this paper, we propose Cognition, an online log parsing approach. It first redefines variable placeholders via a strict lower bound to avoid ambiguity. Then, it applies our template correction technique to merge and absorb similar templates. It eliminates the interference of commonly used parameters and thus stabilizes the number of templates. Evaluation on 16 public datasets shows that Cognition has better accuracy and consistency than the state-of-the-art approaches. It also saves on average up to 52.1% of the time cost compared with the others. Funding: supported by the National Key Research and Development Program of China under Grant No. 2019YFB1802800 and the National Science Fund for Distinguished Young Scholars of China under Grant No. 61725206.
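The template correction idea can be sketched as follows: two templates of equal length merge when their few mismatched positions are absorbed into a placeholder. Cognition's strict lower bound on placeholders is not reproduced here; this only shows the merge-and-absorb mechanic.

```python
# Toy template correction: merge two equal-length templates when their
# mismatched token positions (at most max_diff) can become "<*>" placeholders.
def merge_templates(t1, t2, max_diff=1):
    a, b = t1.split(), t2.split()
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) > max_diff:
        return None
    for i in diff:
        a[i] = "<*>"
    return " ".join(a)

print(merge_templates("Connection to <*> failed after 3 retries",
                      "Connection to <*> failed after 5 retries"))
# -> "Connection to <*> failed after <*> retries"
```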
Purpose: This study attempts to investigate how a user's search behavior changes in the exploratory search process, in order to understand the characteristics of the user's search behavior and build a behavioral model. Design/methodology/approach: Forty-two matriculated full-time senior college students, with a female-to-male ratio of 1 to 1, who majored in medical science at Jilin University participated in our experiment. The task of the experiment was to search for information about 'the influence of environmental pollution on daily life' in order to write a report about this topic. The research methods include concept mapping, query log analysis and a questionnaire survey. Findings: The results indicate that exploratory search can significantly change the knowledge structure of searchers. As searchers moved through different stages of the exploratory search process, they experienced cognitive changes, and their search behaviors were characterized by quick browsing, careful browsing and focused searching. Research limitations: The study used only one search topic, and there is no comparison or control group. Although we took search habits, personal thinking habits, personality characteristics and professional background into account, a more detailed study analyzing the effects of these factors on exploratory search behavior is needed in our further research. Practical implications: This study can serve as a reference for other researchers engaged in the same effort to construct supporting systems for exploratory search. Originality/value: Three methods are used to investigate behavior characteristics during exploratory search. Funding: supported by the National Social Science Foundation (Grant No. 11BTQ045).
As software systems grow more and more complex, extensive techniques have been proposed to analyze log data to obtain insight into system status. However, during log data analysis, tedious manual effort is spent searching for interesting or informative log patterns in a huge volume of log data, via so-called pattern-based queries. Although existing log management tools and DBMSs can also support pattern-based queries, they suffer from low efficiency. To deal with this problem, we propose a novel approach named PLQ (Pattern-based Log Query). First, PLQ organizes logs into disjoint chunks and builds chunk-wise bitmap indexes for log types and attribute values. Then, based on the bitmap indexes, PLQ finds candidate logs with a set of efficient bit-wise operations. Finally, PLQ fetches candidate logs and validates them according to the queried pattern. Extensive experiments are conducted on real-life datasets. According to the experimental results, compared with existing log management systems, PLQ is more efficient in querying log patterns and has a higher pruning rate for filtering irrelevant logs. Moreover, since the ratio of the index size to the data size does not exceed 2.5% for log datasets of different sizes, PLQ has high scalability. Funding: supported by the National Natural Science Foundation of China under Grant No. 61672163 and the MIIT project Data Management Standards and Verification for Industrial Internet Identifier Resolution.
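The chunk-wise bitmap idea can be sketched with Python integers as bitmaps: one bitmap per log type records the positions holding that type, and a two-event pattern query reduces to an AND of shifted bitmaps. PLQ's real index also covers attribute values and validation across chunks.

```python
from collections import defaultdict

# One chunk of log events by type (toy data).
chunk = ["A", "C", "A", "B", "C", "A", "B"]

# Build per-type bitmaps: bit i is set if position i holds that type.
bitmaps = defaultdict(int)
for pos, ev in enumerate(chunk):
    bitmaps[ev] |= 1 << pos

# Candidate positions where "A" occurs and "B" occurs at the next position:
# AND A's bitmap with B's bitmap shifted right by one.
candidates = bitmaps["A"] & (bitmaps["B"] >> 1)
hits = [i for i in range(len(chunk)) if candidates >> i & 1]
print("A->B starts at positions:", hits)   # expect [2, 5]
```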
For this special section on software systems, six research leaders in software systems, as guest editors for this special section, discuss important issues that will shape this field's future research directions. The essays included in this roundtable article cover research opportunities and challenges for large-scale software systems, such as querying organization-wide software behaviors (Xusheng Xiao), logging and log analysis (Jian-Guang Lou), engineering reliable cloud distributed systems (Shan Lu), usage data (David C. Shepherd), clone detection and management (Xin Peng), and code search and beyond (Qian-Xiang Wang). - Tao Xie, Leading Editor of Software Systems.
In this paper, we used platform log data to extract three features (proportion of passive video time, proportion of active video time, and proportion of assignment time) aligning with different learning activities in the Interactive-Constructive-Active-Passive (ICAP) framework, and applied hierarchical clustering to detect student engagement modes. A total of 840 learning rounds were clustered into four categories of engagement: passive (n=80), active (n=366), constructive (n=75) and resting (n=319). The results showed that there were differences in the performance of the four engagement modes, and three types of learning status were identified based on the sequences of student engagement modes: difficult, balanced and easy. This study indicated that, based on the ICAP framework, online learning platform log data can be used to automatically detect different engagement modes of students, which could provide useful references for online learning analysis and personalized learning.
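A minimal sketch of the detection step: hierarchically cluster learning rounds on the three ICAP-aligned time-proportion features. The synthetic rows below stand in for real platform log data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic learning rounds: rows are (passive-video, active-video,
# assignment) time proportions; four behavioral profiles are simulated.
rng = np.random.default_rng(3)
rounds = np.vstack([
    rng.dirichlet([8, 1, 1], 20),        # mostly passive video
    rng.dirichlet([1, 8, 1], 20),        # mostly active video
    rng.dirichlet([1, 1, 8], 20),        # mostly assignment work
    rng.dirichlet([1, 1, 1], 20) * 0.2,  # low activity overall ("resting")
])

# Hierarchical (agglomerative) clustering into four engagement modes.
labels = AgglomerativeClustering(n_clusters=4).fit_predict(rounds)
for c in range(4):
    print(f"cluster {c}: {np.sum(labels == c)} rounds,",
          "mean features", rounds[labels == c].mean(axis=0).round(2))
```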