Author Profiling (AP) is a subfield of digital forensics that focuses on detecting an author’s personal information, such as age, gender, occupation, and education, based on various linguistic features, e.g., stylistic, semantic, and syntactic. AP is important in various fields, including forensics, security, medicine, and marketing. Previous studies have covered many languages, e.g., English, Arabic, and French. However, research on Roman Urdu remains limited. Hence, this study focuses on detecting the author’s age and gender from Roman Urdu text messages. The dataset used in this study is Fire’18-MaponSMS. This study proposes an ensemble model based on AdaBoostM1 and Random Forest (ABMRF) for AP using multiple linguistic features: stylistic, character-based, word-based, and sentence-based. The proposed model is compared with several well-known models from the literature, including J48-Decision Tree (J48), Naïve Bayes (NB), K Nearest Neighbor (KNN), Composite Hypercube on Random Projection (CHIRP), NB-Updatable, RF, and AdaBoostM1. Overall, the proposed ABMRF performs best, with an accuracy of 54.2857% for age prediction and 71.1429% for gender prediction on stylistic features. On word-based features, the accuracies for age and gender were 50.5714% and 60%, respectively. KNN and CHIRP show the weakest performance across all the linguistic features for age and gender prediction.
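To make the feature groups named in the abstract concrete, here is a minimal sketch of character-, word-, and sentence-level features computed from a single message. The abstract does not list the exact features used, so the function name `stylistic_features`, the chosen features, and the sample message are all illustrative assumptions.

```python
import re


def stylistic_features(message: str) -> dict:
    """Compute a few illustrative features (hypothetical set, not the paper's)."""
    words = message.split()
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    return {
        # Character-based features
        "char_count": len(message),
        "digit_count": sum(ch.isdigit() for ch in message),
        "uppercase_count": sum(ch.isupper() for ch in message),
        # Word-based features
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Sentence-based feature
        "sentence_count": len(sentences),
    }


# Made-up Roman Urdu sample message, for illustration only.
features = stylistic_features("Kya haal hai? Main theek hoon.")
```

Each message becomes one feature vector of this kind, which a classifier can then map to an age group or gender label.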
This study explores the area of Author Profiling (AP) and its importance in several industries, including forensics, security, marketing, and education. A key component of AP is the extraction of useful information from text, with an emphasis on the writers’ ages and genders. To improve the accuracy of AP tasks, the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1 (ABM1) and Random Forest (RF). The work follows a pipeline of text message dataset preprocessing, model training, and assessment. Several machine learning (ML) algorithms are compared on age and gender classification: Composite Hypercube on Random Projection (CHIRP), Decision Trees (J48), Naïve Bayes (NB), K Nearest Neighbor, AdaBoostM1, NB-Updatable, RF, and ABMRF. The findings demonstrate that ABMRF consistently outperforms the other models, with a gender classification accuracy of 71.14% and an age classification accuracy of 54.29%. Additional metrics, namely precision, recall, F-measure, Matthews Correlation Coefficient (MCC), and accuracy, support ABMRF’s performance in age and gender profiling tasks. This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing, law enforcement, and education. The results emphasize the effectiveness of ensemble approaches in improving author profiling accuracy, particularly for age and gender identification.
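The evaluation metrics listed above can all be derived from a confusion matrix. The sketch below uses the standard binary definitions; the function name `binary_metrics` and the counts are invented for illustration and are not results from the study.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    # Matthews Correlation Coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for a two-class (e.g. gender) task.
metrics = binary_metrics(tp=50, fp=10, fn=20, tn=20)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it a useful complement when the classes are imbalanced.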
primarily driven by advancements in technology, changes in healthcare delivery, and a deeper understanding of disease processes. Advancements in technology have revolutionized patient monitoring, diagnosis, and treatment in the critical care setting. From minimally invasive procedures to advanced imaging techniques, clinicians now have access to a wide array of tools to assess and manage critically ill patients more effectively. In this editorial, we comment on the review article published by Padte S et al., wherein they concisely describe the latest developments in critical care medicine.
The proliferation of maliciously coded documents, as file transfers increase, has led to a rise in sophisticated attacks. Portable Document Format (PDF) files have emerged as a major attack vector for malware due to their adaptability and wide usage. Detecting malware in PDF files is challenging because such files can include various harmful elements, such as embedded scripts, exploits, and malicious URLs. This paper presents a comparative analysis of machine learning (ML) techniques, including Naive Bayes (NB), K-Nearest Neighbor (KNN), Average One Dependency Estimator (A1DE), Random Forest (RF), and Support Vector Machine (SVM), for PDF malware detection. The study utilizes a dataset obtained from the Canadian Institute for Cybersecurity and employs different testing criteria, namely percentage splitting and 10-fold cross-validation. The performance of the techniques is evaluated using F1-score, precision, recall, and accuracy. The results indicate that KNN outperforms the other models, achieving an accuracy of 99.8599% using 10-fold cross-validation. The findings highlight the effectiveness of ML models in accurately detecting PDF malware and provide insights for developing robust systems to protect against malicious activities.
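For readers unfamiliar with the 10-fold cross-validation protocol mentioned above, the following is a minimal sketch of how the train/test index splits can be generated. The helper name `k_fold_indices`, the seed, and the sample count are assumptions for illustration; the study's own tooling is not described in the abstract.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 25 hypothetical samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

Every sample is used for testing exactly once, and the reported accuracy is typically the average over the ten held-out folds.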
Today, liver disease, or any deterioration in one’s ability to survive, is extremely common all around the world. Previous research has indicated that liver disease is more frequent in younger people than in older ones. When the liver’s capability begins to deteriorate, life can be shortened to one or two days, and early prediction of such diseases is difficult. Using several machine learning (ML) approaches, researchers have analyzed a variety of models for predicting liver disorders in their early stages. Accordingly, this research examines the use of the Random Forest (RF) classifier to diagnose liver disease early. The dataset was taken from the University of California, Irvine repository. RF’s performance is compared with that of Multi-Layer Perceptron (MLP), Average One Dependency Estimator (A1DE), Support Vector Machine (SVM), Credal Decision Tree (CDT), Composite Hypercube on Iterated Random Projection (CHIRP), K-nearest neighbor (KNN), Naïve Bayes (NB), J48-Decision Tree (J48), and Forest by Penalizing Attributes (Forest-PA). The assessment measures used to evaluate each classifier include Root Relative Squared Error (RRSE), Root Mean Squared Error (RMSE), accuracy, recall, precision, specificity, Matthew’s Correlation Coefficient (MCC), F-measure, and G-measure. RF achieves an RRSE of 87.6766 and an RMSE of 0.4328, with an accuracy of 72.1739%. The results of this work can be used as a starting point for subsequent research, so that any claim that a new model, framework, or method improves forecasting can be benchmarked and demonstrated.
In the medical profession, recent technological advancements play an essential role in the early detection and categorization of many diseases that cause mortality. The technique still in daily use for detecting illness in magnetic resonance images is human inspection. Automatic (computerized) illness detection in medical imaging has become an emerging area in several medical diagnostic applications. Various diseases that cause death need to be identified through such techniques and technologies to reduce the mortality ratio. The brain tumor is one of the most common causes of death. Researchers have already proposed various models for the classification and detection of tumors, each with its strengths and weaknesses, but there is still a need to improve the classification process with improved efficiency. To address this gap in accuracy, this study gives an in-depth analysis of six distinct machine learning (ML) algorithms: Random Forest (RF), Naïve Bayes (NB), Neural Networks (NN), CN2 Rule Induction (CN2), Support Vector Machine (SVM), and Decision Tree (Tree). On the Kaggle dataset, these algorithms are tested using classification accuracy, the area under the Receiver Operating Characteristic (ROC) curve, precision, recall, and F1 Score (F1). The training and testing process is strengthened by using a 10-fold cross-validation technique. The results show that SVM outperforms the other algorithms, with 95.3% accuracy.
Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information, a problem that persists despite user awareness. This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning (ML) models, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), using authentic datasets sourced from the Kaggle and Mendeley repositories. Extensive experimentation and analysis reveal that the CNN model achieves the highest accuracy, 98%, while LSTM shows the lowest accuracy, 96%. These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics, offering a promising avenue for safeguarding sensitive information and online security.
Movies are a popular source of entertainment. Every year, a large number of movies are released. People comment on movies in the form of reviews after watching them. Since it is difficult to read all of the reviews for a movie, summarizing them helps a viewer decide without spending time reading every review. Opinion mining, also known as sentiment analysis, is the process of extracting subjective information from textual data. It involves identifying and extracting the opinions of individuals, which can be positive, neutral, or negative, and is performed here to understand people’s emotions and attitudes in movie reviews. Movie reviews are an important source of opinion data because they provide insight into the general public’s opinions about a particular movie. The summary of all reviews can give a general idea about the movie. This study compares baseline techniques (Logistic Regression, Random Forest Classifier, Decision Tree, K-Nearest Neighbor, Gradient Boosting Classifier, and Passive Aggressive Classifier) with Linear Support Vector Machines and Multinomial Naïve Bayes on the IMDB Dataset of 50K reviews and the Sentiment Polarity Dataset Version 2.0. Before applying these classifiers, both datasets are pre-processed: they are cleaned, duplicate data is dropped, and chat words are treated for better results. On the IMDB Dataset of 50K reviews, Linear Support Vector Machines achieve the highest accuracy of 89.48%, and after hyperparameter tuning, the Passive Aggressive Classifier achieves the highest accuracy of 90.27%. On the Sentiment Polarity Dataset Version 2.0, Multinomial Naïve Bayes achieves the highest accuracy of 70.69%, and 71.04% after hyperparameter tuning. This study highlights the importance of sentiment analysis as a tool for understanding the emotions and attitudes in movie reviews and predicts the performance of a movie based on the average sentiment of all the reviews.
Author Profiling (AP) takes a text, such as an article or a similar collection of written material, and aims to distinguish the author’s gender, age, education, occupation, native language, and personality traits. AP prediction is a significant issue in several information-related fields, including security, forensics, marketing, and medicine. For instance, it is important to determine who wrote a harassing message. From a marketing perspective, businesses can learn about their audience by examining what is posted about products and websites on the internet. Accordingly, they can direct their efforts towards a certain gender or age group based on the kind of individuals who comment on their products. Recently, many techniques have been presented to automatically detect user age and gender from language in texts, documents, or comments on social media. The purpose of this research is to classify age (18–24, 25–34, 35–49, 50–64, and 65–70) and gender (male, female) on the English-language PAN 2014 Hotel Reviews dataset. This work focuses on six machine learning models: Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and K-Nearest Neighbors (KNN).
Endoscopic submucosal dissection (ESD) has become an established strategy for the treatment of superficial neoplasms in the gastrointestinal tract. ESD was introduced to overcome the limitations of conventional endoscopic mucosal resection (EMR), including inadequate resection, for early lesions in three regions: the esophagus, stomach, and colon. ESD was first developed in Japan, since that nation has the highest prevalence of gastric cancer in the world. ESD creates large artificial ulcers with a higher risk of intraoperative and delayed postoperative bleeding. However, there is no consensus regarding the optimal perioperative management for preventing bleeding and promoting ulcer healing. The significance of this study is to find a better technique to lower the risk of post-ESD bleeding and to overcome the limitations of conventional EMR and incomplete resection for early malignant lesions in the three regions, namely the esophagus, stomach, and colon. Owing to its importance, ESD is considered a standard approach in Japan and other East Asian countries. The EMR and ESD approaches are discussed in this report. Risk factors for PPB after ESD for early gastric neoplasms were identified, and an improved technique was developed to mitigate the risk of post-ESD bleeding. EMR was already widely used for treating early neoplastic lesions in the gastrointestinal tract; its use for colon adenoma and colorectal tumors is widely accepted.
Field studies were conducted at Hazara Agriculture Research Station, Abbottabad, to evaluate thirteen AVRDC lines along with one commercial check (Roma) for fruit yield potential against septoria leaf spot during the summer season of 2014. The disease established itself by natural infection, and disease severity was estimated using a 0-5 disease rating scale at 15-day intervals from the onset of symptoms. The lines differed significantly in percentage septoria leaf spot infection. Disease severity increased up to 100% in line AVTO1314, whereas the lowest severity was recorded in AVTO1173, which showed the highest yield (468.1 g) with an average fruit weight of 122.22 g. The significantly lowest mean yield per plant (35.05 g) was recorded in line AVTO1314, with a fruit weight of 47.92 g. It was concluded that line AVTO1173 could be useful in genetic programs for incorporating resistance genes against septoria leaf spot disease into local tomato germplasm.
Dynamic analysis of malware allows us to examine malware samples, and then group those samples into families based on observed behavior. Using Boolean variables to represent the presence or absence of a range of malware behavior, we create a bitstring that represents each malware sample behaviorally, and then group samples into the same class if they exhibit the same behavior. Combining class definitions with malware discovery dates, we can construct a timeline showing the emergence date of each class, in order to examine the prevalence, complexity, and longevity of each class. We find that certain behavior classes are more prevalent than others, following a frequency power law. Some classes have had lower longevity, indicating that their attack profile is no longer manifested by new variants of malware, while others, of greater longevity, continue to affect new computer systems. We verify for the first time commonly held intuitions on malware evolution, showing quantitatively from the archaeological record that over 80% of the time, classes of higher malware complexity emerged later than classes of lower complexity. In addition to providing historical perspective on malware evolution, the methods described in this paper may aid malware detection through classification, leading to new proactive methods to identify malicious software.
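The bitstring grouping and emergence timeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: the behavior names in `BEHAVIORS`, the helper names, and the sample data are hypothetical, since the abstract does not enumerate the behaviors actually tracked.

```python
from datetime import date

# Hypothetical behavior vocabulary; the real set is not given in the abstract.
BEHAVIORS = ("modifies_registry", "opens_network_socket", "writes_executable")


def behavior_bitstring(observed: set) -> str:
    """Encode presence (1) or absence (0) of each behavior, in a fixed order."""
    return "".join("1" if b in observed else "0" for b in BEHAVIORS)


def class_emergence(samples) -> dict:
    """Map each behavior class (bitstring) to its earliest discovery date."""
    emergence = {}
    for observed, discovered in samples:
        key = behavior_bitstring(observed)
        if key not in emergence or discovered < emergence[key]:
            emergence[key] = discovered
    return emergence


# Made-up samples: (observed behaviors, discovery date).
samples = [
    ({"modifies_registry"}, date(2005, 3, 1)),
    ({"modifies_registry"}, date(2004, 7, 15)),
    ({"modifies_registry", "opens_network_socket"}, date(2008, 1, 10)),
]
timeline = class_emergence(samples)
```

Samples with identical bitstrings fall into the same class, so sorting `timeline` by date gives the order in which behavior classes first appeared.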
Nowadays, multiple wireless communication systems operate side by side in industrial environments. In such an environment, the performance of one wireless network can be degraded by a collocated hostile wireless network with higher transmission power or a higher carrier sensing threshold. Unlike previous research, which considered IEEE 802.15.4 for industrial wireless communication systems (iWCS), this paper examines the coexistence of IEEE 802.11-based iWCS, used for delay-stringent communication in process automation, and general-purpose WLAN (gWLAN), used for non-real-time communication. We present a Markov chain-based performance model that describes the transmission failure of iWCS due to geographical collision with gWLAN. The analytic model accurately determines the throughput, packet transaction delay, and packet loss probability of iWCS when it is collocated with gWLAN. The results of the Markov model match our simulation results by more than 90%. Furthermore, we propose an adaptive transmission power control technique for iWCS to overcome the potential interference caused by gWLAN transmissions. The simulation results show that the proposed technique significantly improves iWCS performance in terms of throughput, packet transaction, and cycle period reduction. Moreover, it enables the industrial network to be used for delay-critical applications in the presence of gWLAN without affecting its performance.
By using the basic (or q-) calculus, many subclasses of analytic and univalent functions have been generalized and studied from different viewpoints and perspectives. In this paper, we aim to define certain new subclasses of analytic functions. We then give necessary and sufficient conditions for each of the defined function classes. We also study necessary and sufficient conditions for a function whose coefficients are probabilities of the q-Poisson distribution. To validate our results, some known consequences are also given in the form of Remarks and Corollaries.
Heart disease prognosis (HDP) is a difficult undertaking that requires knowledge and expertise to predict early on. Heart failure is on the rise as a result of today’s lifestyle. The healthcare business generates a vast volume of patient records, which are challenging to manage manually. In data mining and machine learning, a large volume of data is crucial for extracting meaningful information. Researchers have used several methods for predicting HD over the last few decades, but the fundamental concerns remain the uncertainty in the output data and the need to decrease the error rate and enhance the accuracy of HDP assessment measures. To find the best HDP solution, this study compares multiple classification algorithms using two separate heart disease datasets, from the Kaggle repository and the University of California, Irvine (UCI) machine learning repository. In a comparative analysis, Mean Absolute Error (MAE), Relative Absolute Error (RAE), precision, recall, f-measure, and accuracy are used to evaluate Linear Regression (LR), Decision Tree (J48), Naive Bayes (NB), Artificial Neural Network (ANN), Simple Cart (SC), Bagging, Decision Stump (DS), AdaBoost, Rep Tree (REPT), and Support Vector Machine (SVM). On the dataset gathered from UCI, the SVM classifier surpasses the other classifiers in accuracy and error rate, with an RAE of 33.2631, an MAE of 0.165, a precision of 0.841, a recall of 0.835, an f-measure of 0.833, and an accuracy of 83.49%. On the Kaggle dataset, SC performs best, with an RAE of 3.30%, an MAE of 0.016, a precision of 0.984, a recall of 0.984, an f-measure of 0.984, and an accuracy of 98.44%.
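The two error measures reported above can be illustrated as follows. This sketch uses the common definitions, with RAE expressed as a percentage relative to a baseline that always predicts the mean of the actual values; the exact computation used in the study is an assumption, and the numbers are invented.

```python
def mean_absolute_error(actual, predicted) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted) -> float:
    """Total absolute error as a percentage of a mean-predictor's total error."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    model = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * model / baseline


# Hypothetical class labels (1 = disease) and predicted probabilities.
actual = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.6, 0.7]

mae = mean_absolute_error(actual, predicted)
rae = relative_absolute_error(actual, predicted)
```

An RAE below 100% means the model's total absolute error is smaller than that of the trivial mean predictor.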
During the United States economic recession of 2008-2011, the number of homeless and unstably housed people in the United States increased considerably. Homeless adult women and unaccompanied homeless youth make up the most marginal segments of this population. Because homeless individuals are a hard-to-reach population, research into these marginal groups has traditionally been a challenge for researchers interested in substance abuse and mental health. Network analysis techniques and research strategies offer means for dealing with traditional challenges such as missing sampling frames, variation in definitions of homelessness and study inclusion criteria, and enumeration/population estimation procedures. This review focuses on the need for, and recent steps toward, solutions to these problems that involve network science strategies for data collection and analysis. Research from a range of fields is reviewed and organized according to a new stress process framework aimed at understanding how homeless status interacts with issues related to substance abuse and mental health. Three types of network innovation are discussed: network scale-up methods, a network ecology approach to social resources, and the integration of network variables into the proposed stress process model of homeless substance abuse and mental health. By employing network methods and integrating these methods into existing models, research on homeless and unstably housed women and unaccompanied young people can address existing research challenges and promote more effective intervention and care programs.
This paper presents a new method for obtaining network properties from incomplete data sets. Problems associated with missing data represent well-known stumbling blocks in Social Network Analysis. The method of “estimating connectivity from spanning tree completions” (ECSTC) is specifically designed to address situations where only spanning tree(s) of a network are known, such as those obtained through respondent driven sampling (RDS). Using repeated random completions derived from degree information, this method forgoes the usual step of trying to obtain final edge or vertex rosters, and instead aims to estimate network-centric properties of vertices probabilistically from the spanning trees themselves. In this paper, we discuss the problem of missing data, describe the protocols of our completion method, and finally report the results of an experiment where ECSTC was used to estimate graph-dependent vertex properties from spanning trees sampled from a graph whose characteristics were known ahead of time. The results show that ECSTC methods hold more promise for obtaining network-centric properties of individuals from a limited set of data than researchers may have previously assumed. Such an approach represents a break with past strategies for working with missing data, which have mainly sought means to complete the graph, whereas ECSTC estimates network properties themselves without deciding on the final edge set.
Recent interest by physicists in social networks and disease transmission factors has prompted debate over the topology of degree distributions in sexual networks. Social network researchers have been critical of “scale-free” Barabasi-Albert approaches, and largely rejected the preferential attachment, “rich-get-richer” assumptions that underlie that model. Instead, research on sexual networks has pointed to the importance of homophily and local sexual norms in dictating degree distributions, and thus disease transmission thresholds. Injecting Drug User (IDU) network topologies may differ from the emerging models of sexual networks, however. Degree distribution analysis of a Brooklyn, NY, IDU network indicates a different topology than the spanning tree configurations discussed for sexual networks, instead featuring comparatively short cycles and high concurrency. Our findings suggest that IDU networks do in some ways conform to a “scale-free” topology, and thus may represent “reservoirs” of potential infection despite seemingly low transmission thresholds.
Hospital facilities use a collection of heterogeneous devices, produced by many different vendors, to monitor the state of patient vital signs. The limited interoperability of current devices makes it difficult to synthesize multivariate monitoring data into a unified array of real-time information regarding the patient’s state. Without an infrastructure for the integrated evaluation, display, and storage of vital sign data, one cannot adequately ensure that the assignment of caregivers to patients reflects the relative urgency of patient needs. This is an especially serious issue in critical care units (CCUs). We present a formal mathematical model of an operational critical care unit, together with metrics for evaluating the systematic impact of caregiver scheduling decisions on patient care. The model is rich enough to capture the essential features of device and patient diversity, and so enables us to test the hypothesis that integration of vital sign data could realistically yield a significant positive impact on the efficacy of critical care delivery. To test the hypothesis, we employ the model within a computer simulation. The simulation enables us to compare the current scheduling processes in widespread use within CCUs against a new scheduling algorithm that makes use of an integrated array of patient information collected by an (anticipated) vital sign data integration infrastructure. The simulation study provides clear evidence that such an infrastructure reduces risk to patients and lowers operational costs, and in so doing reveals the inherent costs of medical device non-interoperability.
Funding: the support of Prince Sultan University for the Article Processing Charges (APC) of this publication.
文摘Author Profiling (AP) is a subsection of digital forensics that focuses on the detection of the author’s personalinformation, such as age, gender, occupation, and education, based on various linguistic features, e.g., stylistic,semantic, and syntactic. The importance of AP lies in various fields, including forensics, security, medicine, andmarketing. In previous studies, many works have been done using different languages, e.g., English, Arabic, French,etc.However, the research on RomanUrdu is not up to the mark.Hence, this study focuses on detecting the author’sage and gender based on Roman Urdu text messages. The dataset used in this study is Fire’18-MaponSMS. Thisstudy proposed an ensemble model based on AdaBoostM1 and Random Forest (AMBRF) for AP using multiplelinguistic features that are stylistic, character-based, word-based, and sentence-based. The proposed model iscontrasted with several of the well-known models fromthe literature, including J48-Decision Tree (J48),Na飗e Bays(NB), K Nearest Neighbor (KNN), and Composite Hypercube on Random Projection (CHIRP), NB-Updatable,RF, and AdaboostM1. The overall outcome shows the better performance of the proposed AdaboostM1 withRandom Forest (ABMRF) with an accuracy of 54.2857% for age prediction and 71.1429% for gender predictioncalculated on stylistic features. Regarding word-based features, age and gender were considered in 50.5714% and60%, respectively. On the other hand, KNN and CHIRP show the weakest performance using all the linguisticfeatures for age and gender prediction.
文摘This study explores the area of Author Profiling(AP)and its importance in several industries,including forensics,security,marketing,and education.A key component of AP is the extraction of useful information from text,with an emphasis on the writers’ages and genders.To improve the accuracy of AP tasks,the study develops an ensemble model dubbed ABMRF that combines AdaBoostM1(ABM1)and Random Forest(RF).The work uses an extensive technique that involves textmessage dataset pretreatment,model training,and assessment.To evaluate the effectiveness of several machine learning(ML)algorithms in classifying age and gender,including Composite Hypercube on Random Projection(CHIRP),Decision Trees(J48),Na飗e Bayes(NB),K Nearest Neighbor,AdaboostM1,NB-Updatable,RF,andABMRF,they are compared.The findings demonstrate thatABMRFregularly beats the competition,with a gender classification accuracy of 71.14%and an age classification accuracy of 54.29%,respectively.Additional metrics like precision,recall,F-measure,Matthews Correlation Coefficient(MCC),and accuracy support ABMRF’s outstanding performance in age and gender profiling tasks.This study demonstrates the usefulness of ABMRF as an ensemble model for author profiling and highlights its possible uses in marketing,law enforcement,and education.The results emphasize the effectiveness of ensemble approaches in enhancing author profiling task accuracy,particularly when it comes to age and gender identification.
Abstract: primarily driven by advancements in technology, changes in healthcare delivery, and a deeper understanding of disease processes. Advancements in technology have revolutionized patient monitoring, diagnosis, and treatment in the critical care setting. From minimally invasive procedures to advanced imaging techniques, clinicians now have access to a wide array of tools to assess and manage critically ill patients more effectively. In this editorial we comment on the review article published by Padte S et al., wherein they concisely describe the latest developments in critical care medicine.
Abstract: As file transfers increase, the proliferation of maliciously coded documents has led to a rise in sophisticated attacks. Portable Document Format (PDF) files have emerged as a major attack vector for malware due to their adaptability and wide usage. Detecting malware in PDF files is challenging because the files can include various harmful elements such as embedded scripts, exploits, and malicious URLs. This paper presents a comparative analysis of machine learning (ML) techniques, including Naive Bayes (NB), K-Nearest Neighbor (KNN), Average One Dependency Estimator (A1DE), Random Forest (RF), and Support Vector Machine (SVM), for PDF malware detection. The study uses a dataset obtained from the Canadian Institute for Cybersecurity and employs two testing criteria, namely percentage splitting and 10-fold cross-validation. The performance of the techniques is evaluated using F1-score, precision, recall, and accuracy. The results indicate that KNN outperforms the other models, achieving an accuracy of 99.8599% using 10-fold cross-validation. The findings highlight the effectiveness of ML models in accurately detecting PDF malware and provide insights for developing robust systems to protect against malicious activities.
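The 10-fold cross-validation protocol mentioned in the abstract above partitions the dataset into ten folds and uses each fold once as the test set while training on the other nine. The sketch below shows only the index bookkeeping, assuming a simple shuffled, non-stratified split; the function name `k_fold_indices` and the seed are illustrative assumptions and do not reflect the tooling the authors used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles once, then deals indices round-robin
    into k folds so that fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(25, k=10)):
        print(fold_number, len(train), len(test))
```

The reported accuracy under this protocol is typically the average (or the pooled result) over the ten test folds; a stratified variant would additionally keep class proportions similar in every fold.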
Funding: Supported by the Deputy for Research and Innovation-Ministry of Education, Kingdom of Saudi Arabia, for this research through a grant (NU/IFC/ENT/01/014) under the Institutional Funding Committee at Najran University, Kingdom of Saudi Arabia.
Abstract: Today, liver disease, or any deterioration in one's ability to survive, is extremely common all around the world. Previous research has indicated that liver disease is more frequent in younger people than in older ones. When the liver's capability begins to deteriorate, life can be shortened to one or two days, and early prediction of such diseases is difficult. Using several machine learning (ML) approaches, researchers have analyzed a variety of models for predicting liver disorders in their early stages. This research therefore examines the Random Forest (RF) classifier for early diagnosis of liver disease. The dataset was taken from the University of California, Irvine repository. RF's results are compared with those of Multi-Layer Perceptron (MLP), Average One Dependency Estimator (A1DE), Support Vector Machine (SVM), Credal Decision Tree (CDT), Composite Hypercube on Iterated Random Projection (CHIRP), K-nearest neighbor (KNN), Naïve Bayes (NB), J48-Decision Tree (J48), and Forest by Penalizing Attributes (Forest-PA). The assessment measures used to evaluate each classifier include Root Relative Squared Error (RRSE), Root Mean Squared Error (RMSE), accuracy, recall, precision, specificity, Matthews Correlation Coefficient (MCC), F-measure, and G-measure. RF achieves an RRSE of 87.6766 and an RMSE of 0.4328, with an accuracy of 72.1739%. The result of this work can be used as a starting point for subsequent research: any claim that a new model, framework, or method improves prediction can be benchmarked against it.
Funding: Supported by the Deputy for Research and Innovation-Ministry of Education, Kingdom of Saudi Arabia, for this research through a grant (NU/IFC/ENT/01/014) under the Institutional Funding Committee at Najran University, Kingdom of Saudi Arabia.
Abstract: In the medical profession, recent technological advancements play an essential role in the early detection and categorization of many diseases that cause mortality. The prevailing technique for detecting illness in magnetic resonance images is still human inspection. Automatic (computerized) illness detection in medical imaging has become an emerging area in several medical diagnostic applications. Various life-threatening diseases need to be identified through such techniques and technologies to reduce the mortality rate. The brain tumor is one of the most common causes of death. Researchers have already proposed various models for the classification and detection of tumors, each with its strengths and weaknesses, but there is still a need to improve the accuracy and efficiency of the classification process. To address this gap, this study gives an in-depth analysis of six distinct machine learning (ML) algorithms: Random Forest (RF), Naïve Bayes (NB), Neural Networks (NN), CN2 Rule Induction (CN2), Support Vector Machine (SVM), and Decision Tree (Tree). These algorithms are tested on a Kaggle dataset using classification accuracy, the area under the Receiver Operating Characteristic (ROC) curve, precision, recall, and F1 Score (F1). The training and testing process is strengthened by using 10-fold cross-validation. The results show that SVM outperforms the other algorithms, with 95.3% accuracy.
Abstract: Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information, a problem that persists despite user awareness. This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning (ML) models, namely Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), using authentic datasets sourced from the Kaggle and Mendeley repositories. Extensive experimentation and analysis reveal that the CNN model achieves the best accuracy, 98%, while LSTM shows the lowest accuracy, 96%. These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics, offering a promising avenue for safeguarding sensitive information and online security.
Abstract: Movies are a major source of entertainment, and a large number of them are released every year. People comment on movies in the form of reviews after watching them. Since it is difficult to read all of the reviews for a movie, summarizing them helps a viewer decide without spending time reading every review. Opinion mining, also known as sentiment analysis, is the process of extracting subjective information from textual data. It involves identifying and extracting the opinions of individuals, which can be positive, neutral, or negative, and is applied here to understand people's emotions and attitudes in movie reviews. Movie reviews are an important source of opinion data because they provide insight into the general public's opinions about a particular movie, and a summary of all reviews can give a general idea about the movie. This study compares baseline techniques (Logistic Regression, Random Forest Classifier, Decision Tree, K-Nearest Neighbor, Gradient Boosting Classifier, and Passive Aggressive Classifier) with Linear Support Vector Machines and Multinomial Naïve Bayes on the IMDB Dataset of 50K reviews and the Sentiment Polarity Dataset Version 2.0. Before these classifiers are applied, both datasets are pre-processed: they are cleaned, duplicate data is dropped, and chat words are treated for better results. On the IMDB Dataset of 50K reviews, Linear Support Vector Machines achieve the highest accuracy of 89.48%, and after hyperparameter tuning the Passive Aggressive Classifier achieves the highest accuracy of 90.27%. On the Sentiment Polarity Dataset Version 2.0, Multinomial Naïve Bayes achieves the highest accuracy, 70.69% before and 71.04% after hyperparameter tuning. This study highlights the importance of sentiment analysis as a tool for understanding the emotions and attitudes in movie reviews and predicts the performance of a movie based on the average sentiment of all the reviews.
Abstract: Author Profiling (AP) treats an article, or a similar collection of text, as evidence from which to distinguish an author's gender, age, education, occupation, native language, and related personality traits. In several information-related fields, including security, forensics, marketing, and medicine, AP prediction is a significant issue. For instance, it is important to determine who wrote a harassing communication. Similarly, from a marketing perspective, businesses can learn about their customers by examining comments on products and websites on the internet, and can then direct their efforts towards a certain gender or age group based on the kind of individuals who comment on their products. Recently, many techniques have been presented to automatically detect user age and gender from language in texts, documents, or comments on social media. The purpose of this research is to classify age (18–24, 25–34, 35–49, 50–64, and 65–70) and gender (male, female) on the English-language PAN 2014 Hotel Reviews dataset. The main emphasis of this work is the use of six machine learning models: Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and K-Nearest Neighbors (KNN).
Abstract: Endoscopic submucosal dissection (ESD) has become an established strategy for the treatment of superficial neoplasms in the gastrointestinal tract. ESD was introduced to overcome the limitations of conventional endoscopic mucosal resection (EMR), including incomplete resection, for early malignant lesions in three regions: the esophagus, stomach, and colon. ESD was first developed in Japan, since that nation has the highest prevalence of gastric cancer in the world. Endoscopic submucosal dissection causes large artificial ulcers with a higher risk of intraoperative and delayed postoperative bleeding. However, there is no consensus regarding the ideal perioperative management for the prevention of bleeding and the promotion of ulcer healing. The significance of this study is to find a better procedure to lower the risk of post-ESD bleeding and to overcome the limitations of conventional EMR and incomplete resection for early malignant lesions in the esophagus, stomach, and colon. ESD is considered a standard in Japan and other East Asian countries because of its importance. The EMR and ESD approaches are discussed in this report. The risk factors for PPB after ESD for early gastric neoplasms were established, and a better technique was developed to mitigate the risk of post-ESD bleeding. EMR was already widely used for treating early neoplastic lesions in the gastrointestinal tract; its use for colon adenoma and colorectal tumors is widely accepted.
Abstract: Field studies were conducted at Hazara Agriculture Research Station, Abbottabad, to evaluate thirteen AVRDC lines along with one commercial check (Roma) for fruit yield potential against septoria leaf spot during the summer season of 2014. The disease established itself by natural infection, and disease severity was estimated on a 0–5 disease rating scale at 15-day intervals from the onset of symptoms. The lines showed significant differences in % septoria leaf spot infection. The disease severity increased up to 100% in line AVTO1314, whereas the lowest severity was recorded in AVTO1173, which showed the highest yield (468.1 g) with an average fruit weight of 122.22 g, while the significantly lowest mean yield/plant (35.05 g) was calculated in line AVTO1314, with a fruit weight of 47.92 g. It was concluded that the line AVTO1173 could be useful in genetic programs for incorporating resistance genes against septoria leaf spot disease into local tomato germplasm.
Abstract: Dynamic analysis of malware allows us to examine malware samples, and then group those samples into families based on observed behavior. Using Boolean variables to represent the presence or absence of a range of malware behavior, we create a bitstring that represents each malware sample behaviorally, and then group samples into the same class if they exhibit the same behavior. Combining class definitions with malware discovery dates, we can construct a timeline showing the emergence date of each class, in order to examine the prevalence, complexity, and longevity of each class. We find that certain behavior classes are more prevalent than others, following a frequency power law. Some classes have had lower longevity, indicating that their attack profile is no longer manifested by new variants of malware, while others, of greater longevity, continue to affect new computer systems. We verify for the first time commonly held intuitions on malware evolution, showing quantitatively from the archaeological record that over 80% of the time, classes of higher malware complexity emerged later than classes of lower complexity. In addition to providing historical perspective on malware evolution, the methods described in this paper may aid malware detection through classification, leading to new proactive methods to identify malicious software.
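The grouping step described in the abstract above (one Boolean per behavior, concatenated into a bitstring, with identical bitstrings forming a class) can be sketched as follows. The behavior names, sample names, and dates are hypothetical placeholders chosen for illustration; the paper's actual behavior set is not given in the abstract.

```python
from collections import defaultdict

# Hypothetical behavior vocabulary; the real study's feature list is not shown here.
BEHAVIORS = ("creates_file", "modifies_registry", "opens_network_connection")


def behavior_bitstring(observed, behaviors=BEHAVIORS):
    """Encode a set of observed behaviors as a string of '1'/'0' flags."""
    return "".join("1" if b in observed else "0" for b in behaviors)


def group_by_behavior(samples, behaviors=BEHAVIORS):
    """Map each bitstring to the list of sample names that exhibit it."""
    classes = defaultdict(list)
    for name, observed in samples.items():
        classes[behavior_bitstring(observed, behaviors)].append(name)
    return dict(classes)


def class_emergence_dates(classes, discovery_dates):
    """Earliest discovery date per class (ISO date strings sort chronologically)."""
    return {
        bits: min(discovery_dates[name] for name in names)
        for bits, names in classes.items()
    }


if __name__ == "__main__":
    samples = {
        "sample_a": {"creates_file"},
        "sample_b": {"creates_file"},
        "sample_c": {"creates_file", "opens_network_connection"},
    }
    dates = {
        "sample_a": "2005-03-01",
        "sample_b": "2004-11-15",
        "sample_c": "2007-06-30",
    }
    classes = group_by_behavior(samples)
    print(classes)
    print(class_emergence_dates(classes, dates))
```

One simple proxy for class complexity in this encoding would be the number of set bits in the bitstring; whether the paper measures complexity that way is an assumption, not something the abstract states.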
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2018R1D1A1B07049758).
Abstract: Nowadays, multiple wireless communication systems operate side by side in industrial environments. In such an environment, the performance of one wireless network can be degraded by a collocated hostile wireless network with higher transmission power or a higher carrier sensing threshold. Unlike previous research, which considered IEEE 802.15.4 for industrial wireless communication systems (iWCS), this paper examines the coexistence of an IEEE 802.11 based iWCS, used for delay-stringent communication in process automation, and a gWLAN (general-purpose WLAN), used for non-real-time communication. We present a Markov chain-based performance model that describes the transmission failure of iWCS due to geographical collision with gWLAN. The presented analytic model accurately determines the throughput, packet transaction delay, and packet loss probability of iWCS when it is collocated with gWLAN. The results of the Markov model match our simulation results by more than 90%. Furthermore, we propose an adaptive transmission power control technique for iWCS to overcome the potential interference caused by gWLAN transmissions. The simulation results show that the proposed technique significantly improves iWCS performance in terms of throughput, packet transaction, and cycle period reduction. Moreover, it enables the industrial network to support delay-critical applications in the presence of gWLAN without affecting its performance.
Abstract: By using the basic (or q-) calculus, many subclasses of analytic and univalent functions have been generalized and studied from different viewpoints and perspectives. In this paper, we aim to define certain new subclasses of analytic functions. We then give necessary and sufficient conditions for each of the defined function classes. We also study necessary and sufficient conditions for a function whose coefficients are probabilities of the q-Poisson distribution. To validate our results, some known consequences are also given in the form of Remarks and Corollaries.
Funding: The authors would like to acknowledge the support of the Deputy for Research and Innovation-Ministry of Education, Kingdom of Saudi Arabia, for this research at Najran University, Kingdom of Saudi Arabia.
Abstract: Heart disease prognosis (HDP) is a difficult undertaking that requires knowledge and expertise to predict early on. Heart failure is on the rise as a result of today's lifestyle. The healthcare business generates a vast volume of patient records, which are challenging to manage manually. In data mining and machine learning, having a large volume of data is crucial for extracting meaningful information. Researchers have used several methods for predicting HD over the last few decades, but the fundamental concerns remain the uncertainty in the output data and the need to decrease the error rate and improve the accuracy of HDP assessment measures. To identify the best HDP solution, this study compares multiple classification algorithms on two separate heart disease datasets, one from the Kaggle repository and one from the University of California, Irvine (UCI) machine learning repository. In a comparative analysis, Mean Absolute Error (MAE), Relative Absolute Error (RAE), precision, recall, f-measure, and accuracy are used to evaluate Linear Regression (LR), Decision Tree (J48), Naive Bayes (NB), Artificial Neural Network (ANN), Simple Cart (SC), Bagging, Decision Stump (DS), AdaBoost, Rep Tree (REPT), and Support Vector Machine (SVM). On the UCI dataset, the SVM classifier surpasses the other classifiers in both accuracy and error rate, with an RAE of 33.2631, an MAE of 0.165, a precision of 0.841, a recall of 0.835, an f-measure of 0.833, and an accuracy of 83.49%. On the Kaggle dataset, SC gives the best accuracy and lowest error rate, with an RAE of 3.30%, an MAE of 0.016, a precision of 0.984, a recall of 0.984, an f-measure of 0.984, and an accuracy of 98.44%.
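The two error measures reported in the abstract above can be illustrated with a short sketch. It assumes the common textbook definitions: MAE is the mean of the absolute differences between predicted and actual values, and RAE is the total absolute error divided by the total absolute error of a baseline that always predicts the mean of the actual values, expressed as a percentage. The abstract does not state which exact variant the authors' tooling computes, and the toy numbers are invented.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to a predict-the-mean baseline, in percent."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    if baseline == 0:
        # All actual values are identical, so the baseline error is zero
        # and the ratio is undefined.
        raise ValueError("RAE is undefined when all actual values are equal")
    total_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    return 100.0 * total_error / baseline


if __name__ == "__main__":
    # Toy values: actual class indicators and predicted class probabilities.
    actual = [0, 1, 1, 0]
    predicted = [0.2, 0.9, 0.6, 0.1]
    print(mean_absolute_error(actual, predicted))
    print(relative_absolute_error(actual, predicted))
```

An RAE below 100% means the model's predictions are, in total, closer to the actual values than the constant baseline's; lower is better for both measures.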
Abstract: During the United States economic recession of 2008-2011, the number of homeless and unstably housed people in the United States increased considerably. Homeless adult women and unaccompanied homeless youth make up the most marginal segments of this population. Because homeless individuals are a hard-to-reach population, research into these marginal groups has traditionally been a challenge for researchers interested in substance abuse and mental health. Network analysis techniques and research strategies offer means for dealing with traditional challenges such as missing sampling frames, variation in definitions of homelessness and study inclusion criteria, and enumeration/population estimation procedures. This review focuses on the need for, and recent steps toward, solutions to these problems that involve network science strategies for data collection and analysis. Research from a range of fields is reviewed and organized according to a new stress process framework aimed at understanding how homeless status interacts with issues related to substance abuse and mental health. Three types of network innovation are discussed: network scale-up methods, a network ecology approach to social resources, and the integration of network variables into the proposed stress process model of homeless substance abuse and mental health. By employing network methods and integrating these methods into existing models, research on homeless and unstably housed women and unaccompanied young people can address existing research challenges and promote more effective intervention and care programs.
Abstract: This paper presents a new method for obtaining network properties from incomplete data sets. Problems associated with missing data represent well-known stumbling blocks in Social Network Analysis. The method of “estimating connectivity from spanning tree completions” (ECSTC) is specifically designed to address situations where only spanning tree(s) of a network are known, such as those obtained through respondent driven sampling (RDS). Using repeated random completions derived from degree information, this method forgoes the usual step of trying to obtain final edge or vertex rosters, and instead aims to estimate network-centric properties of vertices probabilistically from the spanning trees themselves. In this paper, we discuss the problem of missing data, describe the protocols of our completion method, and finally present the results of an experiment in which ECSTC was used to estimate graph-dependent vertex properties from spanning trees sampled from a graph whose characteristics were known ahead of time. The results show that ECSTC methods hold more promise for obtaining network-centric properties of individuals from a limited set of data than researchers may have previously assumed. Such an approach represents a break with past strategies for working with missing data, which have mainly sought means to complete the graph, whereas ECSTC estimates network properties themselves without deciding on the final edge set.
Abstract: Recent interest by physicists in social networks and disease transmission factors has prompted debate over the topology of degree distributions in sexual networks. Social network researchers have been critical of “scale-free” Barabasi-Albert approaches, and largely rejected the preferential attachment, “rich-get-richer” assumptions that underlie that model. Instead, research on sexual networks has pointed to the importance of homophily and local sexual norms in dictating degree distributions, and thus disease transmission thresholds. Injecting Drug User (IDU) network topologies may differ from the emerging models of sexual networks, however. Degree distribution analysis of a Brooklyn, NY, IDU network indicates a different topology than the spanning tree configurations discussed for sexual networks, instead featuring comparatively short cycles and high concurrency. Our findings suggest that IDU networks do in some ways conform to a “scale-free” topology, and thus may represent “reservoirs” of potential infection despite seemingly low transmission thresholds.
Abstract: Hospital facilities use a collection of heterogeneous devices, produced by many different vendors, to monitor the state of patient vital signs. The limited interoperability of current devices makes it difficult to synthesize multivariate monitoring data into a unified array of real-time information regarding the patient's state. Without an infrastructure for the integrated evaluation, display, and storage of vital sign data, one cannot adequately ensure that the assignment of caregivers to patients reflects the relative urgency of patient needs. This is an especially serious issue in critical care units (CCUs). We present a formal mathematical model of an operational critical care unit, together with metrics for evaluating the systematic impact of caregiver scheduling decisions on patient care. The model is rich enough to capture the essential features of device and patient diversity, and so enables us to test the hypothesis that integration of vital sign data could realistically yield a significant positive impact on the efficacy of critical care delivery. To test the hypothesis, we employ the model within a computer simulation. The simulation enables us to compare the current scheduling processes in widespread use within CCUs against a new scheduling algorithm that makes use of an integrated array of patient information collected by an (anticipated) vital sign data integration infrastructure. The simulation study provides clear evidence that such an infrastructure reduces risk to patients and lowers operational costs, and in so doing reveals the inherent costs of medical device non-interoperability.