An encoding method has a direct effect on the quality and the representationof the discovered knowledge in data mining systems. Biological macromolecules are encoded by stringsof characters, called primary structures....An encoding method has a direct effect on the quality and the representationof the discovered knowledge in data mining systems. Biological macromolecules are encoded by stringsof characters, called primary structures. Knowing that data mining systems usually use relationaltables to encode data, we have then to re-encode these strings and transform them into relationaltables. In this paper, we do a comparative study of the existing static encoding methods, that arebased on the Biologist know-how, and our new dynamic encoding one, that is based on the constructionof Discriminant and Minimal Substrings (DMS). Different classification methods are used to do thisstudy. The experimental results show that our dynamic encoding method is more efficient than thestatic ones, to encode biological macromolecules within a data mining perspective.展开更多
文摘An encoding method has a direct effect on the quality and the representationof the discovered knowledge in data mining systems. Biological macromolecules are encoded by stringsof characters, called primary structures. Knowing that data mining systems usually use relationaltables to encode data, we have then to re-encode these strings and transform them into relationaltables. In this paper, we do a comparative study of the existing static encoding methods, that arebased on the Biologist know-how, and our new dynamic encoding one, that is based on the constructionof Discriminant and Minimal Substrings (DMS). Different classification methods are used to do thisstudy. The experimental results show that our dynamic encoding method is more efficient than thestatic ones, to encode biological macromolecules within a data mining perspective.