Paper revision: Data mining and semantic web


2 An ontology-based Framework for Text Mining

This framework, proposed by S. Bloehdorn, P. Cimiano, A. Hotho and S. Staab [1], uses text mining to learn a target ontology from text documents and then uses that same ontology to improve the effectiveness of both supervised and unsupervised text categorization approaches.

The architecture builds upon the Karlsruhe Ontology and Semantic Web Infrastructure (KAON), a general, multi-functional open source ontology management infrastructure and tool suite developed at Karlsruhe University. The framework gives definitions of a core ontology, sub-concepts and super-concepts, domain and range, a lexicon for an ontology, and a knowledge base. The main component responsible for creating and maintaining ontologies is "TextToOnto". It employs text mining techniques such as term clustering and matching of lexico-syntactic patterns, as well as general-purpose resources such as WordNet [1]. It has three main components. The Ontology Management Component provides basic ontology management functions such as editing, browsing and evolution of ontologies. The Algorithm Library Component incorporates a number of text mining methods. The Coordination Component is used to interact with the different ontology learning algorithms from the algorithm library.
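To make "matching of lexico-syntactic patterns" concrete, here is a minimal sketch of one widely known pattern of this kind, "X such as Y1, Y2 and Y3", which suggests that each Y is a kind of X. The regular expression and the function name `extract_isa` are assumptions made for illustration; this is not TextToOnto's actual implementation.

```python
import re

# Illustrative pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>"
PATTERN = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")

def extract_isa(text):
    """Return (hyponym, hypernym) pairs suggested by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        # Split the enumeration on the same separators the pattern accepts.
        for hyponym in re.split(r", | and | or ", match.group(2)):
            if hyponym:  # skip the empty piece left by a trailing separator
                pairs.append((hyponym, hypernym))
    return pairs

if __name__ == "__main__":
    print(extract_isa("We eat fruits such as apples, pears and plums."))
```

A pattern matcher like this only proposes candidate relations from single-word terms; a real ontology learning tool would combine it with part-of-speech information and further evidence before adding anything to the ontology.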

2.1 Ontology-based Text Clustering and Classification

The demand for systems that automatically classify text documents into predefined thematic classes, or detect clusters of documents with similar content, is growing with the ever-increasing amount of textual information available electronically. Existing text categorization systems have typically used the Bag-of-Words model, an information retrieval model in which single words or word stems are used as features for representing document content. In this paradigm documents are represented as bags of terms. The absolute frequency of term t in document d is given by tf(d, t), and the term vector of document d is denoted t_d = (tf(d, t_1), ..., tf(d, t_m)).
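The term vector defined above can be sketched in a few lines. The tokenizer (lowercased alphabetic runs, no stemming) and the function name `term_vector` are simplifying assumptions for illustration.

```python
import re
from collections import Counter

def tokenize(document):
    """Very simple tokenizer: lowercase alphabetic runs, no stemming."""
    return re.findall(r"[a-z]+", document.lower())

def term_vector(document, vocabulary):
    """Return (tf(d, t_1), ..., tf(d, t_m)) in the order given by vocabulary."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]

if __name__ == "__main__":
    docs = [
        "The ontology describes the domain.",
        "Text mining learns an ontology from text.",
    ]
    # The vocabulary t_1..t_m is the sorted set of terms in the document set.
    vocabulary = sorted({tok for doc in docs for tok in tokenize(doc)})
    for doc in docs:
        print(term_vector(doc, vocabulary))
```

Fixing one vocabulary order for the whole document set is what makes the vectors of different documents comparable position by position.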

To exploit background knowledge about concepts given by the ontology model, the term vectors are extended with new entries for the ontological concepts c that appear in the document set.
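One possible way to extend a term vector with concept entries is sketched below. The lexicon `LEXICON`, its contents, and the choice to simply append concept frequencies to the term frequencies are assumptions for illustration; the source does not specify how the new entries are computed.

```python
from collections import Counter

# Hypothetical lexicon: maps a term to the ontology concepts it may refer to.
LEXICON = {"beef": ["Meat"], "pork": ["Meat"], "apple": ["Fruit"]}

def concept_frequencies(term_counts, lexicon):
    """Sum the frequencies of all terms that refer to each concept."""
    cf = Counter()
    for term, freq in term_counts.items():
        for concept in lexicon.get(term, []):
            cf[concept] += freq
    return cf

def extended_vector(term_counts, vocabulary, concepts, lexicon):
    """Term frequencies followed by one entry per ontological concept."""
    cf = concept_frequencies(term_counts, lexicon)
    term_part = [term_counts.get(term, 0) for term in vocabulary]
    concept_part = [cf[concept] for concept in concepts]
    return term_part + concept_part

if __name__ == "__main__":
    counts = {"beef": 2, "pork": 1, "apple": 1}
    print(extended_vector(counts, ["apple", "beef", "pork"], ["Fruit", "Meat"], LEXICON))
```

With concept entries present, two documents that mention "beef" and "pork" respectively share the "Meat" feature even though they share no term.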

The process of extracting concepts from texts has five steps:

1. Candidate Term Detection: an algorithm that maps multi-word expressions to the most appropriate concept.
2. Syntactical Patterns: uses the part-of-speech tags of the words.
3. Morphological Transformations
4. Word Sense Disambiguation
5. Generalization: the last step goes from the specific concepts found in the text to more general concept representations.
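The generalization step can be illustrated as follows: each concept found in the text also contributes its frequency to its super-concepts, up to a chosen number of levels. The taxonomy `SUPER_OF`, the single-parent assumption, and the `max_levels` parameter are hypothetical choices made for this sketch.

```python
# Hypothetical taxonomy: each concept has at most one direct super-concept.
SUPER_OF = {"Beef": "Meat", "Pork": "Meat", "Meat": "Food", "Food": "Thing"}

def generalize(concept_counts, super_of, max_levels):
    """Add each concept's frequency to its super-concepts up to max_levels up."""
    result = dict(concept_counts)
    for concept, freq in concept_counts.items():
        current = concept
        for _ in range(max_levels):
            current = super_of.get(current)
            if current is None:  # reached the top of the taxonomy
                break
            result[current] = result.get(current, 0) + freq
    return result

if __name__ == "__main__":
    found = {"Beef": 2, "Pork": 1}
    print(generalize(found, SUPER_OF, 1))
    print(generalize(found, SUPER_OF, 2))
```

Limiting the number of levels is a trade-off: a small value keeps the features specific, while a large value pushes every document toward the same very general concepts.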

3 Semantic Web and data mining in Healthcare

This section discusses the use of the Semantic Web and data mining in healthcare. Section 3.1 gives an overview, Section 3.2 discusses using semantic dependencies to mine depressive symptoms from consultation records, and Section 3.3 discusses the requirements for ontologies in medical data integration.

3.1 Overview

The Web has become a major vehicle for research and practice among healthcare researchers and practitioners, because it offers so many resources and so much potential in their specialized professional fields []. A tremendous amount of information and knowledge exists on the Web, waiting to be discovered, shared and utilized, and research on improving quality of life through the Web has become attractive. Both healthcare researchers and practitioners require a great deal of information to carry out their work, whether prescribing drugs that can effectively cure patients' illnesses or delivering correct and efficient medical and clinical procedures and services. Information technology has played an important role in this field for many years. By using Semantic Web and mining technologies, researchers and practitioners in different countries can not only share information by exchanging XML-based ontologies, but also collaborate effectively on healthcare research projects and work closely together as a team. By focusing on semantics-based information, they gain better access to the knowledge required to prescribe drugs and medical procedures that prevent or treat dangerous and infectious diseases. Researchers and practitioners in healthcare have access to databases of the latest diseases, their symptoms, treatments, diagnostic analyses and other important information. This kind of information can be structured in a more understandable and machine-interpretable way by using Semantic Web languages. If this is done successfully, the resulting ontology or RDF data can be fed into an inference engine, which can make new discoveries useful for patient treatment procedures or general healthcare activities.
Ontologies play a key role in describing the semantics of data in both traditional knowledge engineering and the emerging Semantic Web. Since an ontology defines the exact nature of every resource in its domain and the relationships among these resources, it becomes much simpler to extract users' needs and usage tendencies.
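As a rough illustration of what feeding RDF-style data into an inference engine means, the sketch below forward-chains two simple subclass rules over (subject, predicate, object) triples. The triples, the predicate names `type` and `subClassOf`, and the function `infer` are hypothetical and chosen for illustration; a real system would use an RDF store and a dedicated reasoner.

```python
def infer(triples):
    """Apply two subclass rules repeatedly until no new triples appear.

    Rule 1: (A subClassOf B) and (B subClassOf C) gives (A subClassOf C).
    Rule 2: (x type B) and (B subClassOf C) gives (x type C).
    """
    facts = set(triples)
    changed = True
    while changed:
        new = set()
        for subj, pred, obj in facts:
            if pred not in ("subClassOf", "type"):
                continue
            for subj2, pred2, obj2 in facts:
                if pred2 == "subClassOf" and subj2 == obj:
                    new.add((subj, pred, obj2))
        changed = not new <= facts
        facts |= new
    return facts

if __name__ == "__main__":
    # Hypothetical healthcare facts, for illustration only.
    data = {
        ("Influenza", "subClassOf", "InfectiousDisease"),
        ("InfectiousDisease", "subClassOf", "Disease"),
        ("case42", "type", "Influenza"),
    }
    for triple in sorted(infer(data) - data):
        print(triple)
```

The derived triples, such as the fact that `case42` is an instance of `Disease`, were never stated explicitly; they follow from the ontology's class hierarchy.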

