
A DECISION-THEORETIC FRAMEWORK FOR NUMERICAL ATTRIBUTE VALUE RECONCILIATION

Platform : DOT NET

IEEE Projects Years : 2012 - 13

ABSTRACT:

One of the major challenges of data integration is to resolve conflicting numerical attribute values caused by data heterogeneity. In addressing this problem, existing approaches proposed in prior literature often ignore such data inconsistencies or resolve them in an ad hoc manner. In this study, we propose a decision-theoretical framework that resolves numerical value conflicts in a systematic manner. The framework takes into consideration the consequences of incorrect numerical values and selects the value that minimizes the expected cost of errors for all data application problems under consideration. Experimental results show that significant savings can be achieved by adopting the proposed framework instead of ad hoc approaches.

EXISTING SYSTEM:

Data integration remains one of the most difficult tasks in data asset management. Challenges exist at three different levels: schema heterogeneity, entity heterogeneity, and data heterogeneity. Schema heterogeneity is caused by the use of different structures and/or different names for the same information. Entity heterogeneity arises when information about the same real-world entity is stored in different data sources under different identifiers. Data heterogeneity refers to data inconsistencies that remain in the absence of schema heterogeneity and entity heterogeneity.

PROPOSED SYSTEM:

  • The attribute value reconciliation framework we propose in this study is decision-theoretic in nature: with the correct attribute value unknown, the framework selects the value that minimizes the total expected error cost for all data usage problems under consideration.
  • To obtain the cost-minimizing value, the proposed framework explicitly takes into consideration the probability distribution of the true attribute values and the costs of errors for all problems.
  • The general framework consists of several major steps: estimating the posterior probability of possible true values, deriving the cost associated with each candidate value, and selecting the value that minimizes the expected cost of errors. To make the proposed framework feasible for attributes with large domains, we also develop techniques to reduce the number of candidate values to be considered and the complexity of probability estimation.
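
The selection step described above can be illustrated with a minimal Python sketch. It assumes the posterior probabilities of the possible true values are already estimated and that each data usage problem supplies its own error-cost function. The names `expected_cost`, `reconcile`, `absolute_error`, and `underestimate_penalty`, as well as the sample numbers, are hypothetical and chosen only for illustration; they are not taken from the proposed system.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A cost function maps (chosen value, true value) to the cost of that error.
CostFn = Callable[[float, float], float]


def expected_cost(candidate: float,
                  posterior: Dict[float, float],
                  cost_fns: List[CostFn]) -> float:
    """Expected total error cost of storing `candidate`, summed over all problems."""
    return sum(
        prob * sum(cost(candidate, true_value) for cost in cost_fns)
        for true_value, prob in posterior.items()
    )


def reconcile(candidates: Iterable[float],
              posterior: Dict[float, float],
              cost_fns: List[CostFn]) -> Tuple[float, float]:
    """Return the candidate with the lowest expected cost, and that cost."""
    best = min(candidates, key=lambda v: expected_cost(v, posterior, cost_fns))
    return best, expected_cost(best, posterior, cost_fns)


# Two illustrative (assumed) data usage problems with different error costs.
def absolute_error(chosen: float, true_value: float) -> float:
    return abs(chosen - true_value)


def underestimate_penalty(chosen: float, true_value: float) -> float:
    # Assumption: underestimating is three times as costly as overestimating.
    if chosen < true_value:
        return 3.0 * (true_value - chosen)
    return chosen - true_value


if __name__ == "__main__":
    # Assumed posterior over three conflicting source values.
    posterior = {100.0: 0.5, 110.0: 0.3, 150.0: 0.2}
    value, cost = reconcile(posterior.keys(), posterior,
                            [absolute_error, underestimate_penalty])
    print(value, cost)
```

In this sketch the candidate set is simply the set of conflicting source values. Note that the most probable value is not necessarily the one selected, because the asymmetric cost function makes underestimation more expensive.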

Software Requirements:

  • Platform: .NET
  • Front End: ASP.NET
  • Back End: SQL Server
  • Language: C#
  • Operating System: Windows XP

Hardware Requirements:

  • RAM: 512 MB
  • Hard Disk: 80 GB
  • Processor: Pentium IV

Modules

Key extraction

Our work on extracting competitive domains from web pages is related to previous studies on identifying key (salient) phrases in text mining. Many features have been proposed. One commonly used property is Term Frequency / Inverse Document Frequency (TF-IDF). The independence of a phrase has also been proposed as a feature and is measured by the entropy of its context. Other properties, such as phrase length, have been studied as well. Key phrase extraction has many applications, e.g., search result clustering and topic mining. Zamir and Etzioni presented Suffix Tree Clustering (STC), which first identifies sets of documents that share common phrases and then creates clusters according to these phrases. Zeng et al. used a machine learning method to rank the salient phrases for clustering web search results. Other work used a key phrase extraction algorithm to extract sub-topic terms for a given topic. In contrast, our work uses the salient phrase ranking method to discover the competitive domains and further calculates several important properties to identify the domains for two comparative entities.
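
As a rough illustration of the TF-IDF property mentioned above, the following Python sketch scores the terms of a few tokenized documents. It uses the common formulation tf × log(N / df); the exact weighting used in this module is not specified in the text, so the function name `tf_idf` and the formula variant are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: score} mapping per document.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = [
        ["web", "search", "engine"],
        ["web", "browser"],
        ["game", "console", "web"],
    ]
    for doc_scores in tf_idf(docs):
        print(doc_scores)
```

A term that occurs in every document (here "web") receives a score of zero, which is why TF-IDF favors phrases that are frequent in one page but rare across the collection.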

Search and mining

Much work has been done on helping companies and individuals gain marketing information by mining online resources. Examples include product reputation mining, customer opinion extraction and summarization, and sentiment classification. However, none of these approaches detects comparable products or discovers a company's competitors. Comparative search at different granularities has also been investigated, e.g., comparative search engines, comparative web site mining, comparative text collection mining, and comparative sentence mining. Sun proposed a comparative search engine in which the user must input two highly related entities. In contrast, our system needs only one entity as input, and all entities competing against it are extracted. Liu et al. compared two web sites to find unexpected information. Zang and Zhai defined a comparative text mining (CTM) problem and proposed a mixture model that works well for discovering the common themes and the specific themes of each collection. CoMiner differs from the above because it mines competitors instead of comparing the content and structure of two web sites or text collections. More recently, comparative sentence mining has been studied, and it is used in our competitive evidence mining. Unlike that work, we need to identify competitive evidence quickly in a web-scale setting, so a simplified yet effective approach is proposed based on our observations.

COMPETITORS FROM THE WEB

The task of mining competitors includes acquiring the competitors of a given entity, elaborating the competitive domains (fields) with respect to those competitors, and summarizing the opinions expressed in detailed competitive evidence. For ease of understanding, we first formally define competitor, competitive domain, and competitive evidence as follows:

  • Competitor: The competitor of a particular entity E is a contestant C that E hopes to defeat. For example, if Microsoft is the given entity, its competitors may include Google, Sony, etc.
  • Competitive Domain: The competitive domain between the given entity E and its competitor C is a phrase D that describes the field or feature in which E and C compete. For example, a well-known competitive domain between Microsoft and Google is web search.
  • Competitive Evidence: Competitive evidence is a sentence that expresses a competitive/comparative relation based on similarities or differences between the given entity E and its competitor C. For example, the sentence “Google only ahead of Microsoft in search”, attributed to Ballmer, is a piece of competitive evidence for Google and Microsoft in the competitive domain of search.

In this paper, a new algorithm, CoMiner, is proposed for mining competitors from the web with the help of a web search engine.

COMPETITOR DISCOVERY

The objective of this step is to extract and then rank the competitors of the given entity from a set of pages returned by the search engine. Our competitor discovery algorithm is based on the following observations.

  • Web Redundancy: Although the web contains many varied expressions that indicate a comparative relationship between the given entity and its competitors, only a few common patterns are needed to extract candidates from web pages. In other words, entities distributed over an unbounded set of domains can be extracted with a finite set of patterns. For example, Sony and its competitor Microsoft may appear in many commonly used forms, such as “Microsoft vs. Sony” and “Microsoft or Sony”.
  • Uneven Co-occurrence: An entity and its competitor usually co-occur much more often than non-competitor pairs. For example, people discuss Sony together with Microsoft more often than with Google, which does not compete with Sony in many domains.

Based on the above observations, the detailed algorithm for competitor discovery is proposed.
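
The two observations above can be sketched in a few lines of Python. This is only a simplified illustration and not the CoMiner algorithm itself: the two patterns (“X vs. Y” and “X or Y”), the capitalized-word heuristic for entity names, the function name `discover_competitors`, and the ranking by raw match count are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumption: a candidate entity name is a single capitalized word.
NAME = r"([A-Z][A-Za-z0-9]+)"


def discover_competitors(entity: str, snippets: List[str]) -> List[Tuple[str, int]]:
    """Extract candidates with a few fixed patterns and rank them by match count."""
    e = re.escape(entity)
    patterns = [
        re.compile(rf"\b{e} (?:vs\.?|or) {NAME}"),    # "Sony vs. X", "Sony or X"
        re.compile(rf"\b{NAME} (?:vs\.?|or) {e}\b"),  # "X vs. Sony", "X or Sony"
    ]
    counts: Counter = Counter()
    for text in snippets:
        for pattern in patterns:
            for match in pattern.finditer(text):
                candidate = match.group(1)
                if candidate != entity:
                    counts[candidate] += 1
    # Uneven co-occurrence: true competitors should be matched more often.
    return counts.most_common()


if __name__ == "__main__":
    snippets = [
        "Microsoft vs. Sony: console sales compared",
        "Should you buy Sony or Microsoft hardware?",
        "Sony vs Microsoft in gaming",
        "Sony or Nintendo for handhelds",
        "Google announced a new phone",
    ]
    print(discover_competitors("Sony", snippets))
```

In a real setting the snippets would come from the pages returned by the search engine, and the raw counts would likely need normalization so that very frequent entity names do not dominate the ranking.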

 

 


