Improving Multi-class Text Classification with Naive Bayes

Please use this identifier to cite or link to this item: http://dspace.mediu.edu.my:8181/xmlui/handle/1721.1/7074

Full metadata record

DC Field	Value	Language
dc.creator	Rennie, Jason D. M.	-
dc.date	2004-10-20T20:28:16Z	-
dc.date	2001-09-01	-
dc.date.accessioned	2013-10-09T02:48:10Z	-
dc.date.available	2013-10-09T02:48:10Z	-
dc.date.issued	2013-10-09	-
dc.identifier	AITR-2001-004	-
dc.identifier	http://hdl.handle.net/1721.1/7074	-
dc.identifier.uri	http://koha.mediu.edu.my:8181/xmlui/handle/1721	-
dc.description	There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy of such systems, and our understanding of them, greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, to improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways to extend and improve it. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem that explains the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.	-
dc.format	49 p.	-
dc.format	2017370 bytes	-
dc.format	687421 bytes	-
dc.format	application/postscript	-
dc.format	application/pdf	-
dc.language	en_US	-
dc.relation	AITR-2001-004	-
dc.subject	AI	-
dc.subject	naive bayes	-
dc.subject	text	-
dc.subject	classification	-
dc.subject	feature selection	-
dc.title	Improving Multi-class Text Classification with Naive Bayes	-
Appears in Collections:	MIT Items

Files in This Item:

There are no files associated with this item.
