DSpace Repository

Improving Multi-class Text Classification with Naive Bayes

dc.creator Rennie, Jason D. M.
dc.date 2004-10-20T20:28:16Z
dc.date 2001-09-01
dc.date.accessioned 2013-10-09T02:48:10Z
dc.date.available 2013-10-09T02:48:10Z
dc.date.issued 2013-10-09
dc.identifier AITR-2001-004
dc.identifier http://hdl.handle.net/1721.1/7074
dc.identifier.uri http://koha.mediu.edu.my:8181/xmlui/handle/1721
dc.description There are numerous text documents available in electronic form, and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy of such systems, and our understanding of them, greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, to improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways to extend and improve it. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem that explains the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
dc.format 49 p.
dc.format 2017370 bytes
dc.format 687421 bytes
dc.format application/postscript
dc.format application/pdf
dc.language en_US
dc.relation AITR-2001-004
dc.subject AI
dc.subject naive bayes
dc.subject text
dc.subject classification
dc.subject feature selection
dc.title Improving Multi-class Text Classification with Naive Bayes


Files in this item

There are no files associated with this item.
