In the name of God
Title: Threading Machine Generated Email
Authors: Nir Ailon, Zohar S Karnin, Edo Liberty, Yoelle Maarek
Abstract: Viewing email messages as parts of a sequence or a thread is a convenient way to quickly understand their context. Current threading techniques rely on purely syntactic methods, matching sender information, subject line, and reply/forward prefixes. As such, they are mostly limited to personal conversations. In contrast, machine-generated email, which amounts, as per our experiments, to more than 60% of the overall email traffic, requires a different kind of threading that should reflect how a sequence of emails is caused by a few related user actions. For example, purchasing goods from an online store will result in a receipt or a confirmation message, which may be followed, possibly after a few days, by a shipment notification message from an express shipping service. In today's mail systems, they will not be a part of the same thread, while we believe they should. In this paper, we focus on this type of threading that we coin “causal threading”. We demonstrate that, by analyzing recurring patterns over hundreds of millions of mail users, we can infer a causality relation between these two individual messages. In addition, by observing multiple causal relations over common messages, we can generate “causal threads” over a sequence of messages. The four key stages of our approach consist of: (1) identifying messages that are instances of the same email type or “template” (generated by the same machine process on the sender side), (2) building a causal graph, in which nodes correspond to email templates and edges indicate potential causal relations, (3) learning a causal relation prediction function, and (4) automatically “threading” the incoming email stream. We present detailed experimental results obtained by analyzing the inboxes of 12.5 million Yahoo! Mail users, who voluntarily opted in for such research. Supervised editorial judgments show that we can identify more than 70% (recall rate) of all “causal threads” at a precision level of 90%. In addition, for a search scenario we show that we achieve a precision close to 80% at 90% recall. We believe that supporting causal threads in [...]
Publish Year: 2013
Publisher: ACM-WSDM
Subject: Machine Learning
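
The four stages listed in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`template_of`, `build_causal_graph`, `thread_inbox`, `WINDOW`), the `(timestamp, sender, subject)` message tuple, the digit-masking rule for templates, and the two-week window are assumptions made for illustration. In particular, stage 3 (a learned prediction function) is replaced by a plain frequency threshold.

```python
import re
from collections import defaultdict

WINDOW = 14 * 24 * 3600  # assumed look-back window in seconds (two weeks)


def template_of(sender, subject):
    """Stage 1 stand-in: mask digit runs so messages that differ only in
    order or tracking numbers map to the same (sender, subject) template."""
    return (sender.lower(), re.sub(r"\d+", "#", subject.lower()).strip())


def build_causal_graph(inboxes, min_support=2, min_ratio=0.5):
    """Stages 2 and 3 stand-in: keep edge a -> b when template b follows
    template a within WINDOW for at least min_support users and for at
    least min_ratio of the users who received a."""
    users_with = defaultdict(set)   # template -> users who received it
    users_pair = defaultdict(set)   # (a, b) -> users where b followed a
    for user, messages in inboxes.items():
        msgs = sorted(messages, key=lambda m: m[0])
        for i, (t_a, s_a, subj_a) in enumerate(msgs):
            a = template_of(s_a, subj_a)
            users_with[a].add(user)
            for t_b, s_b, subj_b in msgs[i + 1:]:
                if t_b - t_a > WINDOW:
                    break
                b = template_of(s_b, subj_b)
                if b != a:
                    users_pair[(a, b)].add(user)
    edges = set()
    for (a, b), users in users_pair.items():
        ratio = len(users) / len(users_with[a])
        if len(users) >= min_support and ratio >= min_ratio:
            edges.add((a, b))
    return edges


def thread_inbox(messages, edges):
    """Stage 4 stand-in: attach each message to the thread of the most
    recent earlier message whose template has an edge to its template;
    otherwise start a new thread."""
    msgs = sorted(messages, key=lambda m: m[0])
    threads, thread_of = [], {}
    for j, (t_b, s_b, subj_b) in enumerate(msgs):
        b = template_of(s_b, subj_b)
        parent = None
        for i in range(j - 1, -1, -1):
            t_a, s_a, subj_a = msgs[i]
            if t_b - t_a > WINDOW:
                break
            if (template_of(s_a, subj_a), b) in edges:
                parent = i
                break
        if parent is None:
            thread_of[j] = len(threads)
            threads.append([msgs[j]])
        else:
            thread_of[j] = thread_of[parent]
            threads[thread_of[j]].append(msgs[j])
    return threads
```

Under these assumptions, an order confirmation and a later shipment notification end up in one thread once enough users have received the same pair of templates in that order, while an unrelated newsletter stays in its own thread.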