Grob, L. (2019):

Detection of unsolicited mails based on similarity in structural traits

Email is one of the easiest and most popular means for attackers to distribute malware or to lure unsuspecting victims into giving away sensitive information. Even when the attacks are unsuccessful and no harm is done, these mails, together with common spam, are a major annoyance for both business and private users. Common countermeasures to these threats include spam filters and anti-virus software, both of which rely heavily on traits found in the mails' contents or attachments. Attackers can evade spam filters by carefully designing their mails to be inconspicuous, and anti-virus software often does not protect against new samples of malware. However, attackers often reuse their templates and tools to generate these mails, so mails from new attacks tend to be structurally similar to those from prior attacks. This thesis therefore explores a way to detect unsolicited mails regardless of their contents, based solely on similarity in the mails' structural traits, and evaluates which machine learning classifiers are suitable for this task. Since such a classifier requires labeled training data, this thesis also proposes a system that collects and labels the necessary training data automatically.

PDF Version (1,959.5 KB)
