Book Excerpt

Antispam approaches

About the book


For many companies and individuals, spam is an annoyance and undesired expense. This series excerpt from Privacy: What Developers and IT Professionals Should Know offers advice on what we can do to fight spam, how we can protect legitimate e-mail and develop e-mail-friendly solutions.

Author J.C. CANNON, privacy strategist at Microsoft's Corporate Privacy Group, specializes in implementing application technologies that maximize consumer control over privacy, and enable developers to create privacy-aware applications. Cannon organized Microsoft's Privacy Response Center, an automated resource for tracking privacy issues throughout Microsoft. He works closely with Microsoft product groups and external developers to help them build privacy into applications. He also contributed the chapter on privacy to Michael Howard's Writing Secure Code. Cannon has spent nearly twenty-five years in software development.

Sample Chapter is provided courtesy of Addison Wesley Professional.

This section looks at several approaches that have been taken to combat spam. Most of these are techniques that have been incorporated into tools and larger applications. The last two are approaches with which many of us could become more involved:
  • Accept list - This is a list of e-mail addresses or domains that are determined to be trusted. This list is built over time as the user determines which e-mails are spam and which ones are legitimate. The drawback to the approach is that spammers often use fake e-mail addresses to evade being identified by these lists. For example, I often get e-mails that have my e-mail address as the sender. This approach also requires constant interaction from the user.
  • Block list - This is a list of e-mail addresses or domains that are determined to be responsible for sending spam. This list is built over time as the user determines which e-mails are spam and which ones are legitimate. The drawback to the approach is that spammers usually use fake e-mail addresses and domain names and often change them to evade being identified by these lists. This approach also requires constant interaction from the user.
  • Challenge-response - This technique sends an e-mail to the originator of an e-mail asking the originator to validate the e-mail by answering a question or typing in a sequence of numbers and letters displayed in an image that cannot be easily read by a computer. This method easily catches spam sent by automated systems where no one is monitoring received e-mails. Unfortunately, this can include legitimate automated response systems from which you may receive an e-mail as the result of an online purchase or a subscription to an online newsletter. This technique can also be an annoyance because e-mails are delayed by a request being sent to the originator asking for validation.
  • Keyword-search - This approach looks for certain words or a combination of words in the subject line or body of an e-mail. For example, an e-mail that promotes organ enlargements or Viagra would be considered spam. Using a keyword search to validate e-mails for children may be fine. However, many of the words in a keyword search could be part of legitimate e-mails. Moreover, many spammers use clever misspellings to get around these types of filters. Search rules are not case sensitive (so SEX, Sex, and sex as subject words would all be detected). Misspellings and punctuation in the middle of a spam word defeat keyword search spam detectors. Spammers also add white space or invisible characters between letters in a word to avoid these filters.
  • Hashing - With hashing, the contents of a known piece of spam are hashed and stored. Each received e-mail is then hashed, and if the hash matches any of the stored hash values for spam, the e-mail is rejected. Although this technique is quite accurate at rejecting known spam, it requires additional computing power to process each e-mail, and it is not very effective against most spammers. Many spammers modify their e-mails by adding a random phrase at the beginning or end of an e-mail, which renders hashing useless.
  • Header analysis - Each e-mail that is sent across the Internet has a header associated with it that contains routing information. This routing information can be analyzed to determine whether it has the wrong format, because many spammers try to hide their tracks by placing invalid information in the header. For example, the from-host field of one line may not match the by-host field of a previous line. Although this may indicate spam, it could also indicate a misconfigured e-mail server. Equally, a well-formed header doesn't necessarily mean that an e-mail is not from a spammer.
  • Reverse DNS lookup - This approach validates the domain name of the originator of an e-mail by performing a Domain Name System (DNS) lookup using the IP address of the originator. The domain name that is returned from the lookup request is compared against the domain of the sender to see whether they match. If there is no match, this e-mail is considered spam. Although this can be effective in many cases, some companies do not have their DNS information set up properly, causing their e-mail to be interpreted as spam. This happens often enough to be a problem. That, combined with the performance hit for doing this, makes this solution less than optimum. To perform your own DNS lookup, go to http://remote.12dt.com/rns/.
  • Image processing - Many advertising e-mails contain images of products or pornographic material. These images usually have a link associated with them so the recipient of the e-mail can click it to obtain more information about the product or service being advertised. Images can also contain a Web bug used to validate an e-mail address. Blocking these images can protect children from harmful images. Some spam tools flag e-mails with images, especially if they are associated with a link, and block them from the inbox. Some sophisticated tools can perform a keyword search of images and reject an e-mail based on the results.
  • Heuristics - This technique looks at various properties of an e-mail to determine whether collectively enough evidence exists to suggest that a piece of e-mail is spam. Using this approach, several of the techniques previously mentioned, such as header analysis and reverse DNS lookup, are combined and a judgment made based on the results. Although this approach is more accurate than any of the approaches used individually, it is still not foolproof and requires a lot of tweaking to compensate for new evasion techniques that spammers deploy.
  • Bayesian filter - This filtering technique is one of the cleverest and most effective means for combating spam. It is a self-learning mechanism that can continue to outwit spammers during its lifetime. It works by taking the top tokens from legitimate e-mails and spam e-mails and placing them in a weighted list. Tokens are words, numbers, and other data that might be found in an e-mail. Fifteen tokens are considered to be the optimum number of tokens to use. Too few tokens and you get false hits because the few tokens will exist in good and bad e-mail. Selecting too many tokens results in more tokens appearing in good and bad e-mail.
    • Suppose, for example, that you are a doctor. It may be common for you to receive e-mail with the words breast and Viagra in them. However, the words examination, patient, x-ray, and results should be more common for your legitimate e-mails than spam. These words would become tokens for the legitimate list, and spam-related tokens would go in the other list.
    • You can see how this technique would be more effective on the client than at the server. Deploying this at the server will result in a more generic set of tokens than tokens that are customized for the type of e-mail that each individual would receive. Looking at the previous example, the tokens for the doctor would probably not appear in the legitimate e-mail list because the majority of the e-mails being received by the e-mail server probably won't be for a doctor, or certainly not for the same type of doctor.
  • Payment at risk - This is an idea that was presented at the World Economic Forum in Davos. It would charge the sender of e-mail a small amount of money each time one of the sender's e-mails was rejected as spam. Although this may be worrisome for senders of legitimate bulk e-mail, it should not be a problem if they are using an opt-in model for determining who is sent e-mails.
  • Honeypots - Some spammers use open relays on the Internet to send their spam on to its final destination, thus hiding their own identity. A honeypot is a service that simulates the services of an open relay to attract spammers and detect their identity. Deploying these can help fight spam, but could also make you a target. There have been cases where companies that deployed honeypots suffered denial-of-service attacks from spammers attempting to seek retribution. Operators of honeypots can also risk litigation by interfering with Internet communications. Funding one may be better.
  • Legislation - Legislation such as the Controlling the Assault of Non-Solicited Pornography and Marketing (CAN-SPAM) Act has made great strides in stopping U.S.-based spammers from sending out spam. The European Union's E-Privacy Directive Proposal also seeks to stop spammers. Support of these types of legislation can do a lot for national spam control and will hopefully encourage other nations to pass similar laws.
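
The Bayesian filter described in the list above can be illustrated with a small sketch. This is not the book's implementation: the class name, the add-one smoothing, and the rule used to combine per-token probabilities are all assumptions chosen for brevity. Only the limit of fifteen tokens comes from the text.

```python
import re
from collections import Counter

TOP_TOKENS = 15  # the number of tokens the text cites as optimal


def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


class BayesianFilter:
    """Hypothetical per-user filter trained on the user's own mail."""

    def __init__(self):
        self.spam_counts = Counter()  # messages marked spam containing a token
        self.ham_counts = Counter()   # legitimate messages containing a token
        self.spam_total = 0
        self.ham_total = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_total += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_total += 1

    def token_probability(self, token):
        """Estimate how strongly a token points to spam (0.5 is neutral)."""
        # Add-one smoothing (an assumption) keeps the result off 0 and 1.
        spam_rate = (self.spam_counts[token] + 1) / (self.spam_total + 2)
        ham_rate = (self.ham_counts[token] + 1) / (self.ham_total + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_score(self, text):
        probs = [self.token_probability(t) for t in set(tokenize(text))]
        # Keep the tokens whose probability is farthest from neutral.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam_product = 1.0
        ham_product = 1.0
        for p in probs[:TOP_TOKENS]:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)
```

Trained on the doctor's mail from the example, a message containing patient, examination, and results would score well below 0.5, while one containing Viagra alongside typical sales words would score well above it. Because the counts come from one user's mail, the same sketch trained at a server would produce the more generic token lists the text warns about.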

Challenge-response for account creation

Several ISPs, such as MSN, AOL, and Yahoo, have implemented challenge-response systems for the creation of new accounts to thwart spammers who use automated programs to create new e-mail accounts from which to send new spam. In typical challenge-response systems, the user is presented with a blurred image and asked to enter the characters displayed in it using the keyboard to complete the creation of a new account. This represents a major barrier to spammers who use automated account-creation systems. EarthLink has even extended this feature to force e-mail senders to respond to a challenge e-mail before their initial e-mail is delivered to the addressee.

A variation of this idea proposes to send a response to each sender of an e-mail to force the sender to perform a simple operation that will use up the resources of the originating e-mail server. Although this is not of any consequence to a sender of a few e-mails, this would heavily impact a company that sends millions of e-mails.
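
The text does not say what the "simple operation" would be. One possibility, assumed here purely for illustration, is a hash puzzle: the sender must find a number that, combined with the challenge string, produces a SHA-256 digest with a given number of leading zero bits. The function names and the challenge format below are hypothetical.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the leading zero bits in a bytes digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge, nonce, difficulty):
    """Cheap check run by the receiver: a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Costly search run by the sender: about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point of the design: the receiver verifies with one hash, while the sender pays for thousands per message. That cost is negligible for a few e-mails but adds up quickly across millions.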

Client-side antispam solutions

Client-side e-mail solutions are features that come with an e-mail client such as Outlook, Netscape, or Eudora. ISPs such as MSN, Yahoo, and AOL also provide antispam features for their client software. These features usually consist of filters that check incoming e-mail and block it based on various criteria. Many of these filtering techniques were described in the previous section.

E-mails that are filtered may be placed into a spam folder, deleted-items folder, a specified folder, or just deleted. One of the problems with these filters is they can inadvertently filter out valid e-mails. Suppose, for example, that you have a filter that routes e-mails to a spam folder based on obscene words. After setting up the e-mail filter, you may receive an e-mail from your doctor about breast cancer. This e-mail could be filtered out of your inbox as spam. For this reason, some e-mail clients permit the user to flag e-mails that have been routed to a spam folder as legitimate e-mails. This flagging tells the filter utility to accept e-mails from specific e-mail addresses or domains. The utility remembers the user's selection and uses the information to filter successive e-mails that arrive at the client. However, this can be a bit tedious. Some more advanced filters automatically place the e-mail addresses of contacts and sent e-mails on the list of acceptable e-mails, relieving users of this burden.
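
The interaction between a word filter and an accept list can be sketched as follows. This is a simplified illustration, not the behavior of any particular e-mail client; the class, method, and folder names are hypothetical.

```python
import re


class ClientFilter:
    """Hypothetical client-side filter: blocked words plus an accept list."""

    def __init__(self, blocked_words):
        self.blocked_words = {w.lower() for w in blocked_words}
        self.accepted_addresses = set()
        self.accepted_domains = set()

    def mark_legitimate(self, sender, whole_domain=False):
        """Called when the user flags a message in the spam folder as valid."""
        sender = sender.lower()
        if whole_domain:
            self.accepted_domains.add(sender.rpartition("@")[2])
        else:
            self.accepted_addresses.add(sender)

    def record_sent(self, recipient):
        """Trust anyone the user writes to, so no manual flagging is needed."""
        self.mark_legitimate(recipient)

    def route(self, sender, subject, body):
        """Return the folder name an incoming message should go to."""
        sender = sender.lower()
        domain = sender.rpartition("@")[2]
        # The accept list wins over the word filter.
        if sender in self.accepted_addresses or domain in self.accepted_domains:
            return "inbox"
        words = set(re.findall(r"[a-z]+", f"{subject} {body}".lower()))
        return "spam" if words & self.blocked_words else "inbox"
```

With "breast" on the blocked list, the doctor's e-mail from the example is routed to the spam folder until the user calls mark_legitimate for that sender. After that, later messages from the same address go to the inbox.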

Microsoft Outlook 2003 and MSN software both block images by default. Images that are embedded in e-mails may contain Web beacons that can be used by spammers to validate e-mail addresses. For users who have enabled the preview feature of their e-mail client, these Web beacons can be activated without reading the e-mail.
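
A client that blocks images first has to find the ones that would be fetched from a remote server when the message is displayed. The sketch below shows one way to do that with Python's standard HTML parser. It is an illustration only and does not describe how Outlook or MSN work; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that point at a remote HTTP(S) server."""

    def __init__(self):
        super().__init__()
        self.remote_sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Images attached to the message itself (cid: URLs) are not remote.
        if urlparse(src).scheme in ("http", "https"):
            self.remote_sources.append(src)


def find_remote_images(html_body):
    """Return the remote image URLs found in an HTML message body."""
    finder = RemoteImageFinder()
    finder.feed(html_body)
    finder.close()
    return finder.remote_sources
```

A client could withhold every URL this returns until the user asks to see the images. The sketch only checks for explicit http and https schemes, so a real filter would need to cover other ways of referencing remote content as well.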

Peer-to-peer software such as Cloudmark permits users to mark e-mail as spam. Information about the marked e-mails goes to the other members in the peer-to-peer network to block the e-mails from other members' inboxes. This permits everyone in the peer-to-peer network to benefit from spam detection by any of the members.

The company Cobion makes Windows, Linux, and Solaris-based e-mail filtering software. Their Web filter software controls which Web sites employees can visit based on the employee's role, the Web site's address, and the Web site's content. Their e-mail filter can control e-mail entering or leaving an enterprise. The e-mail filter makes use of acceptance and rejection lists. They also filter on domain name, subject, body content, and the content of attachments. The software is also able to scan an image file to determine whether it contains restricted text.

Spam and infected attachments

Undesired attachments that often accompany e-mail are not considered spam. However, when they contain viruses, they can be more harmful than the spam that delivered them. One thing that makes malicious attachments insidious is the fact that they can come from people you know who were previously infected by the same software virus. Using an antivirus application such as the ones that are made by McAfee, Symantec, or Computer Associates can help protect your computer and data from harm. Following are some guidelines that can help protect you against viruses:
  • Don't open attachments from unknown e-mail addresses.
  • Validate that attachments sent by friends were actually sent by them.
  • Use antivirus software to scan e-mail attachments.

This was first published in April 2005
