Distributed Checksum Clearinghouses

Overview

The Distributed Checksum Clearinghouses or DCC is an anti-spam content filter that runs on a variety of operating systems. DCC clients report checksums of mail messages to DCC servers, which count how many recipients have received mail with each checksum. Those counts can be used by SMTP servers and mail user agents to detect and reject or filter spam or unsolicited bulk mail. DCC servers exchange or "flood" common checksums. The checksums include values that are constant across common variations in bulk messages, including "personalizations."

There are graphs of recently detected spam. Those graphs suggest the effectiveness of the system. For example, if you assume that 80% of all mail is spam and those graphs indicate that DCC finds 70% of mail is spam, then DCC detects about 88% of spam, because 70% / 80% = 87.5%.


The idea of DCC is that if mail recipients could compare the mail they receive, they could recognize unsolicited bulk mail. A DCC server totals reports of checksums of messages from clients and answers queries about the total counts for checksums of mail messages. A DCC client reports the checksums for a mail message to a server and is told the total number of recipients of mail with each checksum. If one of the totals exceeds a threshold set by the client and local whitelists do not mark the message as solicited, the DCC client can log, discard, or reject the message.
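To make the report-and-query exchange concrete, here is a minimal Python sketch. ToyChecksumServer and classify are hypothetical names invented for illustration; they model only the counting and the client's threshold decision described above, not the real DCC wire protocol, checksum types, or whitelist handling.

```python
from collections import defaultdict


class ToyChecksumServer:
    """Hypothetical in-memory stand-in for a DCC server: it only totals
    recipient counts per checksum. Not the real DCC protocol."""

    def __init__(self) -> None:
        self.totals: dict[str, int] = defaultdict(int)

    def report(self, checksums: list[str], recipients: int) -> dict[str, int]:
        """Add this message's recipients to each checksum's total and return
        the updated totals, like a combined report and query."""
        for ck in checksums:
            self.totals[ck] += recipients
        return {ck: self.totals[ck] for ck in checksums}


def classify(totals: dict[str, int], threshold: int, whitelisted: bool) -> str:
    """Client-side decision: mail is bulk if any total exceeds the local
    threshold, and bulk mail that is not whitelisted is treated as unsolicited."""
    if whitelisted:
        return "accept"
    if any(count > threshold for count in totals.values()):
        return "reject"  # a real client might instead log or discard
    return "accept"


server = ToyChecksumServer()
decisions = []
for _ in range(30):  # 30 copies of one bulk message, one recipient each
    totals = server.report(["body:example", "from:example"], recipients=1)
    decisions.append(classify(totals, threshold=20, whitelisted=False))
print(decisions.count("reject"))  # 10
```

The first 20 copies pass because their totals do not exceed the threshold; copies 21 through 30 are rejected unless the recipient has whitelisted the sender.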

Because simplistic checksums of spam would not be effective, the main DCC checksums are fuzzy and ignore aspects of messages that commonly vary among copies of the same bulk mail. The fuzzy checksums are changed as spam evolves. Since DCC started being used in late 2000, the fuzzy checksums have been modified several times.
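The following Python sketch illustrates the general idea of a fuzzy checksum. It is NOT DCC's actual fuzzy checksum algorithm, which has changed several times; the normalization rules here (lowercasing and dropping digits, punctuation, short words, and known recipient names) are assumptions chosen only to show how two personalized copies of one bulk message can yield the same checksum.

```python
import hashlib
import re


def fuzzy_body_checksum(body: str, recipient_names: frozenset = frozenset()) -> str:
    """Illustrative fuzzy checksum (NOT DCC's algorithm): normalize away
    details that often differ between copies of one bulk message, then hash."""
    text = body.lower()
    text = re.sub(r"\d+", "", text)       # drop numbers such as order IDs
    text = re.sub(r"[^a-z]+", " ", text)  # collapse punctuation and whitespace
    words = [w for w in text.split()
             if len(w) > 2 and w not in recipient_names]  # drop short words and names
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


a = fuzzy_body_checksum("Dear Alice, order #1234 ships today!!", frozenset({"alice"}))
b = fuzzy_body_checksum("dear BOB -- order #98765 ships today", frozenset({"bob"}))
print(a == b)  # True
```

A plain hash of the two raw bodies would differ; the normalization step is what lets the counts for both copies accumulate under one checksum.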

Unless used with isolated DCC servers and so losing much of its power, DCC causes some additional network traffic. However, the client-server interaction for a mail message consists of exchanging a single pair of UDP/IP datagrams of about 150 bytes. That is often less than the several pairs of UDP/IP datagrams required for a single DNS query. SMTP servers make DNS queries to check the envelope Mail_From value and often several more. As with the Domain Name System, DCC servers should be placed near active clients to reduce DCC network costs. DCC servers exchange or flood reports of checksums, but only the checksums of bulk mail.
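A back-of-the-envelope estimate of the client-server traffic, in Python. It assumes exactly one request and one response of about 150 bytes per message, as described above, and ignores retransmissions and DNS traffic.

```python
DATAGRAM_BYTES = 150       # approximate datagram size cited above
DATAGRAMS_PER_MESSAGE = 2  # one request and one response


def daily_dcc_bytes(messages_per_day: int) -> int:
    """Rough client-server traffic for checking messages_per_day messages."""
    return messages_per_day * DATAGRAMS_PER_MESSAGE * DATAGRAM_BYTES


print(daily_dcc_bytes(100_000))  # 30000000, roughly 30 MB per day
```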

Listings and Removals

Do not send comments or questions about your "DCC listing" to any address at Rhyolite Software unless an SMTP server operated by Rhyolite Software LLC rejected your mail. Instead, contact the operators of the system that rejected your mail.

DCC does not "list" domain names or IP addresses, but detects bulk mail messages. Domain names, IP addresses, and so forth are "listed" independently by DCC users. If DCC users want to receive your bulk mail, they must whitelist it by adding your IP address, SMTP envelope sender, RFC 2369 List-* header fields, or other characteristics of your mail to their whiteclnt files. Do not send "please remove my address" requests unless you want your domain name, mailbox, or IP address added to a blacklist.
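A hypothetical Python sketch of whitelist matching on the characteristics named above: client IP address, SMTP envelope sender, and RFC 2369 List-* header fields. The Whitelist class and its fields are illustrative assumptions; they do not read or mimic the real whiteclnt file syntax, which is described in the DCC man pages.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Whitelist:
    """Hypothetical in-memory whitelist; not a parser for whiteclnt files."""
    networks: set[str] = field(default_factory=set)     # e.g. "192.0.2.0/24"
    env_senders: set[str] = field(default_factory=set)  # lowercase addresses
    list_headers: set[tuple[str, str]] = field(default_factory=set)  # (lowercase name, value)

    def allows(self, client_ip: str, env_from: str, headers: dict[str, str]) -> bool:
        ip = ipaddress.ip_address(client_ip)
        if any(ip in ipaddress.ip_network(net) for net in self.networks):
            return True
        if env_from.strip().lower() in self.env_senders:
            return True
        # RFC 2369 List-* header fields; header names are case-insensitive
        return any((name.lower(), value.strip()) in self.list_headers
                   for name, value in headers.items()
                   if name.lower().startswith("list-"))


wl = Whitelist(networks={"192.0.2.0/24"},
               env_senders={"news@example.com"},
               list_headers={("list-post", "<mailto:announce@example.org>")})
print(wl.allows("192.0.2.7", "someone@example.net", {}))                  # True
print(wl.allows("198.51.100.1", "News@Example.com", {}))                  # True
print(wl.allows("198.51.100.1", "x@example.net",
                {"List-Post": "<mailto:announce@example.org>"}))          # True
print(wl.allows("198.51.100.1", "x@example.net", {"Subject": "hello"}))  # False
```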

A separate facility called DCC Reputations automatically computes the reputations for sending bulk mail. However, it makes no sense to ask for IP addresses to be removed from the distributed DCC Reputation database. A reputation for sending lots of bulk mail expires automatically a week to 30 days after the last bulk email reported by a DCC Reputation client mail system.
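A minimal Python sketch of why removal requests are unnecessary: a reputation stays in effect only while recent bulk mail keeps being reported. The function name and the single expiry_days parameter are assumptions; the text only says a reputation expires a week to 30 days after the last reported bulk message.

```python
from datetime import datetime, timedelta


def reputation_active(last_bulk_report: datetime, now: datetime,
                      expiry_days: int = 30) -> bool:
    """True while a bulk-mail reputation is still in effect.

    expiry_days is a hypothetical parameter; the text only bounds the
    expiry between one week and 30 days after the last bulk report."""
    if not 7 <= expiry_days <= 30:
        raise ValueError("expiry_days must be between 7 and 30")
    return now - last_bulk_report < timedelta(days=expiry_days)


last = datetime(2023, 1, 1)
print(reputation_active(last, datetime(2023, 1, 20)))                 # True
print(reputation_active(last, datetime(2023, 1, 20), expiry_days=7))  # False
print(reputation_active(last, datetime(2023, 2, 5)))                  # False
```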

Spam is unsolicited bulk mail, and only mail targets can say whether a message is solicited. A virtue of DCC and DCC Reputations spam filtering is that mail targets decide whether they have subscribed to bulk mail or want to hear from senders with DCC Reputations for sending bulk mail. The opinions of bulk mail senders about whether their messages are spam are irrelevant.

Download

The current version of the DCC source is version 2.3.168, April 24, 2021. It is available at dcc-servers.net. It is usually best to update an existing installation with the /var/dcc/libexec/updatedcc script. Some previous versions are available.

License

The DCC software is distributed under a license that is free only to organizations that participate in the global DCC network. ISPs that use DCC to filter mail for their own users are intended to be covered by the license. You can redistribute unchanged copies of the DCC source, but you may not redistribute modified, "fixed," or "improved" versions of the source or binaries. You also can't call it your own or blame anyone for the results of using it.

Selling the bandwidth and, most important, human system administration work of the public DCC servers to third parties has always been wrong. Sellers of products, "appliances," or managed mail services must contract for or provide their own DCC servers.

DCC Client Problems

Incorrectly configured firewalls are a common cause of problems for DCC clients using the public DCC servers. Your firewalls must allow responses to requests from dccproc or dccifd on your system to come from UDP port 6277 at the public servers.

Excessive requests are another common cause. The public DCC servers have various defenses against DoS attacks including rate limiting or delaying responses based on the maximum of the requests made today and a recent daily average. When the delays would reach 4 seconds, the public servers completely ignore additional requests. If your mail system processes more than 100,000 messages per day, you should use your own, probably private, DCC server connected to the global network of DCC servers.
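The following Python sketch shows the general shape of such a defense. Only two details come from the text: the load is the maximum of today's requests and a recent daily average, and requests are ignored once the delay would reach 4 seconds. The free allowance and the per-request delay slope are made-up parameters, not the values the public servers use.

```python
from typing import Optional

MAX_DELAY_SECONDS = 4.0  # from the text: at this delay, requests are ignored


def response_delay(requests_today: int, recent_daily_average: float,
                   free_requests_per_day: int = 100_000,
                   seconds_per_excess_request: float = 1e-5) -> Optional[float]:
    """Return the delay in seconds before answering, or None to ignore.

    free_requests_per_day and seconds_per_excess_request are hypothetical
    tuning knobs, not the public servers' real settings."""
    load = max(requests_today, recent_daily_average)
    excess = max(0.0, load - free_requests_per_day)
    delay = excess * seconds_per_excess_request
    if delay >= MAX_DELAY_SECONDS:
        return None  # the request is silently dropped
    return delay


print(response_delay(50_000, 40_000))    # 0.0
print(response_delay(10, 300_000))       # about 2.0: the average dominates
print(response_delay(600_000, 100_000))  # None
```

Using the maximum of today's count and the recent average means a client cannot escape the delay simply by being quiet earlier in the day.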

If the public DCC servers are not working for you, your firewalls allow UDP port 6277, and you are not sending an excessive number of requests, then the cause might be excessive or objectionable DCC operations that have been received from your network. See the blacklist of DCC clients used by the public DCC servers.

Documentation and Source

Each of the several parts of DCC has its own man page, including:

There are also

The code seems to be compatible with flavors of UNIX-like systems. See the list of systems in the installation instructions.

Operational DCC Services

A useful anti-spam scheme is more than just code, and that is particularly true of the Distributed Checksum Clearinghouses, DCC, which are based on sharing information about bulk mail. If you do not run your own DCC server, you need to point your DCC client to someone else's server. The DCC client code does the right thing when it cannot contact any of the servers it knows about: it quickly passes the mail without checking whether it is bulk. Given more than one server, the DCC client code uses the fastest or closest.
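A minimal Python sketch of the client behavior just described: prefer the server with the lowest measured round-trip time, and pass the message unchecked when no server answers. The function names, server names, and the precomputed round-trip table are assumptions for illustration; the real client's measurement and failover logic may differ.

```python
from typing import Optional


def choose_server(rtts_ms: dict[str, Optional[float]]) -> Optional[str]:
    """Return the server with the lowest round-trip time in milliseconds.
    A value of None means that server did not answer."""
    reachable = {name: rtt for name, rtt in rtts_ms.items() if rtt is not None}
    if not reachable:
        return None
    return min(reachable, key=reachable.__getitem__)


def check_message(rtts_ms: dict[str, Optional[float]]) -> str:
    server = choose_server(rtts_ms)
    if server is None:
        return "pass"  # fail open: deliver without a bulk check
    return "query " + server


print(check_message({"dcc-a": 80.0, "dcc-b": 25.5, "dcc-c": None}))  # query dcc-b
print(check_message({"dcc-a": None}))                                # pass
```

Failing open keeps an unreachable DCC server from delaying or losing legitimate mail, at the cost of letting some bulk mail through while the servers are unavailable.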

When using someone else's server, you must contact them for a DCC client-ID and corresponding password, unless it is one of the public DCC servers that accept anonymous clients.

Public DCC servers for anonymous DCC clients handling fewer than 100,000 mail messages per day are provided by people and organizations in the following list. The default contents of the /var/dcc/map file point to these servers.

Organization | Contact
Uptempo Marketing Corp | Sven Willenberger
www.eatserver.nl | dcc@eatserver.nl
Etherboy.com | Dave Lugo
INFN (National Institute for Nuclear Physics) - Bari | Domenico Diacono
INFN (National Institute for Nuclear Physics) - Turin | Alberto D'Ambrosio
MGT Consulting | --
INAF IASF (National Institute for Astrophysics)-Palermo-Italy | Giacomo Fazio
Peregrine Computer Consultants Corporation | Kevin A. McGrail
Sonic.net, Inc. | Kelsey Cummings
Vienna University of Economics and Business Administration | Franz Schaefer
Ratiokontakt GmbH | Technik
Wikstrom Telephone Company | Richard Laager
Nova53 | DCC

The DNS names dcc1.dcc-servers.net, dcc2.dcc-servers.net, dcc3.dcc-servers.net, dcc4.dcc-servers.net, and dcc5.dcc-servers.net resolve to the IP addresses of the public DCC servers. Use them by adding those names to your /var/dcc/map file with cdcc "add dcc1.dcc-servers.net" and so forth. The names are automatically installed when the DCC programs are installed with the ./configure script and Makefile in the source. See the installation instructions.
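Because firewalls must accept UDP responses from port 6277 at the public servers, it can help to see which addresses these names currently resolve to. This Python sketch uses only the standard socket module; resolve_all and system_lookup are hypothetical helper names, and because the addresses can change, any firewall rules built from them need to be refreshed.

```python
import socket
from typing import Callable, Iterable

PUBLIC_DCC_NAMES = [f"dcc{i}.dcc-servers.net" for i in range(1, 6)]


def resolve_all(names: Iterable[str],
                lookup: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Map each name to its sorted, de-duplicated addresses.
    Names that fail to resolve map to an empty list."""
    result: dict[str, list[str]] = {}
    for name in names:
        try:
            result[name] = sorted(set(lookup(name)))
        except OSError:  # socket.gaierror is a subclass of OSError
            result[name] = []
    return result


def system_lookup(name: str) -> list[str]:
    """Resolve name with the host's resolver, asking for UDP endpoints
    on port 6277, the port the public DCC servers answer from."""
    infos = socket.getaddrinfo(name, 6277, type=socket.SOCK_DGRAM)
    return [info[4][0] for info in infos]
```

Calling resolve_all(PUBLIC_DCC_NAMES, system_lookup) queries the local resolver; passing a different lookup function lets the code be exercised without network access.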

Note well that it has been wrong to take and resell the bandwidth and, most important, human system administration work of the public DCC servers to third parties. Blunt words for that include theft and stealing. Vendors of "spam appliances" or services including DCC such as "managed email" must provide DCC servers of their own or contract for DCC services from others.

Flooding Checksums among Private DCC servers

The effectiveness of DCC filtering increases with checksums "flooded" or exchanged with other DCC servers. The spam filtering results of violating the free license by not connecting a local, private server to the global network of DCC servers may be disappointing.
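To illustrate that DCC servers flood only the checksums of bulk mail, here is a toy Python model. The class, its bulk_threshold, and the rule that a peer keeps the larger of two totals are simplifying assumptions, not the DCC flooding protocol, which also depends on agreed server-IDs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFloodingServer:
    """Toy model: flood a checksum to peers once its total reaches
    bulk_threshold. Not the DCC flooding protocol."""
    bulk_threshold: int
    totals: dict[str, int] = field(default_factory=dict)
    peers: list["ToyFloodingServer"] = field(default_factory=list)
    flooded: set[str] = field(default_factory=set)

    def report(self, checksum: str, count: int) -> None:
        self.totals[checksum] = self.totals.get(checksum, 0) + count
        is_bulk = self.totals[checksum] >= self.bulk_threshold
        if is_bulk and checksum not in self.flooded:
            self.flooded.add(checksum)  # flood each bulk checksum once
            for peer in self.peers:
                peer.receive_flood(checksum, self.totals[checksum])

    def receive_flood(self, checksum: str, count: int) -> None:
        # Simplification: keep the larger of the local and flooded totals.
        self.totals[checksum] = max(self.totals.get(checksum, 0), count)


local = ToyFloodingServer(bulk_threshold=10)
peer = ToyFloodingServer(bulk_threshold=10)
local.peers.append(peer)
local.report("one-off", 2)  # not bulk: stays local
local.report("bulk", 12)    # bulk: flooded to the peer
print(peer.totals)          # {'bulk': 12}
```

An isolated server would see only its own clients' counts; flooding lets bulk mail first seen elsewhere already count as bulk locally.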

Mail systems that handle more than 100,000 mail messages per day should have a local DCC server so that processing incoming mail is not delayed by the time required for the UDP packets used by the DCC client protocol to cross the Internet. Organizations that deal with more than 500,000 mail messages per day benefit from two or more local DCC servers to ensure that at least one local DCC server is available despite system maintenance. Organizations that deal with fewer than 100,000 mail messages per day use less bandwidth of their own and of the servers in the global network by using the public servers.
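The sizing guidance above can be encoded as a small Python function. The thresholds come from the text; treating exactly 100,000 or exactly 500,000 messages per day as the lower tier is an assumption, because the text does not cover those exact values.

```python
def recommended_dcc_setup(messages_per_day: int) -> str:
    """Sizing guidance from the text; exact boundary values are assumptions."""
    if messages_per_day > 500_000:
        return "two or more local DCC servers flooding with the global network"
    if messages_per_day > 100_000:
        return "a local DCC server flooding with the global network"
    return "the public DCC servers"


for volume in (20_000, 250_000, 2_000_000):
    print(volume, recommended_dcc_setup(volume))
```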

The first step in configuring a DCC server to flood checksums is agreeing on the server-IDs of all participating servers. There is a private list of the DCC servers, server-IDs and so forth in the global network of DCC servers at https://www.rhyolite.com/dcc/private/. It is readable only by server operators. Contact Vernon Schryver at vjs@rhyolite.com for server-IDs.

Other Resources

Whitelists
Use of DCC to reject unsolicited bulk mail generally requires a whitelist of solicited bulk mail sources in the local common /var/dcc/whiteclnt or /var/dcc/whitecommon files or in a per-user whiteclnt file.

Blacklists
Blacklists can be used as "spam traps" to feed DCC. For example, sendmail can use an "access_db" to mark spam, and then report it via dccm.

DNS Blacklists
The DCC clients, dccm, dccifd, and dccproc can check domain names and IP addresses in SMTP envelope Mail_From values and in URLs in mail message bodies against DNS blacklists (DNSBL) such as the SBL. See the installation instructions and DNSBL_ARGS in the configuration file, dcc_conf, in the DCC home directory.
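A DNSBL lookup follows a widely used convention: reverse the octets of an IPv4 address, append the blacklist's zone, and look up the resulting name; an answer, usually in 127.0.0.0/8, means the address is listed. This Python sketch shows only that convention. It is not DCC's code, the zone dnsbl.example is a placeholder, and in DCC the lists are configured through DNSBL_ARGS.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Listed if the query name resolves to an address in 127.0.0.0/8;
    a failed lookup (NXDOMAIN and similar) is treated as not listed."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return ipaddress.ip_address(answer) in ipaddress.ip_network("127.0.0.0/8")


print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))  # 99.2.0.192.dnsbl.example
```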

Greylisting
The DCC sendmail milter, dccm, and the dccifd general MTA interfaces can use a form of greylisting.
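The usual greylisting technique can be sketched in Python as follows. This is the generic idea, not the dccm or dccifd implementation: a new (client IP, envelope sender, recipient) triple gets a temporary failure, and a retry after the delay is accepted. The Greylist class, the 300-second default delay, and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable


class Greylist:
    """Generic greylisting, not the dccm/dccifd implementation."""

    def __init__(self, delay_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay_seconds
        self.clock = clock
        self.first_seen: dict[tuple[str, str, str], float] = {}

    def check(self, client_ip: str, env_from: str, rcpt_to: str) -> str:
        key = (client_ip, env_from.lower(), rcpt_to.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)  # remember first attempt
        if now - first >= self.delay:
            return "accept"
        return "tempfail"  # e.g. an SMTP 4xx reply; real MTAs retry later


fake_now = [0.0]
grey = Greylist(delay_seconds=300, clock=lambda: fake_now[0])
print(grey.check("192.0.2.1", "a@example.com", "b@example.org"))  # tempfail
fake_now[0] = 400.0
print(grey.check("192.0.2.1", "a@example.com", "b@example.org"))  # accept
```

The approach relies on legitimate mail servers retrying after a temporary failure, while many bulk-mail tools do not retry.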

CGI Demonstration
There is a demonstration of the proof-of-concept CGI scripts that allow users to maintain individual whitelists and monitor individual logs of rejected mail at https://www.rhyolite.com/dcc-demo-cgi-bin/ or https://cgi-demo:cgi-demo@www.rhyolite.com/dcc-demo-cgi-bin/. It requires the user name cgi-demo and the password cgi-demo.

DCC Reputations

DCC Reputations are a distinct mechanism based on and contributing to DCC data.

History

DCC is based on an idea of Paul Vixie and on fuzzy body matching to reject spam on a corporate firewall operated by Vernon Schryver starting in 1997. The DCC software was designed and written at Rhyolite Software starting in 2000. It has been used in production since the winter of 2000/2001.

Contact Vernon Schryver at vjs@rhyolite.com or use the form.

$Date: 2023/10/09 19:25:15 $