CVS is a version control system that is widely used in the open source community as well as in commercial enterprises. Its advantages and deficiencies are well known, which may be one reason it is still chosen even though there are many other options nowadays, some of them offering more functionality and features.
One of CVS's shortcomings is the purely centralized organizational model on which it is based. There is one central database, the repository, and very little support for offline operations. All developers need direct TCP/IP access for almost all version control operations, and since real software projects tend to become large very quickly, the network bandwidth should not be too small. Though the internet has improved considerably in this respect in recent years, fast direct access may still be a problem, especially for software development efforts distributed world-wide. It would be better if the central CVS repository could be distributed over several computers local to the development teams, with changes propagated automatically in the background by hard-working daemons.
The structure of a CVS repository makes it quite unsuited for real distribution techniques like those used in transaction-oriented database systems. In spite of this, it is possible to realize a limited variant of the ideal solution described above, if modifications to the repository are limited to certain well-defined lines of development (branches) on every server. (This is the design that has also been implemented in ClearCase's Multi-Site product.)
In a DCVS system there are n repositories whose contents are kept equal or nearly equal by background processes. The program used for this task is an extended version of the well-known and efficient CVSup developed by John D. Polstra. The combination of DCVS repository, extended CVSup server, and DCVS server program (extended CVS server) will be called DCVS server or simply server in the following paragraphs.
All contents of all development lines can be checked out from any of the n DCVS servers into a DCVS workspace owned by a developer. All operations that do not modify the repository, such as diff, patch, log, annotate etc., will work just like in CVS, but they will always use the local repository and so be much faster in a distributed scenario.
Here is an example of a small DCVS server network:
But what happens when the code in the workspace must be changed? If every DCVS server could check in any change at any time, conflicts would soon arise, and chaos and data loss would result. So every DCVS server is assigned a set of development lines (DCVS branches) for which it is responsible. Modifications to a given line of development may only be checked in to the server that is responsible for this branch. Separating modifications by lines of development makes it possible to transfer and distribute changes automatically in the DCVS network. Repository synchronization is performed at the level of single RCS delta elements by the extended cvsupd processes.
You may now say: "Fine, but what do I do if I want to commit changes for a development line my local DCVS server is not responsible for?" In this case you simply create a new development line (branch) and commit the changes to it. Your local DCVS server will automatically be responsible for any newly created line of development. If these changes need to be applied to the original branch, a developer on the responsible DCVS server needs to perform a merge operation. Alternatively, you can contact the remote server directly over the internet and commit the changes there, overriding the DCVS repository setting. But generally it will be easier to follow a change-set-oriented development model. DCVS therefore provides change sets, which enable developers to produce small sets of changes related to a feature or a defect that can then be applied by others.
How can we ensure that deltas on a certain line of development can be identified as belonging to a certain DCVS server? To understand this, some knowledge of the RCS file format, which is used by the CVS versioning mechanism, is needed. CVS uses so-called magic branch numbers, which are revision numbers with an even number of elements and the number 0 at the last-but-one position. Let's take 1.34.0.4 as an example. Deltas belonging to this line of development will be labeled 1.34.4.1, 1.34.4.2, 1.34.4.3, etc. To separate lines of development on different servers, we must ensure that the branch numbers (4 in our example) chosen to represent the branch are different for every server. This applies to the creation of new branches (tags) as well as to the creation of new RCS deltas.
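The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code taken from DCVS; the function names are made up for this example.

```python
def is_magic_branch(rev: str) -> bool:
    """Return True if rev looks like a CVS magic branch number, e.g. 1.34.0.4.

    A magic branch number has an even number of elements (at least four)
    and a 0 at the last-but-one position.
    """
    parts = rev.split(".")
    return len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0"


def branch_prefix(magic: str) -> str:
    """Drop the 0 at the last-but-one position: 1.34.0.4 -> 1.34.4."""
    if not is_magic_branch(magic):
        raise ValueError(f"not a magic branch number: {magic}")
    parts = magic.split(".")
    return ".".join(parts[:-2] + parts[-1:])


def delta_revision(magic: str, n: int) -> str:
    """Revision number of the n-th delta on the branch (n starts at 1)."""
    return f"{branch_prefix(magic)}.{n}"


def branch_number(magic: str) -> int:
    """The last element (4 in 1.34.0.4), which must differ between servers."""
    if not is_magic_branch(magic):
        raise ValueError(f"not a magic branch number: {magic}")
    return int(magic.split(".")[-1])
```

For the example above, delta_revision("1.34.0.4", 1) gives 1.34.4.1, and branch_number("1.34.0.4") gives the value 4 that has to be unique per server.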
In order to achieve this, DCVS assigns a unique range of branch numbers to every (server, collection) pair. All ranges for all servers and collections must be mutually exclusive. The definitions for servers, collections, and ranges are read from a single configuration file, which must be the same on all DCVS servers. By consulting the contents of this file, every DCVS server can decide if it is responsible for a certain branch or delta of a given file. If so, all modifying operations are allowed; if not, modifying operations are only possible on the appropriate remote server.
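The responsibility check can be pictured roughly as follows. This Python sketch assumes the configuration has already been parsed into a dictionary; the dictionary layout, the server names, the collection name and the range values are all hypothetical and do not reflect the real configuration file syntax.

```python
from typing import Dict, Optional, Tuple

# Hypothetical parsed form of the shared configuration: every
# (server, collection) pair owns one inclusive range of branch numbers.
Ranges = Dict[Tuple[str, str], Tuple[int, int]]

EXAMPLE_RANGES: Ranges = {
    ("dcvs.site-a.example", "project"): (2, 998),
    ("dcvs.site-b.example", "project"): (1000, 1998),
}


def ranges_are_disjoint(ranges: Ranges) -> bool:
    """Check that no two ranges overlap, whatever server or collection."""
    spans = sorted(ranges.values())
    return all(prev[1] < cur[0] for prev, cur in zip(spans, spans[1:]))


def responsible_server(ranges: Ranges, collection: str,
                       branch_no: int) -> Optional[str]:
    """Return the server whose range contains branch_no, or None."""
    for (server, coll), (low, high) in ranges.items():
        if coll == collection and low <= branch_no <= high:
            return server
    return None


def may_modify(ranges: Ranges, local_server: str, collection: str,
               branch_no: int) -> bool:
    """Modifying operations are allowed only on the responsible server."""
    return responsible_server(ranges, collection, branch_no) == local_server
```

Because every server reads the same file, each one reaches the same answer independently, which is why the file must be identical everywhere.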
Another problem when working with distributed DCVS repositories is the naming of configurations (tags). Each tag must also be uniquely assignable to exactly one DCVS server. DCVS solves this problem in a very simple fashion by extending all tags with a server-specific suffix like _at_dcvs_mydomain_org, so that no conflicts can arise in the tag name space.
Currently the core of DCVS consists of three programs: the extended CVS executable dcvs, the extended CVSup server dcvsupd, and an administration program called cvsupadm.
Central to all DCVS programs is the DCVS configuration file, which is either named dcvs_config and searched for in the conventional /etc directories, or named dcvsupd.servers and looked for in the DCVSup server base directory (/usr/local/etc/dcvsup by default). Currently the default and only supported setup is to install everything under /usr/local/dcvs, with the configuration file named /usr/local/dcvs/etc/dcvs_config.
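As a rough illustration of the tag extension, the following Python sketch builds a suffix of the form shown above. It assumes that the suffix is derived from the server's host name by replacing every character that is not a letter or digit with an underscore; that derivation is a guess based on the example suffix, and the function names are hypothetical.

```python
import re


def server_suffix(hostname: str) -> str:
    """Build a suffix such as _at_dcvs_mydomain_org from a host name.

    Assumption: dots and other special characters become underscores,
    since CVS tag names may not contain dots.
    """
    return "_at_" + re.sub(r"[^A-Za-z0-9]", "_", hostname)


def qualify_tag(tag: str, hostname: str) -> str:
    """Append the server suffix unless the tag already carries it."""
    suffix = server_suffix(hostname)
    return tag if tag.endswith(suffix) else tag + suffix
```

With this scheme, two servers that both create a tag named release_1_0 end up with two distinct names in the shared tag name space.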
Here is a small example for a DCVS configuration file:
cvsupd has been extended to serve only the deltas and tags for which the local DCVS server is responsible. (Additional ranges may be specified, too.) Together with the cvsup client and cron, a distribution strategy based on CVSup's pull model can be realized. Future extensions will probably replace the pull model with a push model, and eventually combine the (D)CVSup server and the executable in one daemon program. There are also some scripts that may be useful for proper DCVS setup (see below).
DCVS is an extended CVS program (based on the CVS version contained in FreeBSD 4.6, which is based on CVS 1.12.12). All environment variables, file names, and directory names that traditionally begin with CVS have been changed to begin with DCVS. This is necessary to ensure that DCVS workspaces and repositories are not modified by standard CVS tools, which could result in data loss. DCVS has been extended to check the server's responsibility for all operations that modify the repository. Additionally, automatic extension of configuration names (tags) has been implemented for several operations.
Both dcvs and cvsup have been extended with proper POSIX file locking to ensure mutual exclusion of their operations.
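The general technique of POSIX advisory locking looks roughly like this in Python. The sketch only shows the mechanism; which files dcvs and cvsup actually lock, and how, is not specified here, so the lock file path is a placeholder chosen by the caller.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def repository_lock(lock_path: str):
    """Hold an exclusive POSIX advisory lock on lock_path for the block.

    lock_path is a hypothetical lock file; other cooperating processes
    that request the same lock wait until it is released.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        yield fd
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)  # release the lock
        os.close(fd)
```

A modifying operation would then be wrapped in "with repository_lock(path):" so that only one process at a time touches the repository. Such locks are advisory: they only protect against programs that also take the lock.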
cvsupadm is a small administration program, which currently supports creation and editing of DCVS configurations and DCVS collections. It also supports locking of repositories against dcvs and cvsup operations. Future versions will be able to lock single collections and transfer the responsibility for a (server, collection) range from one server to another.
Several scripts have been added to facilitate the automation and administration of DCVS: