Version Control System for Your PHP Projects

What are Version Control Systems (VCS)?

Since not every developer is as familiar with version control systems as they should be, I would like to start by explaining what these systems are, so everybody is on the same page.
Version control systems (VCS) are systems that keep track of the multiple versions of the files that make up a project. They are also known as revision control systems or source control systems.
The main goal of these systems is to keep an archive of the changes made to each project file, so it is possible to:
  1. Recover any past version of a file
  2. Compare two versions of a file and see the actual changes made to the file contents
  3. See which developer was responsible for each change
  4. Undo unwanted changes by reverting to past versions

Why is Using a Version Control System Important for All Developers?

Looking at the list above of what you can do with a VCS, it should be clear why these systems are important for all developers. Still, let me emphasize some aspects.
When using a VCS, your project files are hosted in a repository. If you need to change any of the files, you check out the files, make your changes, test them, and commit them back to the repository, so new versions of the changed files are added there.
If you do not use a VCS, you will have a hard time managing the changes you make to a project, especially when it already has many files.
For instance, when you implement a new feature in a project, it is good to review all your changes before you consider your work done. Very often you add debugging code that you need to remove when you finish.
With a VCS it is very easy to see which files were changed and exactly what the changes were, so you can tell what is really new code and what is debugging code that does not belong in the final version.
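As a rough illustration of that kind of review, here is a minimal Python sketch that compares the committed version of a file with the working copy and lists the lines that were added. The file contents and the `added_lines` helper are made up for this example; a real VCS gives you the same information through its own diff command.

```python
import difflib


def added_lines(old, new):
    """Return the lines that a unified diff reports as added in `new`."""
    diff = list(difflib.unified_diff(old, new,
                                     fromfile="committed",
                                     tofile="working copy"))
    # The first two lines of a unified diff are the "---" / "+++" headers.
    body = diff[2:]
    return [line[1:] for line in body if line.startswith("+")]


# Hypothetical contents of a PHP file before and after adding a feature.
committed = [
    "<?php\n",
    "function total($items) {\n",
    "    return array_sum($items);\n",
    "}\n",
]
working_copy = [
    "<?php\n",
    "function total($items) {\n",
    "    var_dump($items); // debugging\n",
    "    return array_sum($items);\n",
    "}\n",
]

for line in added_lines(committed, working_copy):
    print("added:", line.rstrip())
```

Reviewing the added lines this way makes the leftover `var_dump` call stand out before the change is committed.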
When you work on a project developed by multiple people, you can easily see who made each change, so you can ask them to explain it in case you have questions.
If a change applied to the project files causes problems, you can either fix the problems or revert the files to the state they were in before the change, without having to work out manually what was changed. The VCS can do that for you very quickly.
Another aspect is that repositories may be hosted on the developer's local machine or on a remote machine. Keeping a repository on a remote machine makes it more convenient to manage a project developed by several people who collaborate and may change the project files at the same time.

What Types of Files Can be Managed by a Version Control System?

Project files are mostly text files. Any VCS can manage text files as well as binary files, such as image files.
The one thing that matters about the distinction between binary and text files is that a VCS typically does not keep complete copies of every version of every file. It just keeps track of the changes. This way it saves a lot of disk space, as the amount of data that changes between two revisions of a file tends to be very small.
Just keep in mind that VCS are usually not very good at working out the changes between two binary files, because their contents tend to change a lot from one version to the next. This means that when you commit changes to binary files, the VCS may end up storing a lot of data for each new revision.
In practice, binary files not only may end up taking a lot of space, but sending their changes may also take a lot more time, especially if the repository is hosted on a remote machine.
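To make the idea of storing only the changes more concrete for text files, here is a minimal Python sketch of a line-based delta. It is only an illustration under simple assumptions: the `make_delta`, `apply_delta` and `stored_line_count` helpers are invented for this example, and real VCS use their own delta formats and algorithms.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus newly stored lines."""
    delta = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: just remember which old lines to reuse.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # Replaced or inserted run: only these lines need to be stored.
            delta.append(("store", new_lines[j1:j2]))
        # Pure deletions need no entry: the old lines are simply not copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new version from the old version and the delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old_lines[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


def stored_line_count(delta):
    """Number of lines the delta has to store itself."""
    return sum(len(op[1]) for op in delta if op[0] == "store")


# A 100-line file in which a single line is edited.
version1 = [f"line {i}\n" for i in range(100)]
version2 = list(version1)
version2[42] = "line 42, edited\n"

delta = make_delta(version1, version2)
print(stored_line_count(delta), "of", len(version2), "lines stored")
```

When a file format changes most of its bytes on every save, as many binary formats do, there are few unchanged runs to copy, which is why such files tend to cost much more space per revision.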
I have seen people using VCS repositories as a sort of backup storage for files that probably do not need to be there, such as ZIP files of other projects. Be wise and avoid storing needless files in VCS repositories.

Using a Version Control System to Manage a Database Schema

One less common use of a VCS is to manage a database schema. Databases are a very important storage layer for most PHP applications.
Databases are managed by external systems, so the traditional way to manage the schema of a database is to use the tools that come with those systems.
However, database schemata tend to evolve with the applications. New project versions may need schema changes, so schema versions need to be kept in line with the project versions.
Being aware of this need, in 1999 I developed a database-independent abstraction layer for PHP named Metabase that, among other things, allows you to define your database schema as a text file in an XML format.
You just define your schema tables, fields, indexes and so on in a simple XML format. Metabase takes care of generating the SQL needed to create all the tables and install the database, regardless of the database type.
When you change your database schema, just change the schema XML file. Metabase compares the old and the new schema versions and generates the SQL statements needed to alter your database without affecting the data that your application may have already inserted in the tables.
Since your database schema definition is in an XML file, you can put that file in your project VCS like any other project file. If you need to go back to a past version and revert any changes, Metabase is able to make the necessary schema alterations safely.
You do not need to remember the exact syntax of the SQL that needs to be executed to make the schema alterations because Metabase does this for you.
If you request a schema change that is not allowed by the database system, Metabase may prevent your application from making it before it even starts to change the database.
Metabase is smart enough to perform the changes in the right order when you make multiple changes to your database schema at the same time. Doing them in the wrong order could destroy your database data irreversibly.
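The general idea of comparing two schema versions and deriving the SQL can be sketched in a few lines of Python. This is not Metabase's XML format or its actual algorithm: the dictionary layout, the `diff_schema` function and the ordering policy (additions first, destructive changes last) are assumptions made for this illustration, and changes of column type are ignored.

```python
def diff_schema(old, new):
    """Return SQL statements that turn schema `old` into schema `new`.

    Both schemas are dicts of the form {table: {column: sql_type}}.
    """
    statements = []
    # 1. Create new tables first, so nothing depends on a missing table.
    for table in sorted(new.keys() - old.keys()):
        columns = ", ".join(f"{name} {sql_type}"
                            for name, sql_type in new[table].items())
        statements.append(f"CREATE TABLE {table} ({columns})")
    # 2. Add new columns to the tables that exist in both versions.
    for table in sorted(new.keys() & old.keys()):
        for column in sorted(new[table].keys() - old[table].keys()):
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {column} {new[table][column]}")
    # 3. Only then remove columns and tables, since these steps lose data.
    for table in sorted(new.keys() & old.keys()):
        for column in sorted(old[table].keys() - new[table].keys()):
            statements.append(f"ALTER TABLE {table} DROP COLUMN {column}")
    for table in sorted(old.keys() - new.keys()):
        statements.append(f"DROP TABLE {table}")
    return statements


# Hypothetical old and new versions of a schema kept under version control.
old_schema = {
    "users": {"id": "INTEGER", "name": "TEXT", "legacy_flag": "INTEGER"},
    "sessions": {"id": "INTEGER", "token": "TEXT"},
}
new_schema = {
    "users": {"id": "INTEGER", "name": "TEXT", "email": "TEXT"},
    "orders": {"id": "INTEGER", "user_id": "INTEGER"},
}

for statement in diff_schema(old_schema, new_schema):
    print(statement)
```

Because both schema versions are plain files in the repository, the same comparison can be run between any two revisions, in either direction.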
Additionally, I have been developing an ORM (Object-Relational Mapping) tool named Metastorage since 2002. It can generate classes that let you access the data in your application databases using objects.
It just takes the definition of your application objects from an XML file that you create and generates the code needed to access the database, as well as the database schema definition files in Metabase XML format to store your application objects.
As Metastorage component class definitions are stored in files, they can also be kept in a VCS repository.
I have been using both Metastorage and Metabase for many years with a great increase in productivity. Both projects are Open Source and are actively developed, although they have not had many changes lately because they are already very mature.
There is an alternative method to integrate database schema changes in your project files that is used by some frameworks. It is called migrations. This method requires you to write SQL statements, or code in a similar language, that define how to upgrade or downgrade the database schema between different revisions.
In my opinion migrations are a less productive way for developers to manage schema changes, precisely because you need to learn the SQL or equivalent statements that make the schema changes, and you are responsible for applying them in the right order.
Anyway, database schema files and migrations are both good methods that let you manage the schemata of your projects and keep it all under a VCS.
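For comparison, the migrations approach described above can be sketched as follows. This is a minimal Python example using the standard `sqlite3` module, not the API of any particular framework: the `MIGRATIONS` list, the `migrate` function and the use of SQLite's `user_version` pragma to remember the current revision are all choices made for this illustration.

```python
import sqlite3

# Each migration: (revision number, SQL to upgrade, SQL to downgrade).
# The developer writes both directions by hand and keeps them in order.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE users"),
    (2, "CREATE INDEX users_name ON users (name)",
        "DROP INDEX users_name"),
]


def current_version(conn):
    """Read the schema revision stored in the database file itself."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target):
    """Upgrade or downgrade the schema to revision `target`."""
    version = current_version(conn)
    if target > version:
        # Apply upgrades in ascending order.
        for number, up, _down in MIGRATIONS:
            if version < number <= target:
                conn.execute(up)
    else:
        # Apply downgrades in descending order.
        for number, _up, down in reversed(MIGRATIONS):
            if target < number <= version:
                conn.execute(down)
    conn.execute(f"PRAGMA user_version = {int(target)}")
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(conn, 2)
print("schema revision:", current_version(conn))
```

The migration definitions live in the project files, so they are versioned together with the code that depends on each schema revision.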

How Can You Pick a Version Control System for Your PHP Projects?

PHP projects are no different from any other projects. So, the reasoning to pick a VCS for PHP development is the same as for development in any other language.
There are several VCS to choose from. I will just mention a few that are or were popular among PHP developers over time. Take a look at their features so you can understand which VCS PHP developers are or were using.
  1. CVS: It is one of the oldest VCS, used by many PHP developers in the past. It was based on RCS, a simpler VCS that was meant to manage files in repositories accessible through the local file system.
    CVS added support for remote repositories that could be accessed using a custom protocol. Repositories could be accessed concurrently by different developers from distinct machines.
  2. Subversion: It is a project started by CollabNet, and now maintained under the Apache Software Foundation, that allows accessing remote repositories using HTTP rather than only a custom protocol. This was an improvement because it allowed developers to access remote repositories without being blocked by corporate firewalls.
  3. Git: It was created by Linus Torvalds and many other collaborators, mainly to manage the source of the Linux kernel. They previously used a proprietary VCS named BitKeeper that they wanted to move away from.
    Git not only allows accessing remote repositories via HTTP, but also lets developers keep copies of the whole repository, so they do not need access to a central remote repository when they commit changes. This is useful when you do not have Internet access at the time you want to commit your changes to a project that is hosted in a repository on the Internet.
  4. Other: There are other version control systems but they are not as popular, so I will just mention a couple very briefly. Mercurial is very similar to Git. Visual SourceSafe is a VCS developed by Microsoft, so it is popular among developers of projects using Microsoft tools.

Top Version Control Systems Used by PHP Developers

The PHP project itself has used CVS, Subversion and Git over the years. It moved to Git most recently because of the convenience of its features.
The PHP project is mostly written in the C language but it has all sorts of files in other languages. Even the main PHP site has its source hosted in a VCS repository. That site includes non-text files like images and other types of binary files.
As for the developers who use the PHP language for their projects, over time they have tended to follow the same pattern of VCS usage.
Given that fact, you may understand now why Git is by far the most popular nowadays. This is illustrated by the ranking table below. Let me first clarify the origin of the figures.
Since 2011 the PHP Classes site has allowed authors who publish their classes on the site to import them from a CVS, Subversion or Git repository. The table below shows the number of packages that were imported from each type of VCS repository.
Since some authors submitted more than one package, there is also a column that shows the number of authors who have imported packages from each type of VCS repository.
Rank  System       Packages        Authors
1     Git          162  (81.4%)    103  (81.7%)
2     Subversion    25  (12.6%)     21  (16.6%)
3     CVS            8   (4.0%)      2   (1.6%)
      Total        199             126
Personally I use CVS for my own projects. I started using CVS in 1997 and I got so used to it that I would need a good motivation to switch.
I have already considered switching to Git, but it does not seem to keep track of some information that CVS tracks. One example is file descriptions. File descriptions are useful to document your project. They help you remember the role of each file.
Another thing that Git does not keep track of is per-file version numbers. Git identifies each version of a file by a SHA-1 hash derived from the file contents. That does not help me figure out whether one version of a file is older or newer than another.
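For readers curious about how that content hash is derived, here is a small Python sketch. Git names the contents of a file by hashing a short header (`blob`, the size in bytes and a NUL byte) followed by the file bytes. The `git_blob_id` function name is chosen for this example, and it assumes a repository using the default SHA-1 object format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git assigns to file contents (SHA-1 format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Two revisions of the same hypothetical file get unrelated identifiers.
revision_a = b"<?php echo 'version one';\n"
revision_b = b"<?php echo 'version two';\n"
print(git_blob_id(revision_a))
print(git_blob_id(revision_b))
```

The two identifiers share no visible relationship, which illustrates the point above: unlike a CVS revision number such as 1.58, a hash alone does not tell you which version came first.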
CVS also supports useful keyword marks that expand automatically inside the files. For instance, I can insert this mark in a PHP source comment:
@(#) $Id: $
and it automatically expands to this information, which gets updated on every revision:
@(#) $Id: oauth_client.php,v 1.58 2013/04/11 09:33:16 mlemos Exp $
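The effect of that expansion can be imitated with a short Python sketch. This is only a rough imitation of the substitution shown above, not how CVS implements it: the `expand_id` function and its parameters are invented for this example, and the values are the ones from the expanded line above.

```python
import re

# Matches both the unexpanded form "$Id: $" (or "$Id$") and an expanded one.
ID_KEYWORD = re.compile(r"\$Id(?::[^$\n]*)?\$")


def expand_id(text, filename, revision, date, author, state="Exp"):
    """Replace every $Id$ keyword in `text` with the given revision details."""
    expanded = f"$Id: {filename},v {revision} {date} {author} {state} $"
    # A function is used as the replacement so that backslashes in the
    # values are never interpreted as regular expression escapes.
    return ID_KEYWORD.sub(lambda _match: expanded, text)


line = "@(#) $Id: $"
print(expand_id(line, "oauth_client.php", "1.58",
                "2013/04/11 09:33:16", "mlemos"))
```

Because the pattern also matches an already expanded keyword, running the substitution again on a later revision simply replaces the old details with the new ones.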
Most of my projects are developed only by myself. For more collaborative projects, Git would be more convenient, as multiple developers can work independently and commit changes at any time without a connection to a central repository.
Unfortunately, the majority of the authors who submit packages to the PHP Classes site do not use a VCS.
Below you can find a table that shows the number of packages approved on the PHP Classes site since January 1st, 2012. As you can see, only a small percentage was imported from a VCS repository.
Even among the packages nominated for the PHP Programming Innovation Award, which tend to be developed by more experienced PHP developers, only a minority was imported from a VCS, although the percentage is higher than for the rest of the contributing authors.
            Packages   Using a VCS
Published   647         89  (13.8%)
Nominated   106         25  (23.6%)

Conclusions

As you may have seen from the figures above, the great majority of the PHP developers who publish packages on the PHP Classes site are not using a VCS to manage their projects.
So, if you are one of those developers, it is never too late to learn how to use a VCS to manage your projects. Start using a VCS today!
There are many sites that can help you set up your own repositories if you have difficulty doing it all by yourself.
But if you want to keep your projects private, you can also host your repositories on your local machine. You are not required to host a repository on a public site unless you want to share your work with other developers.
As for developers submitting packages to the PHP Classes site, it is much easier and faster to submit your packages if you import them from a remote VCS repository, because it only takes a few clicks to import all package files at once.
The alternative method, which is to upload one file at a time, is much more painful, especially if you have many files in your packages.
Talking about importing packages from VCS repositories, I would like to clarify one common misunderstanding. Once in a while I see some developers claiming that PHP Classes is dead because GitHub is so much better.
Let me clarify that PHP Classes never intended to compete with GitHub or any other project hosting site like SourceForge, Google Code, CodePlex, Gitorious, Bitbucket, etc. Those are all fine sites to host the development of your projects.
The purpose of PHP Classes is not to offer project hosting. It is to give PHP projects much more visibility without any marketing effort on the part of the package authors.
When an author publishes packages on the PHP Classes site, over 300,000 developers are notified by email. If they find the packages interesting, they come to the site and download them.
If your packages are innovative, you can be nominated for the PHP Programming Innovation Award, organized by the PHP Classes site since 2004. This way you can get even more exposure and recognition for your work.
Project hosting sites are fine, but they do not focus on giving your projects this kind of visibility and recognition, because it takes a lot of manual work to moderate and nominate projects in the personalized way that is done on the PHP Classes site. If you want to publicize your project by just publishing it on a project hosting site, you need to do the promotion yourself.
As you may have noticed from the figures above, many packages published on the PHP Classes site were imported from VCS repositories on project hosting sites. The authors of those packages imported them to PHP Classes to benefit from the additional exposure they get here.
The bottom line is that PHP Classes is a complement to project hosting sites. Using a project hosting site is recommended, but if you want greater visibility for your PHP classes, you can get that from the PHP Classes site.
To conclude this article, I would like to ask you: if you are not using a VCS to manage your projects, feel free to post a comment explaining why not. You are also welcome to comment on anything else about this article.
