18 Critical Oversights in Web Development
Over the past years I had the opportunity to work on some interesting projects, complex in nature with an ongoing development, constantly upgrading, refactoring and adding new features to them.
This article will cover the biggest coding oversights most PHP developers make when dealing with medium and large projects, such as not differentiating between development environments or not implementing caching and backup.
The examples below are in PHP, but the idea behind each problem is generic.
The root of these problems lies mainly in developers’ knowledge and experience, especially the lack of it. I’m not trying to bash anybody, I do not consider myself the perfect developer who knows everything, so bear with me.
In my experience we could categorize these problems in three main groups: design level, application level and database level oversights. We’ll break down each one separately.
Application Level Oversights
Developing with error reporting off
The only question I can ask is: why? Why do you not turn error reporting on when developing an application?
PHP has many levels of error reporting, and ALL of them should be turned on in the development phase.
If you think errors will never occur, you are coding for the ideal scenario, which only happens in an ideal world.
Error reporting and displaying those errors are not the same either. error_reporting() sets the level of errors (e.g. notices, warnings, fatal errors) and display_errors controls whether these errors will be output or not.
Error reporting should always be at the highest setting in development:
error_reporting(E_ALL);
ini_set('display_errors', true);
Note: E_ALL is the highest level in PHP 5.4 and later, because E_STRICT errors became part of E_ALL in PHP 5.4. If you use a PHP version older than 5.4, use
error_reporting(E_ALL | E_STRICT);
to include strict warnings too.
Suppressing errors
Suppressing errors using the @ operator is even worse than not turning error reporting on at all, because you’re consciously sweeping dirt under the carpet. You know the error is happening, you just want to hide it, close the task and go home early. What you don’t realize is that building something on a shaky foundation will have much bigger consequences later on.
You can read an in-depth explanation on this here.
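As a rough sketch of the alternative to @, the code below checks for failure explicitly and logs it instead of hiding it. The helper name readConfigFile and the file path are hypothetical, made up for illustration only.

```php
<?php
// Hypothetical helper: reads a file without hiding failures behind @.
function readConfigFile(string $path): string
{
    // Check the preconditions explicitly instead of suppressing the warning.
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Config file not readable: {$path}");
    }

    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Failed to read config file: {$path}");
    }

    return $contents;
}

try {
    $config = readConfigFile('/path/that/does/not/exist.ini'); // assumed path
} catch (RuntimeException $e) {
    // Record the failure so somebody can see it and fix the cause.
    error_log($e->getMessage());
    $config = '';
}
```

The failure is still handled gracefully, but it leaves a trace in the error log instead of disappearing.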
No logging anywhere in the code
Developing a project has to happen with logging in mind from the start. You can’t just bolt on logging at the end.
Most developers do use logging one way or another, but almost none of them take the time to actually check those logs for errors. What’s the point of logging if nobody looks at the logs?
PSR recommendations do exist for logging, PSR-3 to be exact, and this excellent article explains how to implement PSR-3 logging.
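To make the idea concrete, here is a minimal sketch of a logging function that writes a level, a message and some context to a file. It is not PSR-3 compliant; the function name logMessage, the line format and the log file location are all assumptions made for this example. In a real project a PSR-3 logger would take its place.

```php
<?php
// Hypothetical minimal logger; NOT PSR-3 compliant, it only shows the core idea.
function logMessage(string $file, string $level, string $message, array $context = []): string
{
    $line = sprintf(
        '[%s] %s: %s %s',
        date('c'),              // ISO 8601 timestamp
        strtoupper($level),
        $message,
        json_encode($context)   // structured context for later searching
    );

    // FILE_APPEND keeps earlier entries; LOCK_EX avoids interleaved writes.
    file_put_contents($file, $line . PHP_EOL, FILE_APPEND | LOCK_EX);

    return $line;
}

$logFile = sys_get_temp_dir() . '/app-example.log'; // assumed location
$line = logMessage($logFile, 'error', 'Payment failed', ['order_id' => 42]);
```

Writing the context as JSON keeps each entry on one line, which makes the log easier to search when somebody actually reads it.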
Not implementing caching
Caching can be done in many different ways on multiple levels in an application, such as on a server level, application level, database level, etc.
Caching, too, should be implemented from the start. You can always disable it in development, but make sure everything works once it’s pushed to a production environment.
On a server level you can use Varnish, which is an HTTP reverse proxy. It stores responses in memory and should be installed in front of the web server.
To speed up PHP, you can install or enable an opcode cache, which stores the compiled bytecode of PHP scripts so they don’t have to be recompiled on every request. PHP 5.5 and later ship with a built-in opcode cache called OpCache.
You can read about it in-depth in this article: SitePoint PHP – Understanding OpCache.
Before PHP 5.5, you could use APC, which has user cache functionality too.
On an application level, you can use APCu, which is the user cache extracted from APC; Yet Another Cache, which has similar functionality to APCu; or Memcached, which is a distributed caching system with solid PHP support. Memcached can also be used to cache database queries.
There are a couple of techniques when implementing caching in an application. A good practice is to cache data which doesn’t change very often, but is queried repeatedly.
Cache database queries heavily, because the database is always the biggest bottleneck in every PHP application.
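The "cache what is queried repeatedly but rarely changes" technique can be sketched roughly as follows. The ArrayCache class and its remember() method are hypothetical and keep data only for the current request; in production the same pattern would sit on top of APCu or Memcached instead of a plain array.

```php
<?php
// Hypothetical in-memory cache with a time-to-live, for illustration only.
final class ArrayCache
{
    private $items = [];

    // Returns the cached value, or runs $loader once and stores its result.
    public function remember(string $key, int $ttlSeconds, callable $loader)
    {
        $now = time();
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return $this->items[$key]['value']; // cache hit
        }

        $value = $loader(); // cache miss: run the expensive call
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttlSeconds];

        return $value;
    }
}

$cache = new ArrayCache();
$queryCount = 0;

// Stands in for a slow database query.
$loadCategories = function () use (&$queryCount) {
    $queryCount++;
    return ['books', 'music'];
};

$first  = $cache->remember('categories', 300, $loadCategories);
$second = $cache->remember('categories', 300, $loadCategories);
// $queryCount is 1 here: the second call was served from the cache.
```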
Disregarding best practices and design patterns
How many times have you seen someone implement their own password hashing scheme? Sadly, this still happens today, because of a lack of knowledge or, more dangerously, because of an “I know it better” attitude.
Well, I hate to bring you the bad news, but 99% of the time you don’t know it better.
These best practices and design patterns were thought of and created for a reason by software engineers way smarter than you and me. The developer’s sole job is to pick the right pattern for the job.
There are many books and resources on this subject. I’ll mention two:
- Patterns of Enterprise Application Architecture by Martin Fowler
- PHP Objects, Patterns, and Practice by Matt Zandstra
Not using automated tests
Tests should be added for every feature of the web application, but tests are good for nothing, just like logs, if nobody is looking at them and actually running the test code to see if something breaks.
Running tests manually is a tiresome process. Fortunately, there is a tool for that. In fact, there are lots of tools that can help automate your tests, and a whole practice built around them called Continuous Integration.
One such tool that is widely used in the PHP community is Jenkins, which is a CI server and can do a lot more than just test an application. Sebastian Bergmann created an excellent template for Jenkins specifically constructed to work with PHP projects.
If you find this too overwhelming, then at least write tests for your application using PHPUnit, Behat or PHPSpec. It may seem like a lot of work at first, but it has been proven countless times that tests help projects in the long run.
Not reviewing / auditing code
Working in a team can be challenging, especially if every team member is used to different styles of programming, and without good specification a project can go sideways real fast.
If you’re in a team and not inspecting each other’s code, you really should. Just like unit tests, it helps a project stay clean and consistent.
The difference between review and audit is the time when you inspect the code. Review usually happens before any code is merged to the code base and audit after the code is merged in.
Review is the much better option, because you have the opportunity to talk about the code and suggest improvements or fixes before it gets merged with the other team members’ code.
The disadvantage of reviews is that it’s blocking development, because before every merge (after all tests are green) at least two developers need to discuss the code, and this is where audit comes into play.
Audit happens post merge, and it’s non-blocking, but it’s significantly less powerful, because it misses the opportunity of catching bugs early on.
Audit is still better than not inspecting code at all.
To help this process go as smoothly as possible, you can use the tool called Phabricator, which was created specifically for this purpose by the good engineers at Facebook. It supports both code inspection strategies.
Coding for the ideal scenario
Ever find yourself in or heard about cases where some insignificant, boilerplate code was merged in and all hell broke loose? I sure did.
Most of the time this happens because developers are lazy and write code for the ideal scenario, where database failures, PHP fatal errors and server hacking are non-existent.
Code should be written with the exact opposite scenario in mind. Developers should write code for the worst possible scenario they can think of, and even then the code won’t cover some obscure corner case where the user types in a $ sign and has instant full administrator access.
Assuming that your server won’t be hacked or your code won’t break at some point and your database will always be up and running is just wrong. Production code should cover these scenarios and log errors accordingly.
In PHP, it is so easy to commit errors without even realizing it. This is mainly because of poor language design decisions that were made in the past and not corrected in time.
PHP wants to make it easy for developers not to think about security, encodings and corner cases, where in fact developers should be very aware of this and always practice defensive programming.
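One small, hedged illustration of defensive programming is to treat every piece of user input as hostile until it has been validated. The function parsePageNumber and the accepted range are hypothetical; filter_var() with FILTER_VALIDATE_INT is standard PHP.

```php
<?php
// Hypothetical validator for a "page" query parameter.
// Anything that is not an integer in the allowed range falls back to page 1.
function parsePageNumber(array $query): int
{
    $page = filter_var(
        $query['page'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 10000]] // assumed limits
    );

    // filter_var() returns false when validation fails.
    return $page === false ? 1 : $page;
}

$valid     = parsePageNumber(['page' => '3']);
$negative  = parsePageNumber(['page' => '-5']);
$malicious = parsePageNumber(['page' => '1; DROP TABLE users']);
$missing   = parsePageNumber([]);
```

The function never trusts the raw value, so a missing, malformed or malicious parameter cannot travel any further into the application.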
Not using OOP principles correctly
Many developers new to PHP do not use object-oriented programming in their code, because the concept is a little hard to grasp at first. OOP was first used in the 1960s and has been refined constantly over the years, so there is a ton of information about it on the Web.
Also, OOP is a lot more than just procedural code organized in classes.
The concepts of objects, properties, methods, inheritance, encapsulation, etc. are all an integral part of OOP.
A developer who uses these principles correctly knows about OO design patterns, SOLID principles (Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion) and how to write clean code in general, code that is flexible, doesn’t have hard coded dependencies and is easy to extend and build upon.
Alejandro Gervasio covers these principles from top to bottom.
It’s never too late to learn about OOP and start writing clean code which doesn’t rely on hard dependencies (looking at you, PHP frameworks).
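To show what "no hard-coded dependencies" can look like, here is a minimal sketch of dependency inversion. All names (Notifier, InMemoryNotifier, SignupService) are made up for this example; the point is only that the service depends on an interface passed in through the constructor, not on a concrete class it creates itself.

```php
<?php
// The abstraction the high-level code depends on.
interface Notifier
{
    public function send(string $recipient, string $message): void;
}

// One possible implementation; handy in tests because it sends nothing.
final class InMemoryNotifier implements Notifier
{
    public $sent = [];

    public function send(string $recipient, string $message): void
    {
        $this->sent[] = [$recipient, $message];
    }
}

// The service receives its dependency instead of instantiating it.
final class SignupService
{
    private $notifier;

    public function __construct(Notifier $notifier)
    {
        $this->notifier = $notifier;
    }

    public function register(string $email): void
    {
        // Persisting the user is omitted in this sketch.
        $this->notifier->send($email, 'Welcome aboard!');
    }
}

$notifier = new InMemoryNotifier();
(new SignupService($notifier))->register('jane@example.com');
```

Swapping in an e-mail or SMS implementation later requires no change to SignupService, which is what makes the code easy to extend and to test.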
“On-the-fly” coding
What most developers do when they get yelled at “Quick, the client needs this feature up and running ASAP.”, is to hack some code together and push it directly to the live server. This is called on-the-fly coding or cowboy coding.
As in every other industry, in software development too, workflows and sane processes should be implemented in order for a project to succeed.
PHP and dynamic languages in general encourage rapid changes to the codebase, seeing the results of the modification instantly, but these changes should be limited in a production environment.
Only fixes for critical bugs should be committed and pushed directly to the production server. For the rest, a workflow should be implemented such as GitHub’s fork and pull request, or Gitflow. More on workflows using Git can be found here: https://www.atlassian.com/git/workflows.
Managers and clients who think these processes are unnecessary should be educated to see otherwise. I’ve never once met a client who couldn’t wait a couple of hours or a day for a little feature to go through the necessary steps in order to be deployed live.
One other thing to note: don’t confuse Continuous Delivery with cowboy coding and chaotic management. Continuous Delivery is exactly about implementing and optimizing the development workflow so code can be deployed to the production environment as soon as reasonably possible.
Database Level Oversights
Not differentiating between read / write queries
To support a long running complex project, scaling needs to be in the back of every developer’s mind. 99% of the time a web application doesn’t need to scale, because it won’t reach that kind of traffic.
If you know for sure that the web application will be used by many, such as an enterprise application used by hundreds of employees internally in the company, you can make the necessary steps to enable easier scaling for the project.
So why separate read / write queries?
The database is always the first bottleneck in every application. It will be the first one to fail under huge traffic. To offload traffic to multiple database servers, developers use either Master – Slave or Master – Master replication. Master – Slave is the more popular one: every SELECT statement is routed to the Slave database server(s), and all other statements go to the Master, in order to balance traffic.
If your application doesn’t know the separation between read and write queries it won’t know to which database server to connect.
Keep this in mind if you know that eventually you will need to set up a Master – Slave replication scheme.
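A very rough sketch of the routing decision could look like this. The ConnectionRouter class and the DSN strings are hypothetical, and the rule is deliberately naive: it only looks at the first keyword of the statement, so cases like SELECT ... FOR UPDATE or reads inside a transaction would need extra handling in a real application.

```php
<?php
// Hypothetical router: decides which server a statement should be sent to.
final class ConnectionRouter
{
    private $primaryDsn;   // the Master: receives all writes
    private $replicaDsns;  // the Slaves: receive plain reads

    public function __construct(string $primaryDsn, array $replicaDsns)
    {
        $this->primaryDsn  = $primaryDsn;
        $this->replicaDsns = array_values($replicaDsns);
    }

    public function dsnFor(string $sql): string
    {
        // Naive check: only statements starting with SELECT count as reads.
        $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;

        if ($isRead && $this->replicaDsns !== []) {
            // Spread reads randomly across the available replicas.
            return $this->replicaDsns[array_rand($this->replicaDsns)];
        }

        return $this->primaryDsn;
    }
}

// Assumed DSNs, for illustration only.
$router = new ConnectionRouter(
    'mysql:host=primary.example;dbname=app',
    ['mysql:host=replica1.example;dbname=app']
);

$readTarget  = $router->dsnFor('SELECT * FROM users WHERE id = 1');
$writeTarget = $router->dsnFor('UPDATE users SET name = "Jane" WHERE id = 1');
```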
Only coding for one database connection
This strongly relates to the above oversight, but sometimes developers can have other reasons to connect to multiple databases. For example, if you keep user logs, activity streams, analytics or other data where you know the read/write operations happen often, it’s good to offload this traffic to a different database server.
Make sure you use a database library which allows you to connect to multiple database servers and makes it easy to switch between them. A good solution is to implement PDO and use Aura.SQL, which extends PDO.
Not testing queries for exploits
This oversight relates to the “coding for the ideal scenario” oversight above. Same thing, different platform.
If you don’t test your database (and your application) for exploits, some hacker will, and he may succeed.
Databases are vulnerable to a whole range of exploits, the most common being SQL injection attacks.
Use this cheat sheet and run the queries through your application’s database access library. Write these statements in fields on your front-end like username, password fields on a sign up page.
If none of the queries go through, you can buy yourself a beer and celebrate.
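The usual defence against SQL injection is to use prepared statements with bound parameters, so user input is never concatenated into the SQL string. The sketch below assumes the pdo_sqlite extension is available and uses an in-memory SQLite database as a stand-in for the real one; the table layout and the function findUserByName are made up for the example.

```php
<?php
// Assumes pdo_sqlite; the in-memory database stands in for the real server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, is_admin INTEGER)');
$pdo->exec("INSERT INTO users (username, is_admin) VALUES ('alice', 1)");

// Hypothetical lookup: the user input is bound as data, never parsed as SQL.
function findUserByName(PDO $pdo, string $username): array
{
    $stmt = $pdo->prepare('SELECT id, username FROM users WHERE username = :username');
    $stmt->execute([':username' => $username]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// A classic injection attempt, as it might be typed into a login form.
$attack = "' OR '1'='1";

$attackRows = findUserByName($pdo, $attack);  // treated as a literal username
$legitRows  = findUserByName($pdo, 'alice');
```

With string concatenation the attack string would have matched every row; with a bound parameter it is just an odd username that nobody has.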
Not adding indexes to tables
Indexes are like the table of contents of a table. They give a performance boost and should be added to every table, on the columns that queries filter by (e.g. the columns in the WHERE clause).
There’s a whole theory behind database indexes: when to create them, on which columns and what to cover. A whole separate article series was written about that.
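As a small illustration, the sketch below creates an index on the column used in the WHERE clause and asks the database how it plans to run the query. It assumes the pdo_sqlite extension and uses SQLite's EXPLAIN QUERY PLAN; other databases such as MySQL have their own EXPLAIN output, and the table and index names are made up.

```php
<?php
// Assumes pdo_sqlite; table, column and index names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)');

// Index the column that appears in the WHERE clause of the query below.
$pdo->exec('CREATE INDEX idx_users_email ON users (email)');

// Ask SQLite how it intends to execute the query.
$plan = $pdo
    ->query("EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = 'a@example.com'")
    ->fetchAll(PDO::FETCH_ASSOC);

// The "detail" column describes each step of the plan in plain text.
$details = implode("\n", array_column($plan, 'detail'));
```

If the plan mentions idx_users_email, the lookup uses the index instead of scanning the whole table.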
Not using transactions
Data integrity is very important for web applications. Whole websites could break if data is handled incorrectly.
You use transactions for related data that is handled together, either persisted or deleted together.
For example, you save data about a user such as e-mail, username and password in table 1, and profile data like first name, last name, gender, age, etc. in table 2.
Now if a user wants to delete his account, the SQL queries involved should run as one operation, inside a transaction. If you don’t use transactions, you risk losing data integrity, because the operations on the data run separately.
If deleting the data from table 1 succeeds, but fails on table 2, the profile data for the user will remain in the database and worse it won’t be connected to anything, it will be orphaned.
By using transactions this won’t happen, because the whole operation will succeed only if all the separate operations (e.g. deleting data from table 1 and table 2) in the transaction succeed, otherwise the database will roll back to the previous state.
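The account deletion described above might be sketched like this with PDO. The example assumes the pdo_sqlite extension and an in-memory database; the table names users and profiles and the function deleteAccount are hypothetical stand-ins for "table 1" and "table 2".

```php
<?php
// Assumes pdo_sqlite; users/profiles stand in for "table 1" and "table 2".
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
$pdo->exec('CREATE TABLE profiles (user_id INTEGER, first_name TEXT)');
$pdo->exec("INSERT INTO users (id, email) VALUES (1, 'jane@example.com')");
$pdo->exec("INSERT INTO profiles (user_id, first_name) VALUES (1, 'Jane')");

// Hypothetical function: both deletes succeed together or not at all.
function deleteAccount(PDO $pdo, int $userId): bool
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('DELETE FROM profiles WHERE user_id = ?')->execute([$userId]);
        $pdo->prepare('DELETE FROM users WHERE id = ?')->execute([$userId]);
        $pdo->commit();

        return true;
    } catch (Throwable $e) {
        // Undo whatever was already deleted and record why.
        $pdo->rollBack();
        error_log('Account deletion failed: ' . $e->getMessage());

        return false;
    }
}

$deleted = deleteAccount($pdo, 1);
```

Because PDO is set to throw exceptions, a failure in either DELETE jumps straight to the rollback, so no orphaned profile row is left behind.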
Not securing sensitive data
Storing passwords in plain text, or rolling your own encryption algorithm in 2014 is unacceptable. The PHP community has matured enough to know better by now.
Still, there are, probably, thousands of databases out there where sensitive data is stored unencrypted begging to be stolen by hackers.
PHP 5.5 added strong hashing functions just for this, simply called Password Hashing. They are really simple to use: you create a hash from the plain text password with this function:
$hash = password_hash( $password, PASSWORD_BCRYPT );
Note: There’s no need to salt this password, because it is already handled for you.
Store $hash in the database, then verify the password against the hash with this function:
if ( password_verify( $password, $hash ) ) { ... }
Note: If you don’t have PHP 5.5 (you really should by now), you can use the password_compat library, which implements the exact same methods.
Handling financial data is much trickier, because you need to have PCI compliance on server, application and database levels. A more in-depth article is already written on the subject here: SitePoint PHP – PCI Compliance and the PHP Developer.
Application Design Oversights
Not differentiating between development environments
I saw many developers and even small teams setting up poor development environments for themselves.
For example, working on a new feature or fixing a bug and FTPing the files directly on the live website. This is wrong on so many levels.
There is an infinite number of workflows that teams can create, but the classical one for web development is to use at least three environments: development, staging and production.
A development environment can be local for each programmer, staging and production are usually remote and share some parts between them. Development is for coding, staging is for testing and finally production is for consumption.
The oversight happens when these environments are not set up the same way, for example when each developer runs a different version of PHP, or when the staging configuration differs from production.
Guess what happens? You’re right. Everything will be working in development and even in staging, and when you push it to the production server all hell breaks loose resulting in long nights and lots of caffeine.
No wonder the most common phrase in development circles is: “It works for me.”
So what’s the solution? Make sure everything is set up the same way in EVERY environment. The operating system should be the same, PHP, database, web server, all should have the same version across the environments.
Since the creation of Vagrant, Docker and VirtualBox it is very easy now to create identical environments with the same exact configuration on each one. If you haven’t used these tools before, you should stop whatever you’re doing and start using them immediately.
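While software versions should match everywhere, a few settings legitimately differ per environment, error display being the obvious one. One possible way to keep this explicit is sketched below. The APP_ENV variable name, the function configureErrorHandling and the settings table are all assumptions for this example, not a standard.

```php
<?php
// Hypothetical: applies error settings based on the environment name.
function configureErrorHandling(string $environment): array
{
    $settings = [
        'development' => ['display_errors' => '1', 'log_errors' => '1'],
        'staging'     => ['display_errors' => '0', 'log_errors' => '1'],
        'production'  => ['display_errors' => '0', 'log_errors' => '1'],
    ];

    // Fail safe: an unknown environment is treated like production.
    $name = isset($settings[$environment]) ? $environment : 'production';

    // Report everything everywhere; only the display differs.
    error_reporting(E_ALL);
    foreach ($settings[$name] as $directive => $value) {
        ini_set($directive, $value);
    }

    return $settings[$name];
}

// APP_ENV is an assumed environment variable; default to production if unset.
$environment = getenv('APP_ENV') ?: 'production';
$applied = configureErrorHandling($environment);
```

Keeping these differences in one small, versioned place makes it easier to see that everything else is identical across development, staging and production.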
No backup
Everything is going well, the website is live, launched on time, everything is up and running, users consume the beautiful data. Nom, nom, nom… Until you receive an e-mail at 3AM.
Backup, just like logging, caching, security and defensive programming should be an integral part when developing a web application, but most developers (or sysadmins) forget to do this.
Backups should be automated as well, or if that’s not possible, at least a weekly manual backup should do. Any backup is better than no backup.
Store your code base in version control and use a distributed version control system like Git or Mercurial. This setup makes code bases very redundant, because every developer who’s working on the project has a copy of the code base. Likewise, store the code base on GitHub or Bitbucket, because they have backups.
Backing up the database is more important, because it’s user created content. ALWAYS store the actual data and the backup in different places.
Not backing up data can ruin businesses, and it will do that – see the famous case of Ma.gnolia, one of the better social bookmarking websites back in the day. Wired has a cover story on the whole disaster.
No monitoring
“Everything’s amazing and nobody’s happy.” – Louis C.K.
You’re not happy, because you don’t know what’s going on. Implementing an intelligent monitoring framework for your application is really important. Monitoring answers the following questions:
- Did somebody access the main application server?
- Are the servers under heavy load?
- Do we need to scale to another database server?
- Where is the application failing?
- Is it offline or not working only for me?
It is important to know the answers to these questions at any given moment, and with real-time monitoring, you will. To make this happen, tools like Nagios or New Relic should be part of your application’s infrastructure.
Conclusion
Use this knowledge to be a better programmer. Remember these oversights and try not to commit them. The application and database level oversights are the most important ones to remember.
Backup is very important. Always practice defensive programming and be prepared for the worst, because this is how web development works. Programming is hard, but when done right, a lot of fun.
Checklist
Below you’ll find a checklist of all the oversights found in this article. See how many you can cross off right now and always try to cross them all off.
- Is error reporting on and display errors on in development and off in production?
- Do not suppress errors in your code.
- Implement a logging framework.
- Use a caching strategy.
- Keep in mind and use programming design patterns and best practices.
- Use tests in your code and try to automate running these tests every time a change occurs in the code base.
- Review or at least audit team members’ code.
- Practice defensive programming.
- Learn and use OOP principles correctly.
- Have a solid workflow and processes for developing and deploying code.
- Differentiate between read / write database queries.
- Use a solid database library which can connect to multiple databases.
- Test SQL queries for exploits.
- Learn and use indexes on database tables.
- Use database transactions.
- Secure sensitive data in the database.
- Use different coding environments: development, staging, production.
- Implement a backup and monitoring strategy.