People will try to break into your website’s database. Here’s how (and how to stop them).
Structured Query Language (“SQL”) is the most common way to manipulate databases.
Here’s a simple query to access all users in the state of New York:
SELECT * FROM Users WHERE location= 'NY'
Often, web developers will want to get a dynamic value; say, from a textbox:
<input type="text" name="location" />
This user-defined input will become a variable. We can then concatenate the static (hard-coded) part of the query with a dynamic value. The code becomes:
q = "SELECT * FROM Users WHERE location= '" . locationValue . "'"
Part of the query (SELECT * FROM Users WHERE Location= ‘ ) does not and cannot change. But the locationValue will of course change depending on what the user entered into the textbox.
Notice that the last character of the hard-coded portion of the query is a single quote (aka apostrophe). We’ll come back to this.
A SQL statement can be terminated with a semicolon ( ; ). A second statement can then be run immediately after. For example, these two queries would run one right after the other in a single command:
INSERT INTO Users(location) VALUES ('NY');SELECT MAX(id) AS maxID FROM Users;
Knowing that multiple queries can be run- just by adding a semicolon- a devious person could intentionally take advantage of this.
We now have a recipe for disaster:
- Hard-coded portions of a SQL statement may end in a single quote
2. It’s possible to run multiple queries when you only intended for one to be executed
The Problem
Hypothetically, you may expect a user to put in a value of NY that should be put into your query. The SQL that runs would become:
Let’s suppose instead of typing NY they put in ’; DROP Users; . Note the extra punctuation!
The user has intentionally ended the first statement by adding a “closing” single quote and semicolon. You would have unknowingly allowed your query to run two statements- where the second one would have dropped (as in, deleted) your Users table! 😦
This is called “SQL Injection”, as a user is injecting their own unintended values into your SQL statement.
The Solution: Don’t Trust Your Users
You cannot rely on user input from HTML forms. You need to ensure the input is “clean” and account for potentially malicious code.
There are two primary ways to resolve this:
- Sanitize the input
- Use a prepared statement
I’ll be giving PHP-specific examples, but the concept is the same across all programming languages.
Solution 1: Sanitize Inputs
Before using any input from the user (“as-is”), you can apply some validations.
This is a known issue, so there’s a built-in function in PHP to prevent this: mysqli_real_escape_string(). The function takes in two parameters: the database connection, and the actual value that you want to sanitize (“clean,” or make sure that it’s “safe”).
I like to do some additional validation and clean-up, so I tend to wrap this in my own function, still passing in the same two parameters:
This is for a procedural-style PHP coding; i.e. not object-oriented. I go into detail in my book Web Development for Intermediate Programmers — with PHP but basically:
There was a built-in function in PHP called get_magic_quotes_gpc() which was for this purpose. It used to be enabled by default, but now it’s now actually deprecated and should not be used. If it was enabled, then you could also use the built-in function stripslashes() to “unquote a quoted string”. Ultimately lines 3 and 4 would not be needed, but it’s included for completion.
Line 6 uses the built-in PHP function previously mentioned. Lines 7–9 perform the additional clean-up I voluntarily perform; including ensuring there aren’t any HTML tags and removing extra spaces from the user input.
We can then use the function like this:
Solution 2: Prepared Statements
Again, we’re using built-in functionality.
This time we’re separately passing in the hard-coded SQL and the variables into the code that runs the query.
This code sample is an object-oriented approach. Provided to me by Jake Cyr, here’s part of the database class I’m using:
Notice on Line 30 you’re “preparing” the statement, and adding in parameters on line 31.
I then create an array of variables that should be passed in:
I believe the prepared statement approach is considered the better practice of the two methods.
Additional Tip 📢
In the example above, the user would have dropped (deleted) your table.
If you know that you’ll never need to DROP a table in a production environment, then you can configure the production database settings such that you can’t drop a table; you can have different settings in your development environment if you needed to do this.