Sudoku solving using pure SQL
A few years back, when I started using Java, I wrote an object-oriented Sudoku solver that uses a human like approach to solve the puzzles. I used the Singles and Hidden Singles algorithm that Angus Johnson used in his Simple Sudoku program. It was great way to get to know Java and understand the principles of object-oriented programming.
Now that I’ve been professionally using Microsoft SQL for a couple of years, I thought about using SQL to solve Sudoku puzzles. It would be a great challenge to test my SQL skill.
To make it a little harder on myself, the database (or should I call it a program?) should adhere to the following requirements:
- Pure SQL only. No functions, stored procedures, triggers, cursors, loops, variables or modules. Using regular tables and views only
- Very simplistic table and view structure
- Everything has to be dynamic
- Human approach to solving the Sudoku puzzle
For the impatient: A link to the database dump can be found at the end of this blog post. Please continue reading on to find out how I approached the challenge.
Lets start out with the tables. The first table is the source table. This table contains the original puzzle.
The second table to be created is the solution table. This table is used to store all intermediate results and the final result.
The last three tables are reference tables containing the rows, columns and the possible numbers for each field. I’ve inserted the values 1 to 9 in each of the tables, because the Sudoku puzzle I’m using has 9 rows, 9 columns and you can use the digits 1 to 9 in each field.
With all the tables ready, it’s time to create a couple of views that help with solving the puzzle.
The first view to create returns the original Sudoku, but adds the candidates (values) 1 to 9 on each blank field. This is what a regular person would do when solving a Sudoku puzzle.
Using this view, we can load the puzzle into the solution database:
For the sake of convenience, I’ve added a view that displays the already solved fields from the solution table:
The Singles algorithm states that, for instance, if there is the number 2 on the first row and the third column, it is impossible to use the same number in the first row, the third column or the top-left box. By using this algorithm, we can delete these candidates based on the already solved fields.
There are four views to select the candidates to remove, one to scan the rows, one to scan the columns, one to scan the boxes and one to combine all three. Each view joins the solved_fields view to the solution table on the same row/column/box, the value and it excludes its own field.
Now that we know which candidates are impossible, we can delete them from the intermediate solution. We do have to perform the delete query multiple times, because each time we remove candidates, we solve new fields. And each time we solve new fields, we can remove more candidates.
Now that we have solved the puzzle, we can use the solution table to view the final result. To make everything a bit more human friendly, I’ve added the following view:
This was somewhat of a difficult decision. I didn’t want to use a proprietary pivot, so I used an aggregation. The problem is that the columns and rows are not dynamic. I haven’t adhered to my (own) requirements, but I guess I can let this one slip. Unless someone knows a solution that isn’t tied to one particular DBMS and doesn’t use dynamic SQL?
I can’t guarantee that any of the scripts above work, as I had to modify the PostgreSQL output to make them readable and to make them work in my syntax highlighter (apparently, using objects named “row”, “value” and “source” make my highlighter freak out).
Feel free to download the PostgreSQL database dump and mess about with it. I’ve added the “load_puzzle” and “delete_singles” functions to store the DELETE and INSERT queries I mentioned earlier.