Measure user distance in MySQL

0
4602
Measure user distance in MySQL

In this article, I will share with you some techniques to measure user distance in MySQL and MariaDB by computing the distance using SQL queries on the fly.

To begin with, you can create a dataset from geocoding databases and I wrote about before. If you have no idea what geocoding is, read my introduction to geocoding.

Measure user distance in MySQL

If you follow my blog daily, you might already know about the formulas to calculate distance of two locations on Earth just by using the GPS coordinates, which is represented as two values: longitude and latitude.

So, those formulas will be written in SQL queries in this article. We will use these two popular ones:

Calculate two locations distance

Following is my table structure, which generated directly from geocoding database.

mysql> describe zip_coordinates;
+----------+----------------+------+-----+---------+----------------+
| Field    | Type           | Null | Key | Default | Extra          |
+----------+----------------+------+-----+---------+----------------+
| id       | int(11)        | NO   | PRI | NULL    | auto_increment |
| zipcode  | varchar(5)     | NO   |     | NULL    |                |
| name     | varchar(64)    | NO   |     | NULL    |                |
| lat      | decimal(14,12) | NO   |     | NULL    |                |
| lon      | decimal(14,12) | NO   |     | NULL    |                |
+----------+----------------+------+-----+---------+----------------+

Let’s try with two locations:

mysql> select * from zip_coordinates where id = 1 or id = 2;
+----+---------+---------------------+-----------------+-----------------+
| id | zipcode | name                | lat             | lon             |
+----+---------+---------------------+-----------------+-----------------+
|  1 | 19348   | Berge bei Perleberg | 53.237460000000 | 11.870770000000 |
|  2 | 85309   | Pörnbach            | 48.616700000000 | 11.466700000000 |
+----+---------+---------------------+-----------------+-----------------+
2 rows in set (0.01 sec)

Alright, seems to be somewhere in Germany or nearby locations. No idea!

By using the Great circle distance formula, we can write the following SQL query to get distance of these two places.

mysql> SELECT
    ->     (6373 * ACOS(
    ->         COS( RADIANS(48.61670) )
    ->         * COS( RADIANS( 53.23746 ) )
    ->         * COS( RADIANS( 11.87077 ) - RADIANS(11.46670) )
    ->         + SIN( RADIANS( 48.61670) )
    ->         * SIN( RADIANS( 53.23746 ) )
    ->         )
    ->     ) AS distance;
+-------------------+
| distance          |
+-------------------+
| 514.7441485659912 |
+-------------------+
1 row in set (0.00 sec)

The result is about 514.7441 km approximately.

Similarly, switching to Haversine formular, we have this SQL query:

mysql> SELECT
    ->   2 * 6373 * ASIN(
    ->     SQRT(
    ->           POWER(SIN(RADIANS((48.61670 - 53.23746) / 2)), 2) +
    ->           COS(RADIANS(53.23746)) * COS(RADIANS(48.61670)) *
    ->           POWER(SIN(RADIANS((11.46670 - 11.87077) / 2)), 2)
    ->         )
    ->     ) AS distance;
+-------------------+
| distance          |
+-------------------+
| 514.7441485659951 |
+-------------------+
1 row in set (0.00 sec)

Wow, the result is so close to the previous calculation, which is roughly 514.7441 km.

Measure distance of one location against a list of locations

In reality, we often need to have result of measuring distance from one location versus a list of locations. For example, we might need to implement feature like showing list of stores by distance.

How to implement such a thing?

With power of SQL, we can iterate and compute one location against a list. Let say we want to compute for above location, the one with id=1 with the rest of table.

Here’s the take:

mysql> SELECT
    ->     z.id, z.zipcode,
    ->     6373 * ACOS(
    ->         COS( RADIANS(z.lat) )
    ->         * COS( RADIANS( 53.23746 ) )
    ->         * COS( RADIANS( 11.87077 ) - RADIANS(z.lon) )
    ->         + SIN( RADIANS(z.lat) )
    ->         * SIN( RADIANS( 53.23746 ) )
    ->     ) AS distance
    -> FROM zip_coordinates z
    -> ;
+----+---------+--------------------+
| id | zipcode | distance           |
+----+---------+--------------------+
|  2 | 85309   |  514.7441485659912 |
|  3 | 24790   | 181.66529356247128 |
|  4 | 98646   | 321.03084219275115 |
|  5 | 27336   | 171.40264694104494 |
|  6 | 19294   |  35.09071017216436 |
|  7 | 19395   |   32.0921683608484 |
|  8 | 99628   | 234.80056320704884 |
|  9 | 38486   |  89.71485884768443 |
| 10 | 72622   |  541.6571552381077 |
| 11 | 27324   |  184.8237421312911 |
| .. | .....   |  ........          |
+----+---------+--------------------+
17367 rows in set, 1 warning (0.05 sec)

It looks cool, we have complete distance result in just roughly 50ms. That is not too bad at all.

Moreover, we might need to have distance filtering, like: having interests in only places within 100km of location id=1.

mysql> SELECT
    ->     z.id, z.zipcode,
    ->     6373 * ACOS(
    ->         COS( RADIANS(z.lat) )
    ->         * COS( RADIANS( 53.23746 ) )
    ->         * COS( RADIANS( 11.87077 ) - RADIANS(z.lon) )
    ->         + SIN( RADIANS(z.lat) )
    ->         * SIN( RADIANS( 53.23746 ) )
    ->     ) AS distance
    -> FROM zip_coordinates z
    -> HAVING distance <= 100
    -> ;
+----+---------+--------------------+
| id | zipcode | distance           |
+----+---------+--------------------+
|  6 | 19294   |  35.09071017216436 |
|  7 | 19395   |   32.0921683608484 |
|  9 | 38486   |  89.71485884768443 |
| 13 | 19073   |  54.29949454514148 |
| 22 | 19071   | 63.403835501415756 |
| 27 | 17179   |  98.49471994784525 |
| 55 | 17039   |  99.14371334332706 |
| 63 | 23909   |  91.56830035985475 |
| 77 | 23968   |  83.36517472817539 |
| 88 | 39606   |  53.31920344352124 |
| .. | .....   |  ................. |
+----+---------+--------------------+
1715 rows in set (0.02 sec)

The filtering is being done right, but we need some kind of ordering from nearest to the farthest locations. So the query becomes:

mysql> SELECT
    ->     z.id, z.zipcode,
    ->     6373 * ACOS(
    ->         COS( RADIANS(z.lat) )
    ->         * COS( RADIANS( 53.23746 ) )
    ->         * COS( RADIANS( 11.87077 ) - RADIANS(z.lon) )
    ->         + SIN( RADIANS(z.lat) )
    ->         * SIN( RADIANS( 53.23746 ) )
    ->     ) AS distance
    -> FROM zip_coordinates z
    -> HAVING distance <= 100
    -> ORDER BY distance ASC
    -> ;
+-------+---------+--------------------+
| id    | zipcode | distance           |
+-------+---------+--------------------+
|  3835 | 19348   |  2.856191974267735 |
|   320 | 19372   | 4.7365965470299285 |
|   319 | 19300   | 4.7365965470299285 |
| 10723 | 19357   |  5.925404642212945 |
| 17053 | 19348   |  6.415856404417635 |
|  2465 | 19357   |   7.03089922544295 |
|  9655 | 19357   |  7.392084823674285 |
| 15575 | 19372   | 7.5748574148953995 |
|  6137 | 19348   |  8.505292617218123 |
| 12922 | 19357   |   9.15148868821331 |
| ..... | ....    | ...................|
+-------+---------+--------------------+
1715 rows in set (0.03 sec)

You can also provide the paging feature with LIMIT.

mysql> SELECT
    ->     z.id, z.zipcode,
    ->     6373 * ACOS(
    ->         COS( RADIANS(z.lat) )
    ->         * COS( RADIANS( 53.23746 ) )
    ->         * COS( RADIANS( 11.87077 ) - RADIANS(z.lon) )
    ->         + SIN( RADIANS(z.lat) )
    ->         * SIN( RADIANS( 53.23746 ) )
    ->     ) AS distance
    -> FROM zip_coordinates z
    -> HAVING distance <= 100
    -> ORDER BY distance ASC
    -> LIMIT 1, 10
    -> ;
+-------+---------+--------------------+
| id    | zipcode | distance           |
+-------+---------+--------------------+
|  3835 | 19348   |  2.856191974267735 |
|   320 | 19372   | 4.7365965470299285 |
|   319 | 19300   | 4.7365965470299285 |
| 10723 | 19357   |  5.925404642212945 |
| 17053 | 19348   |  6.415856404417635 |
|  2465 | 19357   |   7.03089922544295 |
|  9655 | 19357   |  7.392084823674285 |
| 15575 | 19372   | 7.5748574148953995 |
|  6137 | 19348   |  8.505292617218123 |
| 12922 | 19357   |   9.15148868821331 |
+-------+---------+--------------------+
10 rows in set (0.02 sec)

There you go, with power of database SQL, we can compute user distance in a very easy way.

Conclusion

You don’t need to use any extra functionality in order to compute and measure user distance in MySQL. It is already there with making use of some mathematic formulas.

Hope this article give you some ideas about geographic computation using MySQL.