SQL Editor Queries

In Acho Studio, you can use the "SQL action" to insert customized SQL query for your table. This page will explain all the SQL queries that Acho Studio currently supports. The tables we are going to use are all from the sample-project created when you first log into Acho Studio.

SELECT

Description

Select is the beginning keyword for every SQL query, it indicates that the user would like to view the following data.

SELECT * FROM mask_use_by_county;

SELECT DISTINCT

Description

DISTINCT function helps the user to identify unique values within a data field.

Example

SELECT DISTINCT county_fips_code FROM mask_use_by_county;

DISTINCT can also be used to identify unique combinations of multiple data fields.

SELECT DISTINCT state_name,county FROM us_counties;

WHERE

Description

WHERE is used to identify the beginning of filtering conditions being applied to the dataset. Within one table query section, there should only be one WHERE and that should be located after the “From” statement after the name of the table. WHERE should be followed immediately by the first filtering expression.

Example

SELECT * FROM us_counties where state_name="Missouri";

AND, OR, NOT

AND, OR, NOT are all supplementary expressions to help combine and refine filter expressions.

AND allows users to obtain the intersecting or overlapping sections of two filtered datasets;

SELECT * FROM us_counties 
WHERE state_name LIKE "M%" AND confirmed_cases >1000;

OR allows users to obtain the union of two filtered datasets;

SELECT DISTINCT state_name FROM us_counties 
WHERE deaths>100 OR confirmed_cases >1000;

NOT allows users to obtain the complement of a filtered dataset;

SELECT * FROM us_counties WHERE NOT state_name="Missouri";

The three filter operations can be easily combined.

‌SELECT * FROM us_counties 
WHERE (NOT state_name="Missouri") 
  AND (deaths>100 or confirmed_cases>1000);

SQL ORDER BY

Description

ORDER BY allow the user to sort the table based on one or multiple fields. The default sorting mechanism is ascending, alphabetical for string fields and from smallest to largest for numeric fields. ORDER BY can also be used in combination with DESC to reverse the sorting order into descending.

Example

SELECT DISTINCT state_name FROM us_counties ORDER BY state_name;
SELECT DISTINCT state_name, county FROM us_counties ORDER BY state_name DESC;

INSERT INTO

INSERT INTO can be used to add new records into a table. In other words, this is equivalent to make a manual entry. Note that Insert Into is not a query operation, the processed result will just be more records added to the table.

When using INSERT INTO, users can choose to outline the value for every field;

Or just specify the select fields that the new record will have value in, the un-called out fields will simply have “null” as value.

NULL

NULL is a special word in SQL to identify truly empty field values, please note that null is not equivalent to an empty string ("").

SELECT DISTINCT state_name, county, county_fips_code FROM us_counties 
WHERE county_fips_code IS NULL;

LIMIT

LIMIT is used to select a requested number of sample records. Sometimes the table is very large and a simple select * statement can take longer to process. Hence during data exploration Limit can be used to get a small sample of the data. Limit is usually written at the very end of a query statement.

SELECT * FROM us_counties LIMIT 5;

MIN, MAX

Min and Max are mathematical operators in SQL that will allow the user to obtain the minimum or maximum value of a specified field or set. It can be used on numeric values as well as strings, where the strings are ranked in alphabetically ascending order.

SELECT MAX(confirmed_cases) AS max_value FROM us_counties;
‌SELECT MIN(confirmed_cases) AS min_value FROM us_counties;

COUNT, AVG, SUM

COUNT, AVG, and SUM are mathematical operations that can be used to summarize field data. Note that COUNT does not need to be applied to a numeric field, but AVG and SUM should. All three expressions need to be followed by brackets, “()”, and there can be further operations within the brackets. COUNT(*) can be used to calculate the total number of records.

SELECT COUNT(*) AS number_of_records FROM us_states WHERE state_name = "Maine";
SELECT COUNT(DISTINCT county) AS num_of_counties 
FROM us_counties 
WHERE state_name = "Maine";
SELECT SUM(confirmed_cases) AS total_cases_0510 
FROM us_states 
WHERE date = '2021-05-10';
SELECT AVG(deaths/confirmed_cases) AS average_death_rate_0510
FROM us_states
WHERE date = '2021-05-10';

LIKE

LIKE is usually used in string expressions to find close lookalikes when exact match strings are cumbersome to express. It is often used together with the character wildcard % or _.

SELECT * FROM us_counties WHERE county LIKE "%City%";‌

Wildcards

There are several wildcards in SQL that can be used with LIKE to find strings that match a certain pattern.

  • % is the most popular wildcard. It can represent any or no character.

SELECT * FROM us_counties WHERE county LIKE "%City%";
  • _ represent one single character, having multiple _ in a row will represent that exact number of missing characters.

SELECT * FROM us_counties WHERE county LIKE "N__fo_k";

IN

IN can be used when specifying not one but multiple values, it can be useful in setting filters by combining several OR expression into one single expression. It can also be combined with NOT to form NOT IN which will reflect the inverse of multiple exclusions. IN always need to be followed by a bracket () where the selected matching values will be listed.

SELECT DISTINCT state_name 
FROM us_counties 
WHERE state_name IN ("Oregon","Washington","California");
SELECT DISTINCT state_name
FROM us_counties
WHERE state_name NOT IN ("Arizona", "California", "Delaware")
ORDER BY state_name;

BETWEEN

BETWEEN can be used to identify ranges of values, it can be applied to numeric values or strings, where strings are ranked alphabetically ascending. BETWEEN needs to be followed by an AND in between the specified upper and lower bound values.

SELECT state_name, date, confirmed_cases
FROM us_states 
WHERE confirmed_cases 
BETWEEN 0 AND 1000;
SELECT DISTINCT state_name
FROM us_states WHERE state_name between "California" and "Guam"
ORDER BY state_name;

AS

SQL has an easy alias function to assign nicknames to tables and or fields to save some typing within a query. The alias can be noted with a keyword AS . Or, you can simply skip the AS and directly type the nickname after the full expression.

SELECT state_name AS state,
       date,
       deaths / confirmed_cases * 100 AS death_rate
FROM us_states
ORDER BY state, date;
SELECT DISTINCT B.state_name, B.county, A.never, A.always
FROM mask_use_by_county AS A
LEFT JOIN us_counties AS B
ON A.county_fips_code = B.county_fips_code
ORDER BY B.state_name DESC, B.county DESC;

INNER JOIN

In SQL, the keyword JOIN is equivalent to INNER JOIN. Both expressions will work in a query. INNER JOIN will bring two tables together on records that match the identified key field(s) only. In one sense, this means the resulting table will be intercept of the two source tables, unmatched records from either side of the table will be dropped.

SELECT * 
FROM mask_use_by_county AS A 
INNER JOIN us_counties AS B 
ON A.county_fips_code = B.county_fips_code
WHERE B.date='2020-06-23'; 

Note that SELECT * in a JOIN statement will output all columns from both tables. If only selected columns are needed then the chosen columns should be specified after SELECT. Each column name should have a prefix of the table name, which indicates the column is from which source table.

SELECT B.state_name, B.county, B.date, A.never 
FROM mask_use_by_county AS A
INNER JOIN us_counties AS B
ON A.county_fips_code = B.county_fips_code;

LEFT JOIN

If we view a join statement as a line of text on paper, the first-named table will physically be located on the left side since we write left to right, and therefore can be referred to as the left table. A left join will keep the first-named table to be joined intact, no record will be dropped from this first table, even if it did not have a match from the second table through the join key. Unmatched records from the first or left table will be carried forward into the resulting table with the new fields from the right table having null values. Unmatched records from the second or right table will be dropped.

In SQL LEFT JOIN is equivalent to LEFT OUTER JOIN, both can be used as keywords in a query.

In this following example which is exactly the same as the one we saw in inner join except the join type has change from inner to left, records from mask_use_by_county table which had null value for county_fips_code will be kept, but records from us_counties table that had null county_fips_code will be dropped.

SELECT b.state_name,b.county,b.date,a.never FROM mask_use_by_county a

left join us_counties b on a.county_fips_code = b.county_fips_code;

RIGHT JOIN

RIGHT JOIN is the opposite of LEFT JOIN where the unmatched records from the second named or right table will be kept regardless but the unmatched records from the first table will be dropped. It is equivalent to RIGHT OUTER JOIN and will both work as keywords in a query.

SELECT B.state_name, B.county, B.date,a.never 
FROM mask_use_by_county AS A
RIGHT JOIN us_counties AS b 
ON A.county_fips_code = B.county_fips_code;

FULL JOIN

FULL JOIN is the no data loss join. In this case, unmatched records from both the left and right tables will be kept. That is, no records will be dropped from either side of the join. It is equivalent to FULL OUTER JOIN.

SELECT B.state_name, B.county, B.date, A.never
FROM mask_use_by_county AS A
FULL JOIN us_counties B 
ON A.county_fips_code = B.county_fips_code;

SELF JOIN

‌A SELF JOIN is an unconventional join statement expression since the keyword JOIN is actually not found in the query anywhere. It is one way to obtain combinations within the same dataset.

Note that in this case where the field names are identical from the two sides. Thus, remember to use AS to rename two tables and the outputted columns, so SQL will not show errors due to multiple columns with the same name.

SELECT A.state_name as higher_state,B.state_name as lower_state,A.deaths as higher_state_deaths,B.deaths as lower_state_deaths

FROM us_states A, us_states B

where A.state_name <> B.state_name

and A.confirmed_cases > B.confirmed_cases;

UNION ALL

While JOIN is the concept of merging multiple tables together horizontally (more added fields), UNION ALL is the concept of appending two separate tables together vertically, by combing the total amount of records from both sides. In a union the user must be very careful with specifying the fields to bring from each table and make sure the correct fields are consolidating together. In a union expression, the keyword UNION ALL is located in between two separate select statements.

SELECT DISTINCT state_name FROM us_states
UNION ALL
SELECT DISTINCT state_name FROM us_states
ORDER BY state_name;

‌GROUP BY

GROUP BY is an extremely useful function in SQL to aggregate data to a higher level. It can also generate a pivot table on top of datasets.

A GROUP BY statement needs to have two parts that are in sync. After the initial SELECT, a set of columns that will be used as aggregation key needs to be listed, followed by the calculated metrics that will be calculated at the aggregated level. Filters, joins, and other operations can follow just like in any regular query. After filtering conditions, a GROUP BY key word is needed followed by the exact same set of fields used as aggregation key.

SELECT date,state_name, SUM(deaths), SUM(confirmed_cases)
FROM us_counties
GROUP BY date,state_name; 

‌HAVING

HAVING is a very useful filtering mechanism that allow users to place filtering conditions on aggregated results after a group by query. In one sense this allows the user to do query on top of another query. This can be extremely useful in data exploration.

Note that the filtering conditions after HAVING must be elements of the top select statement.

SELECT date, state_name, SUM(deaths), SUM(confirmed_cases)
FROM us_counties
GROUP BY date,state_name 
HAVING SUM(deaths)>500;

‌EXISTS

Exists in SQL is a binary test, it will return True/False for a select statement if it contains records or not. It is another way to place an advanced filter in a query as the test performed is not limited to the current table. The key word “exist” is placed after where, and then following by a bracketed separate select statement which contains the conditions for the test.

SELECT DISTINCT state_name FROM us_states
WHERE EXISTS 
      (
       SELECT county 
       FROM us_counties 
       WHERE us_counties.state_name = us_states.state_name 
       AND deaths>100
       );

CASE

CASE is the SQL way of placing “If & else” logic into a query. It means generating a new field using case logic into the existing table, usually used to categorize records. Although the case statement itself can take several rows to complete, it is equivalent to one new field, therefore it usually goes immediately behind SELECT parallel to the fields to be displayed and in front of WHERE.

SELECT state_name, deaths,
       CASE WHEN deaths > 1000 THEN "High Death Rate"
            WHEN deaths > 500  THEN "Medium Death Rate"
            ELSE "Low Death Rate"
       END 
       AS Deat_Rate_Severity
FROM us_states;

COALESCE

The keyword COALESCE can be used to embed a simple case logic when encountering null values, it can be very useful in calculations as null values may easily skew or null out the calculated results. COALESCE is followed by a bracket with two parameters, the first parameter is the field where the null check will be performed, the second parameter is the value to replace NULL with.

SELECT state_name, COALESCE(deaths,0) / confirmed_cases FROM us_states;

‌SQL Stored Procedures can only do select in Acho.

‌Comments

Special character /* and */ can be used to indicate start and end of a comment or none-executable section of the query code. Within these two special characters, any content can be written, no syntax needs to be followed. Usually the comment function is used to leave written comments on complex queries, it can also be used to comment out sections of the query codes to allow user to do flexible testing.

Note that is only /* is used, all written content after /* will all be commented out and become none executable.

‌SELECT state_name, deaths,
/*case when deaths > 1000 then "High Death Rate"
    when deaths > 500 then "Medium Death Rate"
    else "Low Death Rate"
    end as Deat_Rate_Severity*/
FROM us_states;

SQL Operators

The following are arithmetic operators that can be used in SQL query:

The following comparison operators can be used in SQL query:

Furthermore, “!” can be used as a prefix to any of the comparison operators to achieve the inverse operation. For example “!=” means “not equal to”, “!>” means “not greater than” which is equivalent to “<=”.

Additionally, there exists another distinctive operator denoted as "<>", which serves the purpose of expressing "not equal to" or "different from." This operator is suitable for numeric fields as well as other field types.

Last updated