In Acho Studio, you can use the "SQL action" to insert customized SQL query for your table. This page will explain all the SQL queries that Acho Studio currently supports. The tables we are going to use are all from the sample-project created when you first log into Acho Studio.
Select is the beginning keyword for every SQL query, it indicates that the user would like to view the following data.
SELECT * FROM mask_use_by_county;
DISTINCT function helps the user to identify unique values within a data field.
SELECT DISTINCT county_fips_code FROM mask_use_by_county;
DISTINCT can also be used to identify unique combinations of multiple data fields.
SELECT DISTINCT state_name,county FROM us_counties;
WHERE is used to identify the beginning of filtering conditions being applied to the dataset. Within one table query section, there should only be one
WHERE and that should be located after the “From” statement after the name of the table.
WHERE should be followed immediately by the first filtering expression.
SELECT * FROM us_counties where state_name="Missouri";
NOT are all supplementary expressions to help combine and refine filter expressions.
AND allows users to obtain the intersecting or overlapping sections of two filtered datasets;
SELECT * FROM us_countiesWHERE state_name LIKE "M%" AND confirmed_cases >1000;
OR allows users to obtain the union of two filtered datasets;
SELECT DISTINCT state_name FROM us_countiesWHERE deaths>100 OR confirmed_cases >1000;
NOT allows users to obtain the complement of a filtered dataset;
SELECT * FROM us_counties WHERE NOT state_name="Missouri";
The three filter operations can be easily combined.
SELECT * FROM us_countiesWHERE (NOT state_name="Missouri")AND (deaths>100 or confirmed_cases>1000);
ORDER BY allow the user to sort the table based on one or multiple fields. The default sorting mechanism is ascending, alphabetical for string fields and from smallest to largest for numeric fields.
ORDER BY can also be used in combination with
DESC to reverse the sorting order into descending.
SELECT DISTINCT state_name FROM us_counties ORDER BY state_name;
SELECT DISTINCT state_name, county FROM us_counties ORDER BY state_name DESC;
INSERT INTO can be used to add new records into a table. In other words, this is equivalent to make a manual entry. Note that Insert Into is not a query operation, the processed result will just be more records added to the table.
INSERT INTO, users can choose to outline the value for every field;
Or just specify the select fields that the new record will have value in, the un-called out fields will simply have “null” as value.
NULL is a special word in SQL to identify truly empty field values, please note that null is not equivalent to an empty string ("").
SELECT DISTINCT state_name, county, county_fips_code FROM us_countiesWHERE county_fips_code IS NULL;
LIMIT is used to select a requested number of sample records. Sometimes the table is very large and a simple select * statement can take longer to process. Hence during data exploration Limit can be used to get a small sample of the data. Limit is usually written at the very end of a query statement.
SELECT * FROM us_counties LIMIT 5;
Max are mathematical operators in SQL that will allow the user to obtain the minimum or maximum value of a specified field or set. It can be used on numeric values as well as strings, where the strings are ranked in alphabetically ascending order.
SELECT MAX(confirmed_cases) AS max_value FROM us_counties;
SELECT MIN(confirmed_cases) AS min_value FROM us_counties;
SUM are mathematical operations that can be used to summarize field data. Note that
COUNT does not need to be applied to a numeric field, but
SUM should. All three expressions need to be followed by brackets, “()”, and there can be further operations within the brackets.
COUNT(*) can be used to calculate the total number of records.
SELECT COUNT(*) AS number_of_records FROM us_states WHERE state_name = "Maine";
SELECT COUNT(DISTINCT county) AS num_of_countiesFROM us_countiesWHERE state_name = "Maine";
SELECT SUM(confirmed_cases) AS total_cases_0510FROM us_statesWHERE date = '2021-05-10';
SELECT AVG(deaths/confirmed_cases) AS average_death_rate_0510FROM us_statesWHERE date = '2021-05-10';
LIKE is usually used in string expressions to find close lookalikes when exact match strings are cumbersome to express. It is often used together with the character wildcard
SELECT * FROM us_counties WHERE county LIKE "%City%";
There are several wildcards in SQL that can be used with
LIKE to find strings that match a certain pattern.
% is the most popular wildcard. It can represent any or no character.
SELECT * FROM us_counties WHERE county LIKE "%City%";
_ represent one single character, having multiple
_ in a row will represent that exact number of missing characters.
SELECT * FROM us_counties WHERE county LIKE "N__fo_k";
IN can be used when specifying not one but multiple values, it can be useful in setting filters by combining several
OR expression into one single expression. It can also be combined with
NOT to form
NOT IN which will reflect the inverse of multiple exclusions.
IN always need to be followed by a bracket
() where the selected matching values will be listed.
SELECT DISTINCT state_nameFROM us_countiesWHERE state_name IN ("Oregon","Washington","California");
SELECT DISTINCT state_nameFROM us_countiesWHERE state_name NOT IN ("Arizona", "California", "Delaware")ORDER BY state_name;
BETWEEN can be used to identify ranges of values, it can be applied to numeric values or strings, where strings are ranked alphabetically ascending.
BETWEEN needs to be followed by an
AND in between the specified upper and lower bound values.
SELECT state_name, date, confirmed_casesFROM us_statesWHERE confirmed_casesBETWEEN 0 AND 1000;
SELECT DISTINCT state_nameFROM us_states WHERE state_name between "California" and "Guam"ORDER BY state_name;
SQL has an easy alias function to assign nicknames to tables and or fields to save some typing within a query. The alias can be noted with a keyword
AS . Or, you can simply skip the
AS and directly type the nickname after the full expression.
SELECT state_name AS state,date,deaths / confirmed_cases * 100 AS death_rateFROM us_statesORDER BY state, date;
SELECT DISTINCT B.state_name, B.county, A.never, A.alwaysFROM mask_use_by_county AS ALEFT JOIN us_counties AS BON A.county_fips_code = B.county_fips_codeORDER BY B.state_name DESC, B.county DESC;
In SQL, the keyword
JOIN is equivalent to
INNER JOIN. Both expressions will work in a query.
INNER JOIN will bring two tables together on records that match the identified key field(s) only. In one sense, this means the resulting table will be intercept of the two source tables, unmatched records from either side of the table will be dropped.
SELECT *FROM mask_use_by_county AS AINNER JOIN us_counties AS BON A.county_fips_code = B.county_fips_codeWHERE B.date='2020-06-23';
SELECT * in a
JOIN statement will output all columns from both tables. If only selected columns are needed then the chosen columns should be specified after
SELECT. Each column name should have a prefix of the table name, which indicates the column is from which source table.
SELECT B.state_name, B.county, B.date, A.neverFROM mask_use_by_county AS AINNER JOIN us_counties AS BON A.county_fips_code = B.county_fips_code;
If we view a join statement as a line of text on paper, the first-named table will physically be located on the left side since we write left to right, and therefore can be referred to as the left table. A left join will keep the first-named table to be joined intact, no record will be dropped from this first table, even if it did not have a match from the second table through the join key. Unmatched records from the first or left table will be carried forward into the resulting table with the new fields from the right table having null values. Unmatched records from the second or right table will be dropped.
LEFT JOIN is equivalent to
LEFT OUTER JOIN, both can be used as keywords in a query.
In this following example which is exactly the same as the one we saw in inner join except the join type has change from inner to left, records from mask_use_by_county table which had null value for county_fips_code will be kept, but records from us_counties table that had null county_fips_code will be dropped.
SELECT b.state_name,b.county,b.date,a.never FROM mask_use_by_county a
left join us_counties b on a.county_fips_code = b.county_fips_code;
RIGHT JOIN is the opposite of
LEFT JOIN where the unmatched records from the second named or right table will be kept regardless but the unmatched records from the first table will be dropped. It is equivalent to
RIGHT OUTER JOIN and will both work as keywords in a query.
SELECT B.state_name, B.county, B.date,a.neverFROM mask_use_by_county AS ARIGHT JOIN us_counties AS bON A.county_fips_code = B.county_fips_code;
FULL JOIN is the no data loss join. In this case, unmatched records from both the left and right tables will be kept. That is, no records will be dropped from either side of the join. It is equivalent to
FULL OUTER JOIN.
SELECT B.state_name, B.county, B.date, A.neverFROM mask_use_by_county AS AFULL JOIN us_counties BON A.county_fips_code = B.county_fips_code;
SELF JOIN is an unconventional join statement expression since the keyword
JOIN is actually not found in the query anywhere. It is one way to obtain combinations within the same dataset.
Note that in this case where the field names are identical from the two sides. Thus, remember to use
AS to rename two tables and the outputted columns, so SQL will not show errors due to multiple columns with the same name.
SELECT A.state_name as higher_state,B.state_name as lower_state,A.deaths as higher_state_deaths,B.deaths as lower_state_deaths
FROM us_states A, us_states B
where A.state_name <> B.state_name
and A.confirmed_cases > B.confirmed_cases;
JOIN is the concept of merging multiple tables together horizontally (more added fields),
UNION ALL is the concept of appending two separate tables together vertically, by combing the total amount of records from both sides. In a union the user must be very careful with specifying the fields to bring from each table and make sure the correct fields are consolidating together. In a union expression, the keyword
UNION ALL is located in between two separate select statements.
SELECT DISTINCT state_name FROM us_statesUNION ALLSELECT DISTINCT state_name FROM us_statesORDER BY state_name;
GROUP BY is an extremely useful function in SQL to aggregate data to a higher level. It can also generate a pivot table on top of datasets.
GROUP BY statement needs to have two parts that are in sync. After the initial
SELECT, a set of columns that will be used as aggregation key needs to be listed, followed by the calculated metrics that will be calculated at the aggregated level. Filters, joins, and other operations can follow just like in any regular query. After filtering conditions, a
GROUP BY key word is needed followed by the exact same set of fields used as aggregation key.
SELECT date,state_name, SUM(deaths), SUM(confirmed_cases)FROM us_countiesGROUP BY date,state_name;
HAVING is a very useful filtering mechanism that allow users to place filtering conditions on aggregated results after a group by query. In one sense this allows the user to do query on top of another query. This can be extremely useful in data exploration.
Note that the filtering conditions after
HAVING must be elements of the top select statement.
SELECT date, state_name, SUM(deaths), SUM(confirmed_cases)FROM us_countiesGROUP BY date,state_nameHAVING SUM(deaths)>500;
Exists in SQL is a binary test, it will return True/False for a select statement if it contains records or not. It is another way to place an advanced filter in a query as the test performed is not limited to the current table. The key word “exist” is placed after where, and then following by a bracketed separate select statement which contains the conditions for the test.
SELECT DISTINCT state_name FROM us_statesWHERE EXISTS(SELECT countyFROM us_countiesWHERE us_counties.state_name = us_states.state_nameAND deaths>100);
CASE is the SQL way of placing “If & else” logic into a query. It means generating a new field using case logic into the existing table, usually used to categorize records. Although the case statement itself can take several rows to complete, it is equivalent to one new field, therefore it usually goes immediately behind
SELECT parallel to the fields to be displayed and in front of
SELECT state_name, deaths,CASE WHEN deaths > 1000 THEN "High Death Rate"WHEN deaths > 500 THEN "Medium Death Rate"ELSE "Low Death Rate"ENDAS Deat_Rate_SeverityFROM us_states;
COALESCE can be used to embed a simple case logic when encountering null values, it can be very useful in calculations as null values may easily skew or null out the calculated results.
COALESCE is followed by a bracket with two parameters, the first parameter is the field where the null check will be performed, the second parameter is the value to replace
SELECT state_name, COALESCE(deaths,0) / confirmed_cases FROM us_states;
SQL Stored Procedures can only do select in Acho.
Special character /* and */ can be used to indicate start and end of a comment or none-executable section of the query code. Within these two special characters, any content can be written, no syntax needs to be followed. Usually the comment function is used to leave written comments on complex queries, it can also be used to comment out sections of the query codes to allow user to do flexible testing.
Note that is only /* is used, all written content after /* will all be commented out and become none executable.
SELECT state_name, deaths,/*case when deaths > 1000 then "High Death Rate"when deaths > 500 then "Medium Death Rate"else "Low Death Rate"end as Deat_Rate_Severity*/FROM us_states;
The following are arithmetic operators that can be used in SQL query:
remainder or modulus
The following comparison operators can be used in SQL query:
greater than or equal to
less than or equal to
In addition, “!” can be used in front of any of the comparison operators to obtain the inverse operation. For example “!=” means “not equal to”, “!>” means “not greater than” which is equivalent to “<=”.
There is also another special operator “<>” which means “not equal to” or “different from”, it can be used on numeric fields as well as other field types.