D465 DATA APPLICATIONS
STUDY GUIDE COMPLETE
LATEST UPDATED WESTER N
GOVERNORS’ UNIVERSITY
Organizing data for analysis The 4 Phases of analysis
- Organize data
- Format and adjust data
- Ask for input
- Transform data—means identifying relationships and finding patterns
Sorting & Filtering Format for better analysis Data Validation: Allows you to control what can and can’t be entered in your work.• Add dropdown lists with predetermined options to choose from • Create custom check boxes • Protect structured data and formulas Conditional formatting- a spreadsheet tool that changes how cells appear when values meet specific conditions.Using SQL Order by sorts the query in Ascending order by default.DESC makes it descending order ascending order represents numbers from lowest to highest Descending order is highest to lowest To filter use the WHERE clause AND creates sub Filters CONCAT and CONCATENATE Join 2 or more text strings SQL Functions COUNT returns the number of rows in a specified range.COUNT DISTINCT only returns the distinct values in a specified range. This means it removes both duplicate rows containing the same data and null values from the result set.SQL functions are tools built into SQL to facilitate performing calculations AVG() function to calculate the average salary of employees Subquery, also called an inner or nested query, is a SQL query that is nested inside a larger query HAVING clause filters the results of a SQL query based on conditions applied after the grouping CASE provides conditional logic in SQL queries, similar to an 'if-else' structure in programming languages IF performs a simple conditional test and returns a value depending on the outcome COUNT performs a simple conditional test and returns a value depending on the outcome
LEN – Calculates the length of a string FIND – locates specific characters in a string RIGHT – selects a specific number of characters on the right side of a cell LEFT - selects a specific number of characters on the left of a cell CONCAT_WS – concat with separator ROUND – returns 2 decimal places LIMIT - returns a specific number of results Distinct – shows different values in same column INSERT INTO places a new table Sub Queries • Subqueries must be enclosed within parentheses.• A subquery can have only one column specified in the SELECT clause. If you want a subquery to compare multiple columns, those columns must be selected in the main query.• Subqueries that return more than one row can only be used with multiple value operators, such as the IN operator which allows you to specify multiple values in a WHERE clause.• A subquery can’t be nested in a SET command. The SET command is used with UPDATE to specify which columns (and values) are to be updated in a table.
When you get stuck • Ask your peers • Look online
Best practices for searching online:
Thinking skills
Data analytics terms Basic knowledge of tools When to use which tools Simple Spreadsheets – use pivot table Multiple data sets or large spreadsheets – use SQL For larger complex data – Use R Aggregate data for analysis Aggregation- collecting or gathering many separate pieces into a whole.VLOOKUP- stands for vertical lookup and is a function that searches for a certain value in a column to return a corresponding piece of information.Value function- turns text into numeric values.Trim- removes spaces JOINS JOIN – a SQL clause that is used to combine rows from two or more tables based on a related column.Common joins • INNER – A function that returns records with matching values in both tables • LEFT – a function that will return all the records from the left table and only the matching records from the right table.• RIGHT – A function that will return all records from the right table and only the matching records from the left.• OUTER – a function that combines RIGHT and LEFT join to return all matching records in both tables.