This assignment is designed to prepare you for the next phase of this class: data analysis and database design. Although you have been (inadvertently) using some of the ideas relevant to this next phase, this exercise arrives at those concepts from a practical standpoint. It is also a great way to get you to start thinking about the term project that will be due by the end of the term.
While it is handy to know how to retrieve data from a table, combine data from multiple tables, and summarize it before reporting, it is just as important to look at the raw data and try to understand what it is communicating. Your analysis and reporting will depend on the accuracy and completeness of the data.
Look at the data provided in the following CSV files and determine if there are any issues that will require attention. Specifically, look for:
- Missing data
- Unique data
- Whether the unique data identified above also appears in other tables (you can JOIN!)
- Misspelled or incomplete data (e.g., male vs. mael vs. M)
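The checks listed above can also be sketched in code, which may help you cross-check what you find in Excel. The example below is only an illustration and not a substitute for the Excel work: the two small tables, their column names (`gender`, `person_id`), and the helper functions are all hypothetical and are not taken from the assignment's CSV files.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for two of the CSV files.
PEOPLE_CSV = """id,name,gender
1,Ana,male
2,Ben,mael
3,Cy,M
4,Di,
"""
ORDERS_CSV = """order_id,person_id
10,1
11,2
12,9
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def missing_count(rows, column):
    """Count rows where the column is empty or only whitespace."""
    return sum(1 for r in rows if not (r[column] or "").strip())

def value_counts(rows, column):
    """Tally each distinct value; rare spellings stand out as typos."""
    return Counter((r[column] or "").strip() for r in rows)

def unmatched_keys(child_rows, child_col, parent_rows, parent_col):
    """Values in the child table with no match in the parent table
    (the rows an inner JOIN would silently drop)."""
    parent = {r[parent_col] for r in parent_rows}
    return sorted({r[child_col] for r in child_rows} - parent)

people = read_rows(PEOPLE_CSV)
orders = read_rows(ORDERS_CSV)

print(missing_count(people, "gender"))                   # 1
print(sorted(value_counts(people, "gender").items()))    # [('', 1), ('M', 1), ('mael', 1), ('male', 1)]
print(unmatched_keys(orders, "person_id", people, "id")) # ['9']
```

The value tally plays the same role as sorting or filtering a column in Excel: variants such as `mael` and `M` show up as separate, low-count entries next to the expected spelling.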
TASK: Use features in Excel to spot these errors. Common techniques include sorting and filtering. They are powerful enough to help you deal with missing and misspelled values as well as outliers. There are many other functions in Excel (see resource below) that can help you understand your data.
Resource: Excel functions. I used to link to a specific article, but that link is now broken. I encourage you to use your preferred search engine (Bing, Google, Brave) to find sites you like that can help you.
Submit (tips): A well-documented summary of the problems in the data linked above, addressing each of the items you were asked to look for. A grid like the one shown below may be used to organize your findings. You can do this in Google Sheets or Excel and attach it here as a PDF using the headings shown below.
Data file | Column name | Data issue observed (missing, misspelled, outlier, unique) | % of values with this issue | Importance of addressing this issue (risk of ignoring)
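To fill in the "% of values with this issue" heading, divide the number of affected values in a column by the total number of values in that column. A minimal sketch of that arithmetic follows; the sample column, the accepted-value set, and the function names are hypothetical and chosen only for illustration.

```python
def percent_with_issue(values, has_issue):
    """Share of values (0-100, one decimal) for which has_issue is true."""
    values = list(values)
    if not values:
        return 0.0
    flagged = sum(1 for v in values if has_issue(v))
    return round(100.0 * flagged / len(values), 1)

# Hypothetical column with one blank and two non-standard spellings.
gender = ["male", "mael", "M", ""]
ACCEPTED = {"male", "female"}  # assumed list of valid spellings

def is_missing(value):
    return value.strip() == ""

def is_misspelled(value):
    # Non-empty but not one of the accepted spellings.
    return value.strip() != "" and value.strip() not in ACCEPTED

print(percent_with_issue(gender, is_missing))     # 25.0
print(percent_with_issue(gender, is_misspelled))  # 50.0
```

In Excel, the same figure can be obtained by dividing a count of the affected cells by the count of all rows in the column.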