
Messy Data, Meet Computational Thinking (Powered by Microsoft Analyst)

By Anna Kourouniotis posted 09-14-2025 12:10 PM

  

** The following article was originally published on LinkedIn and its formatting was adjusted for this post. **

The Context

Working with institutional data often feels like wrestling with three different puzzles at once, each with missing pieces, mismatched edges, or extra duplicates. In my case, I needed to merge a CSV of student enrollments, a tab-delimited text file of course registrations, and an Excel file of advisor assignments, all with formatting quirks and inconsistencies. What could have been a tedious, error-prone task turned into an opportunity to see how AI (via Microsoft Analyst) not only automated the cleanup but also illustrated Computational Thinking in action (Acevedo-Borrega et al., 2022). From decomposition to algorithmic thinking, the very skills we try to cultivate in educational technology were on full display in this real-world data wrangling challenge.

The Task

To combine:

  • A CSV of student enrollments (some IDs missing leading zeroes)
  • A tab-delimited text file of course registrations (delimiter misconfigured, missing some unique IDs)
  • An Excel file of advisor assignments (contains duplicated rows, IDs split across multiple columns)

The Goal

To create a unified data set to use as a data source in Tableau.

The AI-Driven Step-by-Step Solution using Analyst

My Prompt:

How can I use Analyst to merge three data sets (one csv, one tab delimited and one XLS file) where some of the files are missing unique identifiers, some are duplicated records, and others are missing leading zeroes? There might be additional issues with the data files that I have not yet seen. The ultimate goal here is to create a single data set that I can use as a source in my Tableau workbook. 

The Analyst’s Step-by-Step Solution


==== From this point forward, Analyst explains the steps it will take to help me with this problem (Microsoft, 2025) ====

Data Ingestion and Delimiter Correction
  • Analyst automatically detects that the tab file is mis-delimited (commas instead of tabs) and recommends or applies delimiter correction for accurate parsing.
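
To make the delimiter check concrete, here is a minimal sketch in Python using the standard library's `csv.Sniffer`. It is not how Analyst works internally (that is not documented here); the function names, the candidate delimiter list, and the sample data are all assumptions for illustration.

```python
import csv
import io


def detect_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess which candidate character separates the columns in a text sample."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


def read_rows(text: str) -> list[dict]:
    """Parse delimited text with whatever delimiter the sample actually uses."""
    delimiter = detect_delimiter(text[:2048])  # a small sample is usually enough
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical "tab-delimited" export that was actually written with commas.
mislabeled = "student_id,course\n00123,BIO101\n00456,CHM201\n"
rows = read_rows(mislabeled)
```

Reading every column as text, as `csv.DictReader` does, also keeps leading zeroes intact at this stage.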

ID Normalization and Missing Identifiers
  • Analyst reviews all ID fields and uses data type recognition to reformat IDs as text, preserving leading zeroes and restoring them where necessary (using techniques like text formatting and padding via the AI’s built-in rules).
  • For missing IDs, the AI examines other fields (names, dates, email addresses) and probabilistically links records using fuzzy matching or composite keys to avoid dropping relevant records.
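
Both ideas can be sketched in a few lines of standard-library Python. This is an illustration only: the 7-digit ID width, the 0.85 similarity threshold, and the function names are assumptions, and `difflib` stands in for whatever fuzzy matching Analyst actually applies.

```python
from difflib import SequenceMatcher

ID_WIDTH = 7  # assumed fixed ID length; adjust to your institution's format


def normalize_id(raw) -> str:
    """Return the ID as zero-padded text, or '' when it is missing or not numeric."""
    text = "" if raw is None else str(raw).strip()
    if text.endswith(".0"):  # spreadsheets often turn 1234 into 1234.0
        text = text[:-2]
    return text.zfill(ID_WIDTH) if text.isdigit() else ""


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_id_by_name(name: str, known: dict[str, str], threshold: float = 0.85):
    """known maps ID -> name. Return the best-matching ID, or None if too weak."""
    best_id, best_score = None, 0.0
    for student_id, known_name in known.items():
        score = similarity(name, known_name)
        if score > best_score:
            best_id, best_score = student_id, score
    return best_id if best_score >= threshold else None
```

Matches that fall below the threshold return `None` so they can be routed to a person rather than linked automatically.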

Duplicate Detection and Resolution
  • Analyst applies proprietary algorithms to find repeated entries across or within files, giving options to merge non-conflicting records, combine differing values (sometimes using delimiters within fields), and present ambiguous cases for human review.
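
Analyst's duplicate-handling algorithms are proprietary, so the following is only a sketch of the general idea under simple assumptions: rows that share a key are collapsed, blanks are filled from sibling rows, and disagreeing values are set aside for human review. The `student_id` key and the function name are hypothetical.

```python
def merge_duplicates(rows: list[dict], key: str = "student_id"):
    """Collapse rows sharing a key; return (merged_rows, conflicts_for_review)."""
    merged: dict[str, dict] = {}
    conflicts: list[tuple] = []
    for row in rows:
        current = merged.setdefault(row[key], {})
        for field, value in row.items():
            if value in ("", None):
                continue  # blanks never overwrite existing data
            if field not in current:
                current[field] = value  # non-conflicting: fill the gap
            elif current[field] != value:
                # Keep the first value seen and flag the disagreement.
                conflicts.append((row[key], field, current[field], value))
    return list(merged.values()), conflicts
```

Each conflict is recorded as (key, field, kept value, rejected value), which gives a reviewer enough context to decide which one is right.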

Dataset Merging and Output
  • The system merges all files using cleaned and aligned keys. It automatically preserves data integrity (no dropped zeros, accurate merging of partial matches, deduplication, etc.) and generates a single, comprehensive output ready for visualization or analysis.
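
As a rough illustration of merging on cleaned keys, the sketch below joins three already-cleaned row lists on a shared ID. The column names and the choice of one output row per course registration are assumptions, not something the Analyst response specifies.

```python
def index_by(rows: list[dict], key: str) -> dict:
    """Build a lookup from key value to row (a later duplicate replaces an earlier one)."""
    return {row[key]: row for row in rows}


def merge_sources(enrollments, registrations, advisors, key="student_id"):
    """One output row per registration, enriched with enrollment and advisor fields."""
    enroll_by_id = index_by(enrollments, key)
    advisor_by_id = index_by(advisors, key)
    merged, unmatched = [], []
    for reg in registrations:
        student_id = reg[key]
        if student_id not in enroll_by_id:
            unmatched.append(reg)  # keep for review instead of silently dropping
            continue
        merged.append({
            **enroll_by_id[student_id],
            **advisor_by_id.get(student_id, {}),  # advisor is optional
            **reg,
        })
    return merged, unmatched
```

In this design, students with no registrations do not appear in the output; if they should, iterate over the enrollments instead.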

Analyst User Workflow

  • Import all files; configure detection for CSV, Tab and XLS formats.
  • Use AI-driven preview to flag delimiter and format issues.
  • Normalize key columns (IDs) to text, pad with zeros as needed.
  • Match records using both unique and fuzzy criteria.
  • Review duplicates and approve merges.
  • Export consolidated data for analysis—preserving all critical formatting.

Summary Table

This approach illustrates how advanced AI assistant tools, such as Analyst, automate and streamline what would otherwise be a very manual, error-prone data wrangling process in the higher ed context.

Final Thoughts

Following this analysis, I entered another prompt and included the real files. I won’t be sharing the final file here; I just wanted to share the logic and the process that Analyst followed, which I think does a great job of breaking things down. That process of cleaning, merging, and preparing my data for Tableau is a textbook example of Computational Thinking (CT) in action, especially from an educational technology perspective. For you ed tech nerds, here's how each core CT skill was applied:

1. Decomposition


Definition: Breaking down a complex problem into smaller, manageable parts.
Application:

  • Started with three files in different formats containing similar data.
  • Broke the task into subtasks: loading, cleaning, deduplicating, merging, and exporting.
  • Treated each issue (e.g., inconsistent types, blanks, duplicates) as a separate challenge to solve.


2. Pattern Recognition


Definition: Observing similarities, trends, or repeated elements in data or processes.
Application:

  • Identified common columns (ID, Career, School) across the files.
  • Noticed patterns in how missing data was represented (e.g., "None", "nan", empty strings).
  • Detected duplicate rows and standardized how they were handled.

3. Abstraction


Definition: Filtering out unnecessary details to focus on the essential aspects.
Application:

  • Focused on key fields relevant to the analysis and ignored irrelevant formatting differences.
  • Converted various representations of missing data into a unified format (NaN).
  • Simplified the merging process by treating the files as having the same schema.
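
The "unified format" idea can be sketched as follows. The token list is an assumption based on the examples mentioned above ("None", "nan", empty strings) plus a couple of common extras, and plain Python `None` stands in for NaN here.

```python
import math

# Assumed spellings of "missing"; extend to match what your files contain.
MISSING_TOKENS = {"", "none", "nan", "null", "n/a"}


def clean_value(value):
    """Map every known spelling of 'missing' to None; return other values as trimmed text."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip()
    return None if text.lower() in MISSING_TOKENS else text


def clean_row(row: dict) -> dict:
    """Apply the same missing-value rule to every field in a row."""
    return {field: clean_value(value) for field, value in row.items()}
```

Once every file agrees on a single representation of "missing", later steps such as deduplication no longer treat "None" and an empty cell as different values.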

4. Algorithmic Thinking


Definition: Developing a step-by-step solution or set of rules to solve a problem.
Application:

  • Designed a cleaning function to apply consistent rules across the datasets.
  • Used a repeatable process: read → clean → merge → deduplicate → export.
  • Ensured the final output was structured and ready for Tableau ingestion.
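
The read → clean → merge → deduplicate → export sequence can be sketched end to end with the standard library. This is a simplified illustration for two well-formed inputs, not the code Analyst generated; the function names, the `ID` key, and the missing-value tokens are assumptions.

```python
import csv
import io


def read(text: str, delimiter: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def clean(rows: list[dict]) -> list[dict]:
    """Trim whitespace and blank out the assumed 'missing' tokens."""
    missing = {"", "none", "nan"}
    return [
        {k: ("" if v is None or v.strip().lower() in missing else v.strip())
         for k, v in row.items()}
        for row in rows
    ]


def merge(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Left join: keep every left row and add matching right-hand columns."""
    right_by_key = {r[key]: r for r in right}
    return [{**row, **right_by_key.get(row[key], {})} for row in left]


def deduplicate(rows: list[dict]) -> list[dict]:
    """Drop rows that are exact repeats of an earlier row."""
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def export(rows: list[dict]) -> str:
    """Write rows as CSV text, a format Tableau can read as a text file source."""
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_pipeline(csv_text: str, tab_text: str, key: str = "ID") -> str:
    a = clean(read(csv_text, ","))
    b = clean(read(tab_text, "\t"))
    return export(deduplicate(merge(a, b, key)))
```

Keeping each stage as its own small function mirrors the decomposition described earlier and makes it easy to rerun the same steps when next term's files arrive.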

Final, Final Thought!


I also want to officially add Microsoft Analyst as an additional hero to my workflow! Is anyone else open to exploring this tool? If so, please do show and tell! 😝 

References:


Acevedo-Borrega, J., Valverde-Berrocoso, J., & Garrido-Arroyo, M. d. C. (2022). Computational Thinking and Educational Technology: A Scoping Review of the Literature. Education Sciences, 12(1), 39. https://doi.org/10.3390/educsci12010039 

Côté, P.-O., Nikanjam, A., Ahmed, N., Humeniuk, D., & Khomh, F. (2024). Data cleaning and machine learning: A systematic literature review. Automated Software Engineering, 31, Article 54. https://doi.org/10.1007/s10515-024-00453-w 

Microsoft. (2025). Copilot (GPT-4) [Generative AI assistant]. https://copilot.microsoft.com 

Molloy University. (2025, January 31). What is educational technology and why is it important? https://www.molloy.edu/news/what-is-educational-technology 

