HEUG AI Innovators Network

  • 1.  Steps for Making Your Institution's Data "AI-Ready"

    Posted 05-20-2024 10:01 AM

    Community!

    Since some of the hottest discussions in technology today revolve around data, AI/machine learning (ML), and large language models (LLMs), we're beginning the process of making our institutional data "AI-ready."

    We certainly know there are questions and concerns related to data security and data privacy, and ambiguity about what 'breadcrumbs' from your prompts and/or results may persist in the vector database, even when using retrieval-augmented generation (RAG). That is, questions about how to make your data AI-ready safely.

    But conversely, we also have many questions about how to get the most out of AI by prepping data to assist your AI/LLM engine.

    Have you had any experience with, or opinions on, steps to make your institutional data AI-ready?



    ------------------------------
    Joshua Vincent
    Executive Director, Data Engineering & Operations
    Vanderbilt University
    615-538-7540
    ------------------------------
    HEUG the mic


  • 2.  RE: Steps for Making Your Institution's Data "AI-Ready"

    Posted 05-23-2024 01:55 PM

    Hi Joshua,

    A tricky topic, indeed. You are spot on in the belief that this is an important subject. AI learns from data, so if the data is garbage, you will get garbage out. In this case, your "data" can take many forms:

    • Structured data such as your course catalog and schedules in PeopleSoft
    • Unstructured content such as your course descriptions online
    • Unstructured policies like PDFs and web pages for something like the grade appeal process
    • Structured data annotations or labels (for example, tagging a student record as "successful") -- most institutions won't have this data because they are new to AI (and that's ok!)

    It would be a wonderful world if all this data were clean, high-quality, accurate, and timely. We rarely see that, however. You may be surprised how many institutional websites out there have inaccuracies. The challenge is a chicken-and-egg problem: can you start with AI without clean data, or does starting with AI allow you to clean your data better?

    What I have seen work well:

    1) Start with a quick review of structured data to ensure it is ready for self-service: no abbreviations, no blank fields, no misspellings, etc. If that data were displayed on a webpage, would you be happy for a student to see it in its current form?
    2) Once that quick pass is done, take the data you have identified and start feeding it into the AI.
    3) Run a pilot and market it as a work in progress. This is a critical step because real-world interactions show where improvements are needed. You can also collect implicit and explicit user feedback to know where to focus.
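    To make step 1 concrete, here is a minimal sketch of what an automated quality scan over structured records might look like. The field names and the abbreviation list are hypothetical examples for illustration, not actual PeopleSoft columns:

```python
# Minimal sketch of a structured-data quality scan (step 1 above).
# Field names and the abbreviation list are hypothetical examples,
# not actual PeopleSoft columns.

COMMON_ABBREVIATIONS = {"dept", "bldg", "sched", "req", "instr"}

def scan_record(record: dict) -> list[str]:
    """Return a list of quality issues found in one course record."""
    issues = []
    for field, value in record.items():
        # Flag blank or missing values.
        if value is None or str(value).strip() == "":
            issues.append(f"blank field: {field}")
            continue
        # Flag common abbreviations that a student-facing page shouldn't show.
        words = str(value).lower().replace(".", "").split()
        for word in words:
            if word in COMMON_ABBREVIATIONS:
                issues.append(f"abbreviation '{word}' in {field}")
    return issues

courses = [
    {"title": "Intro to Chemistry", "description": "Meets in Sci Bldg.", "credits": 3},
    {"title": "Calculus I", "description": "", "credits": None},
]

for course in courses:
    for issue in scan_record(course):
        print(f"{course['title']}: {issue}")
```

    A real pass would of course cover many more checks (misspellings, stale dates, encoding debris), but even a crude scan like this helps you triage what to fix before feeding data to the model.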

    This allows you to get started and then focus on the highest-impact areas, instead of trying to get everything perfect beforehand, which can feel overwhelming and be a barrier to progress (the old "boil the ocean" feeling). The above approach fits naturally with AI: allow it to learn, continually refine the data based on feedback, relearn, rinse, and repeat.
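    The explicit-feedback piece of that loop can be sketched very simply. Everything here (the in-memory log, the topic tags, the function names) is an illustrative assumption, not a specific product's API:

```python
# Sketch of collecting explicit thumbs-up/down feedback on AI answers,
# then ranking topics by "not helpful" ratings to decide where to refine
# the underlying data first. All names here are illustrative assumptions.
from collections import Counter
from datetime import datetime, timezone

feedback_log = []  # stand-in for a real datastore

def record_feedback(question: str, topic: str, helpful: bool) -> None:
    """Append one explicit rating to the log."""
    feedback_log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "topic": topic,
        "helpful": helpful,
    })

def worst_topics(n: int = 3) -> list[tuple[str, int]]:
    """Topics with the most 'not helpful' ratings -- refine these first."""
    downs = Counter(f["topic"] for f in feedback_log if not f["helpful"])
    return downs.most_common(n)
```

    Even this crude tally tells you which content areas (policies, schedules, catalog entries) are producing bad answers, which is exactly the signal you need for the "refine and relearn" part of the cycle.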

    Here is a funny example of bad training data and LLMs specifically: https://x.com/petergyang/status/1793480607198323196?s=46&t=wW92vvqipSziyNV9rkjnVQ 



    ------------------------------
    Andrew Bediz
    Managing Director AI & UX
    7733577428
    ------------------------------

