Proposed By
Michael Olson
Number of Attendees
33
Where will the conversation continue?
Not likely
Summary
Discussed existing tools used to scan large volumes of data for restricted/sensitive information. Most are based on regular expressions and patterns. The goals are balancing sensitivity against false positives and adjudicating hits.
Notes
Michael Olson - Stanford Libraries, leading the session
- Libraries contain sensitive information
- Labs (2009+)
- RWC
- Green
- Labs (2009+)
- Scanning for High Risk data
- How to document results of scans?
- images from disks
- Problem: Want to archive data but limit access to sensitive data
- Tools
- see links below
- Volume
- 190TB on server sanctioned for high risk
UIT - using Proofpoint to actively scan
- scans emails already
- On-prem - called Data Discovery
- scans UIT servers, etc.
- Cloud Based
- scans Google / Slack / Amazon / and a bunch more
- lots of rules for common API keys and tokens for cloud resources
- How does it work?
- pattern matching with scoring
- If score is high, can automate owner notification
- Exact data matching
- matching MRNs (medical record numbers) next to data...
- Tool is paid for - if others want to use it, contact UIT for information
- Interested in Proofpoint? Contact UIT!
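The "pattern matching with scoring" and "exact data matching" ideas noted above can be sketched roughly as follows. This is a minimal illustration, not the vendor's actual rule set or algorithm: the rule names, weights, the notification threshold, and the `scan_text` function are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: (name, pattern, weight). Weights are illustrative only.
RULES = [
    ("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 5),
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b"), 8),
    ("mrn_label", re.compile(r"\bMRN\b", re.IGNORECASE), 2),
]

NOTIFY_THRESHOLD = 8   # assumed cut-off for automated owner notification
EXACT_MATCH_WEIGHT = 10  # assumed weight for a value found in a known list


@dataclass
class ScanResult:
    score: int
    hits: dict
    notify_owner: bool


def scan_text(text, known_mrns=frozenset()):
    """Score a text blob: regex hits plus exact matches against known values."""
    hits = {}
    score = 0
    # Pattern matching: every rule hit adds its weight to the score.
    for name, pattern, weight in RULES:
        count = len(pattern.findall(text))
        if count:
            hits[name] = count
            score += weight * count
    # Exact data matching: digit runs that appear in a list of known identifiers.
    exact = [tok for tok in re.findall(r"\b\d{6,10}\b", text) if tok in known_mrns]
    if exact:
        hits["exact_mrn"] = len(exact)
        score += EXACT_MATCH_WEIGHT * len(exact)
    return ScanResult(score, hits, score >= NOTIFY_THRESHOLD)


if __name__ == "__main__":
    result = scan_text("Patient MRN 12345678, SSN 123-45-6789", {"12345678"})
    print(result)
```

Raising the threshold trades sensitivity for fewer false positives, which is the balance the session summary describes; hits below the threshold would still need manual adjudication.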
Other tools:
- Spirion (formerly Identity Finder)
- Similar to Proofpoint
- On a hit, encrypts the file and leaves a text pointer with instructions.
Questions / Discussions:
- Do any tools offer API for incorporation into other tools?
- Doesn't seem like it.
- Working with students is a challenge as they are more likely to share sensitive information
- Common issues are employees placing personal information on Stanford Google Drive (such as tax documents) and then sharing outside of Stanford (e.g. accountant)
- Are there tools for finding faces to blur/fade out prior to sharing?
- Not sure...
- must be some ML tools out there for this purpose
- Is it possible to create 'guard rails' to encourage people
- goal of Proofpoint
- Dashlane can help get around post-it notes for passwords
- no audit data, no SSO, no expiration on shares
- Image scanning for text?
- On roadmap for Proofpoint
- DICOM images have been largely scrubbed for identifiers as part of 2019 project in ResearchIT (TDS)
- Google has multiple ML models for this
Stanford Libraries High Risk scanning tools:
- BitCurator
- Bulk Extractor
- multithreaded / customizable / pretty powerful / limited gui
- AccessData Forensic Toolkit
- ePADD
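One possible way to document scan results, sketched here for Bulk Extractor output. It assumes the report directory holds tab-separated feature files (offset, feature, optional context) with `#` comment lines, and that files ending in `_histogram.txt` are derived summaries to skip. The function names and the directory layout are assumptions for illustration, not something described in the session.

```python
from collections import Counter
from pathlib import Path


def parse_feature_lines(lines):
    """Yield (offset, feature, context) from tab-separated feature-file lines."""
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and '#' comment/banner lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # not a feature row
        context = parts[2] if len(parts) > 2 else ""
        yield parts[0], parts[1], context


def summarize_report(report_dir):
    """Count distinct features per feature file in a report directory."""
    summary = {}
    for path in sorted(Path(report_dir).glob("*.txt")):
        # Assumption: histogram files duplicate information in the feature files.
        if path.name.endswith("_histogram.txt"):
            continue
        with path.open(encoding="utf-8-sig", errors="replace") as fh:
            counts = Counter(feature for _, feature, _ in parse_feature_lines(fh))
        if counts:
            summary[path.name] = counts
    return summary


if __name__ == "__main__":
    # "scan_output" is a hypothetical report directory name.
    for name, counts in summarize_report("scan_output").items():
        print(name, sum(counts.values()), "hits,", len(counts), "distinct")
```

A per-file count like this could be stored alongside the disk image as a lightweight record of what a scan found, without copying the sensitive values into a wider-access system.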
Tools the Libraries want to Test:
Record your name and contact info HERE if you're interested in a 1.5 hour tour of the Libraries Digital Archeology Lab:
- Name, email, slack.
Year
2019