Privatized data sharing - Practical? Useful?

Sep 8 '16

Given databases of phishing information, the LACE2 privacy+learning algorithm will be applied to increasingly large datasets and to data collected from different sites. This will allow us to test whether (a) sharing data from multiple sites enables better early warning of phishing attacks and (b) such shared data can be privatized before sharing while still enabling the creation of early warnings about security issues.

If successful, the work could become a “rallying call” to intelligence agencies along the lines of “sharing data should be the norm, and not some rare event” and “look what extra information can be gleaned via sharing data”. In theory, the results from this study would be widely applicable. While phishing will be the case study explored here, LACE2 is a general learning algorithm that can be used for other tasks; e.g. recognition of dangerous “spikes” in enemy email activity, early detection of biological attacks by recognizing anomalous reports of patients with strange viral diseases from multiple hospitals, etc.
That said, there is an open research issue about LACE2. Prior work with this system focused on relatively simple data (dozens of columns). But phishing data has at least hundreds to thousands of columns, e.g. (a) text mining data from the text of phishing emails; (b) tokenization results of the binaries shared in the phishing attacks; (c) signatures of malware; (d) network metadata; (e) blog text; etc. LACE2 has yet to be tested on such higher-dimensional data and, for that test, we request funding through this proposal.