Estimating weight sharing in multi-task networks via approximate Fisher information

Rich Harang

While deep neural networks provide high levels of performance in detecting malicious files, and do so with considerable space and time savings over conventional signature-based approaches, the size of a network on disk is not negligible. One potential solution is to group related tasks with similar features into a single network, in the hope that weight sharing will allow for a deployed network smaller than two individual task-specific networks. While the performance of these joint networks is straightforward to evaluate, the degree to which weight sharing actually takes place is often harder to assess. We explore the use of a simple approximation to the Fisher information measure that allows us to evaluate the degree to which such a network exploits redundancies in the representation across different layers. We also investigate the use of this measure in "right-sizing" models, and suggest avenues for further research in light of recent work on progressive learning in networks.
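As a rough illustration of the idea, the sketch below computes one common approximation, the diagonal empirical Fisher information (the mean squared per-sample gradient of the log-likelihood), for the shared weights of a toy two-task network, once per task. The abstract does not state which approximation is used, so the choice of the diagonal empirical Fisher, the toy architecture (one shared tanh layer with a logistic head per task), the cosine-similarity overlap score, and all names (`diag_fisher_shared`, `fisher_overlap`, `W`, `v_a`, `v_b`) are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diag_fisher_shared(W, v, X, y):
    """Diagonal empirical Fisher of the shared weights W for one task.

    Hypothetical toy model: h = tanh(W x), p = sigmoid(v . h).
    Returns an array shaped like W holding the mean squared per-sample
    gradient of the Bernoulli log-likelihood with respect to each weight.
    """
    H = np.tanh(X @ W.T)                                     # (n, hidden)
    p = sigmoid(H @ v)                                       # (n,)
    # Backpropagated signal at the shared layer's pre-activation.
    delta = ((y - p)[:, None] * v[None, :]) * (1.0 - H**2)   # (n, hidden)
    # Per-sample gradient is outer(delta_i, x_i); its elementwise square
    # is outer(delta_i**2, x_i**2), so the mean reduces to a matrix product.
    return (delta**2).T @ (X**2) / X.shape[0]                # (hidden, in)

def fisher_overlap(Fa, Fb):
    """Cosine similarity between two Fisher maps (assumed overlap score)."""
    a, b = Fa.ravel(), Fb.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic data and random weights, for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
W = rng.normal(size=(4, 5))           # shared layer
v_a = rng.normal(size=4)              # head for task A
v_b = rng.normal(size=4)              # head for task B
y_a = rng.integers(0, 2, size=200)    # random binary labels, task A
y_b = rng.integers(0, 2, size=200)    # random binary labels, task B

F_a = diag_fisher_shared(W, v_a, X, y_a)
F_b = diag_fisher_shared(W, v_b, X, y_b)
overlap = fisher_overlap(F_a, F_b)
print("overlap between task Fisher maps:", overlap)
```

Under these assumptions, an overlap near 1 would suggest that both tasks rely on the same shared weights, while a value near 0 would suggest that the tasks occupy largely separate parts of the layer. Weights with small Fisher values for every task are natural candidates when considering whether a layer is larger than it needs to be.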