A repository implementing Anthropic's monosemanticity interpretability approach, originally developed for language transformers, on vision models. The one-layer (1L) sparse autoencoder implementation is inspired by Neel Nanda's work.
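For orientation, below is a minimal sketch of what a one-layer sparse autoencoder looks like in PyTorch: an overcomplete ReLU encoder, a linear decoder, and a reconstruction loss with an L1 sparsity penalty. Class and parameter names here are illustrative, not the repository's actual API.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """One-layer SAE: overcomplete ReLU encoder plus linear decoder."""

    def __init__(self, d_input: int, d_hidden: int):
        super().__init__()
        # d_hidden is typically several times d_input (overcomplete basis).
        self.encoder = nn.Linear(d_input, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_input)

    def forward(self, x: torch.Tensor):
        # Encode activations into a sparse, overcomplete feature basis.
        features = torch.relu(self.encoder(x))
        # Reconstruct the original activations from those features.
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    mse = ((reconstruction - x) ** 2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity
```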
An early report on this project can be found here.
This code is still in development and not yet ready for general use. However, if you want to try it, you can clone the repository and modify (at least):
- train.py, to point to wherever you store ImageNet (see the data-loading sketch after this list)
- model.py, if you want to use other models
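As a reference for the train.py change, here is one way to point a standard torchvision pipeline at a local ImageNet copy. The path and the exact variable names in train.py are assumptions; adapt them to your setup.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical local ImageNet path; replace with your own.
IMAGENET_DIR = "/data/imagenet/train"

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# ImageFolder expects the usual one-subdirectory-per-class layout.
dataset = datasets.ImageFolder(IMAGENET_DIR, transform=transform)
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=8)
```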
Remaining work (TODO):
- Add a requirements.txt
- Turn this into a package for ease of use
- Ensure that the code works with models other than the default
- Add a demo notebook
- Manage checkpointing callbacks so checkpoints go to a dedicated directory, separate from the logging directory (see the sketch after this list)
- Clarify where config files should be stored and how they are retrieved
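For the checkpointing item, one possible shape of the fix, assuming PyTorch Lightning (which the mention of callbacks and a logging directory suggests; the paths are hypothetical):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from pytorch_lightning.loggers import TensorBoardLogger

# Keep checkpoints in their own directory, separate from the logger's output.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",  # hypothetical path, distinct from logs/
    filename="sae-{epoch:02d}-{val_loss:.4f}",
    save_top_k=3,
    monitor="val_loss",
)

trainer = pl.Trainer(
    logger=TensorBoardLogger(save_dir="logs/"),
    callbacks=[checkpoint_cb],
)
```

The key design point is that ModelCheckpoint's dirpath overrides the default behavior of saving checkpoints under the logger's directory.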