hugging-mapper Documentation#

Hugging-Mapper

A lightweight Python tool for easy text similarity scoring using Hugging Face models.

Installation#

pip install hugging-mapper

Features#

Easily compare how similar two pieces of text are
Customizable model selection at initialization
Works with Hugging Face models that create sentence embeddings
Batch scoring for lists of sentence pairs

Usage#

Embedding text using Hugging Face models

from hugger.mapper import HuggingMapper

# init
# default model_name is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# generate embedding
embedding = mapper.embed_text("I hope you'll find this helpful.")
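
Similarity scoring typically operates on embeddings like the one returned above. The following is a minimal sketch, not the library's implementation: it assumes that an embedding can be treated as a flat sequence of floats and that similarity means cosine similarity, which is common for sentence embeddings but is not confirmed for hugging-mapper. `cosine_similarity` is a hypothetical helper, and the vectors are toy values rather than model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; not part of hugging-mapper.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (assumption: real embeddings are much longer).
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

same = cosine_similarity(vec_a, vec_b)        # identical direction, close to 1
orthogonal = cosine_similarity(vec_a, vec_c)  # no shared components, exactly 0
```

With real embeddings, the same function could be applied to two results of `mapper.embed_text(...)`, provided they are first converted to plain lists of floats.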

Similarity search over given data

from hugger.mapper import NodeMapper
import pandas as pd

# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"], 
    "text": ["Disease", "Gene", "Drug"]
})

# generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)

# get the most similar nodes
# a threshold of 0 returns all data, sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)

# get matching node
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
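
The role of `threshold` in the calls above can be illustrated with a small sketch. This is an assumption about the general technique, not a description of how `NodeMapper` works internally: `rank_by_score` and `best_match` are hypothetical helpers, the scores are invented toy values, and both the inclusive comparison (`>=`) and the `None` result when nothing clears the threshold are choices made for this example only.

```python
from typing import Dict, List, Optional, Tuple


def rank_by_score(scores: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Keep entries scoring at or above the threshold, highest score first.

    Hypothetical helper for illustration; not part of hugging-mapper.
    """
    kept = [(node_id, score) for node_id, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(scores: Dict[str, float], threshold: float) -> Optional[Tuple[str, float]]:
    """Return the single best entry above the threshold, or None if there is none."""
    ranked = rank_by_score(scores, threshold)
    return ranked[0] if ranked else None


# Invented similarity scores for the three demo nodes (not model output).
toy_scores = {"node1": 0.35, "node2": 0.82, "node3": 0.10}

all_ranked = rank_by_score(toy_scores, threshold=0.0)  # nothing filtered out, sorted
match = best_match(toy_scores, threshold=0.7)          # only node2 clears 0.7
no_match = best_match(toy_scores, threshold=0.9)       # nothing clears 0.9
```

A threshold of 0 keeps every entry here because all toy scores are non-negative; a higher threshold trades recall for confidence in the returned match.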

Documentation#

Tutorials and documentation are available on Read the Docs :)

License#

This project is licensed under the MIT License.
