hugging-mapper Documentation#

Hugging-Mapper

A lightweight Python tool for easy text similarity scoring using Hugging Face models

Installation#

pip install hugging-mapper

Features#

  • Easily compare how similar two pieces of text are

  • Customizable model selection at initialization

  • Works with Hugging Face models that create sentence embeddings

  • Batch scoring for lists of sentence pairs

Usage#

Embedding text using Hugging Face models

from hugger.mapper import HuggingMapper

# initialize the mapper with a chosen model
# if model_name is omitted, the default is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# generate an embedding for a single piece of text
embedding = mapper.embed_text("I hope you'll find this helpful.")
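
The return type of `embed_text` is not shown above. Assuming it yields a one-dimensional sequence of floats, two embeddings could be compared with cosine similarity, as in the minimal sketch below. The helper `cosine_similarity` and the toy vectors are illustrative assumptions, not part of hugging-mapper and not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # a zero vector has no direction; treat its similarity as 0.0
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# toy stand-ins for embeddings (hypothetical values)
vec_a = [0.1, 0.3, 0.5]
vec_b = [0.2, 0.6, 1.0]   # same direction as vec_a, different length
vec_c = [0.5, -0.3, 0.0]  # points in a different direction

score_ab = cosine_similarity(vec_a, vec_b)
score_ac = cosine_similarity(vec_a, vec_c)
```

Cosine similarity depends only on direction, so `vec_a` and `vec_b` score close to 1 even though their magnitudes differ.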

Similarity search over a given dataset

from hugger.mapper import NodeMapper
import pandas as pd

# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"], 
    "text": ["Disease", "Gene", "Drug"]
})

# generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)

# get the entries most similar to a query term
# a threshold of 0 returns all data, sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)

# get the matching node for a term, using a similarity threshold of 0.7
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
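
To illustrate how a threshold shapes the results of a similarity search, here is a conceptual sketch in plain Python. It is not `NodeMapper`'s implementation: the helpers `rank_by_similarity` and `best_match` and the scores are hypothetical, and whether the library uses an inclusive threshold is an assumption.

```python
def rank_by_similarity(scores, threshold=0.0):
    """Return (node_id, score) pairs with score >= threshold, highest first."""
    kept = [(node_id, s) for node_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(scores, threshold=0.7):
    """Return the top (node_id, score) pair at or above threshold, else None."""
    ranked = rank_by_similarity(scores, threshold)
    return ranked[0] if ranked else None


# hypothetical similarity scores between a query term and three nodes
toy_scores = {"node1": 0.41, "node2": 0.83, "node3": 0.27}

all_ranked = rank_by_similarity(toy_scores, threshold=0)  # keeps every node
match = best_match(toy_scores, threshold=0.7)             # only node2 qualifies
no_match = best_match(toy_scores, threshold=0.9)          # nothing qualifies
```

With these toy scores, a threshold of 0 keeps every node in ranked order, while raising it narrows the result to strong matches or to none at all.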

Documentation#

Tutorials and documentation are available on Read the Docs :)

License#

This project is licensed under the MIT License.
