A skilled virtual assistant for Obsidian.

  • By Paul Bricman
  • Last update: Dec 22, 2022
  • Comments: 17

⚠️ This project is in early alpha. Expect a bunch of rough edges. ⚠️

Dual

Learn more by reading the official write-up.

Installation (currently only available from source)

Download Dual.zip and unzip it in .obsidian/plugins/. Follow the instructions in the plugin settings tab to continue. Arm yourself with patience!


ℹ️ After Step 2 is complete, your file structure should look something like:

Dual
.
|-- skeleton
|   |-- conversational_wrapper.py
|   |-- core.py
|   |-- requirements.txt
|   |-- server.py
|   |-- util.py
|-- essence
|   |-- config.json
|   |-- pytorch_model.bin
|   |-- training_args.bin
|-- main.js
|-- manifest.json
|-- ...

ℹ️ If you sync your vault with git, make sure to add the following in .gitignore after the install:

*.bin

Command Samples

Fluid Search

  • Find notes about *topic*.
  • Search for entries on *topic*.
  • Look up texts related to *topic*.

Descriptive Search

  • Find an entry which *description*.
  • Search for a note that *description*.
  • Look for a text which *description*.

Open Dialogue

  • *question*?

GitHub

https://github.com/paulbricman/dual-obsidian-client

Comments (17)

  • 1

    torch.embedding IndexError: index out of range in self

    File "...\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\nn\functional.py", line 1916, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) IndexError: index out of range in self

    This happens sometimes on Windows in open dialogue. It only pops up with specific questions. My working hypothesis is that unusual characters in retrieved notes trip up the generation process.
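If the hypothesis above holds, one plausible mitigation is clamping the prompt length before generation, since `index out of range` in `torch.embedding` often means a token index (e.g. a position index) exceeded the model's embedding table. The sketch below is purely illustrative; `MAX_CONTEXT` and `truncate_ids` are assumptions, not Dual's actual code.

```python
# Hypothetical guard: truncate a token sequence before feeding it to the
# model, so retrieved notes can't blow past the maximum context length.
MAX_CONTEXT = 1024  # e.g. a GPT-2-style position embedding limit

def truncate_ids(token_ids, max_len=MAX_CONTEXT):
    """Keep the most recent max_len tokens so the tail of the prompt survives."""
    return token_ids[-max_len:] if len(token_ids) > max_len else token_ids

sample = list(range(1500))            # stand-in for an over-long prompt
assert len(truncate_ids(sample)) == 1024
assert truncate_ids(sample)[-1] == 1499   # the prompt's tail is preserved
```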

  • 2

    issue with executing server


    python 3.9 MacOS Catalina 10.15.7

    Not sure if this is related, but I couldn't find an essence.zip folder, so I looked in the sample vault on the repo and copied config.json to the same directory on my other machine.


  • 3

    Server fails to start

    Console output:

    /Users/user/.obsidian/plugins/dual/skeleton/(master)> python3 server.py --path ~/.obsidian/plugins/Dual/skeleton/
    Loading skeleton...
    Cache file doesn't exist, creating a new one...
    Traceback (most recent call last):
      File "server.py", line 12, in <module>
        cw = ConversationalWrapper(args.path)
      File "/Users/user/.obsidian/plugins/Dual/skeleton/conversational_wrapper.py", line 7, in __init__
        self.core = Core(root_dir)
      File "/Users/user/.obsidian/plugins/Dual/skeleton/core.py", line 25, in __init__
        self.create_cache()
      File "/Users/user/.obsidian/plugins/Dual/skeleton/core.py", line 141, in create_cache
        pickle.dump(self.entries, open(self.cache_address, 'wb'))
    FileNotFoundError: [Errno 2] No such file or directory: '/Users/user/.obsidian/plugins/Dual/skeleton/.obsidian/plugins/Dual/skeleton/cache.pickle'
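The doubled path in the FileNotFoundError suggests a vault-relative cache path being joined onto a root that already contains it. The snippet below is a hypothetical illustration of how such a path can arise and one way to avoid it; the variable names are assumptions, not Dual's actual code.

```python
# Illustration of the doubled path from the traceback: joining a vault-root
# argument with a cache path that is itself already root-relative.
import os

root = "/Users/user/.obsidian/plugins/Dual/skeleton/"
cache_rel = ".obsidian/plugins/Dual/skeleton/cache.pickle"  # already root-relative

# A naive join reproduces the broken path from the error message:
broken = os.path.join(root, cache_rel)
assert broken.endswith("skeleton/.obsidian/plugins/Dual/skeleton/cache.pickle")

# One fix: anchor the cache file directly under the given root.
fixed = os.path.join(root, "cache.pickle")
assert fixed == "/Users/user/.obsidian/plugins/Dual/skeleton/cache.pickle"
```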
    
  • 4

    Implement argument parsing in new frontend

    Based on arguments detected in #52, such as *person* or *topic*, the values have to be extracted from the user query using text generation, as described here. Argument names and the query should go into a function, and a dictionary with the proper value attributions should come out.

  • 5

    UnicodeDecodeError in utils.py when running server.py

    (Python 3.8.4 on Windows 10 20H2, 64-bit)

    When I ran "python server.py --path /path/to/vault" inside my vault directory, I received the following output:

    Loading skeleton...
    Loading essence...
    Cache file doesn't exist, creating a new one...
    Traceback (most recent call last):
      File "server.py", line 12, in <module>
        cw = ConversationalWrapper(args.path)
      File "C:\Users\jaden\OneDrive\Documents\Obsidian\obsidian-vault.obsidian\plugins\Dual\skeleton\conversational_wrapper.py", line 7, in __init__
        self.core = Core(root_dir)
      File "C:\Users\jaden\OneDrive\Documents\Obsidian\obsidian-vault.obsidian\plugins\Dual\skeleton\core.py", line 25, in __init__
        self.create_cache()
      File "C:\Users\jaden\OneDrive\Documents\Obsidian\obsidian-vault.obsidian\plugins\Dual\skeleton\core.py", line 135, in create_cache
        self.entry_contents = [md_to_text(
      File "C:\Users\jaden\OneDrive\Documents\Obsidian\obsidian-vault.obsidian\plugins\Dual\skeleton\core.py", line 135, in <listcomp>
        self.entry_contents = [md_to_text(
      File "C:\Users\jaden\OneDrive\Documents\Obsidian\obsidian-vault.obsidian\plugins\Dual\skeleton\util.py", line 8, in md_to_text
        content = open(file).read()
      File "C:\Users\jaden\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 4: character maps to <undefined>

    A quick Google search got me to this StackOverflow question: https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character.

    One person suggested specifying the encoding when opening the file. So, inside skeleton/util.py, I changed content = open(file).read() (line 8) to content = open(file, encoding="utf8").read() and this solved the problem. But I thought it was worth mentioning anyway, as I didn't see anything in the documentation about file encoding.
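A note in passing on the fix: Python's open() takes the file *mode* as its second positional argument, so the encoding is safest passed via the encoding keyword. The sketch below shows the patched read in isolation (md_to_text's real body is longer; only the open() call from the report is mirrored here).

```python
# Minimal sketch of the encoding fix for skeleton/util.py.
import os
import tempfile

def read_text_utf8(path):
    # encoding= must be a keyword: open(path, "utf8") would be parsed as a
    # file mode and raise ValueError.
    return open(path, encoding="utf8").read()

# Round-trip a note containing characters that cp1252 can't handle.
with tempfile.NamedTemporaryFile("w", encoding="utf8", suffix=".md",
                                 delete=False) as f:
    f.write("café résumé ñ")
    name = f.name
assert read_text_utf8(name) == "café résumé ñ"
os.remove(name)
```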

  • 6

    Add Action and move source files

    To make it easier to understand for contributors, and generally tidy up a little, these changes move source files into common locations used for developing JS apps/tools.

    This is in preparation for a more formal release system.

  • 7

    Error deriving the essence

    Google Colab keeps giving me this error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-16-3fe4c666eb07> in <module>()
    ----> 1 output = trainer.train()
    
    3 frames
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py in __init__(self, data_source, replacement, num_samples, generator)
       102         if not isinstance(self.num_samples, int) or self.num_samples <= 0:
       103             raise ValueError("num_samples should be a positive integer "
    --> 104                              "value, but got num_samples={}".format(self.num_samples))
       105 
       106     @property
    
    ValueError: num_samples should be a positive integer value, but got num_samples=0
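For what it's worth, num_samples=0 from the sampler almost always means the training dataset ended up empty (e.g. no notes survived parsing before fine-tuning). A hypothetical pre-flight check, with illustrative names rather than the notebook's actual code:

```python
# Hypothetical guard before calling trainer.train(): fail early with a
# readable message if no training examples were produced.
def check_dataset(examples):
    if len(examples) == 0:
        raise ValueError(
            "No training examples found; check that the vault snapshot "
            "was uploaded and parsed before starting the alignment."
        )
    return examples

assert check_dataset(["note one", "note two"]) == ["note one", "note two"]
try:
    check_dataset([])
except ValueError as e:
    assert "No training examples" in str(e)
```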
    
  • 8

    Proposal for functionality changes and recipe framework design

    Progress towards solving existing issues and setting up a proper roadmap has been slowed in recent days by the fear of prematurely settling on an architecture and API design, given that this space of conversational interfaces over personal knowledge bases is largely unexplored.

    The following describes a suggestion for heavily restructuring the functionality and the codebase, a tentative something in between a spec and a user story.

    Architecture

    Dual is based on two components: the backend and the frontend. The backend is a server which exposes two main endpoints:

    • /extract, which returns entries from one's knowledge base based on a natural language description, with some options
    • /generate, which generates text given a prompt, with some options

    However, the user doesn't usually interact with the endpoints directly. Rather, they use recipes. Recipes tell Dual how to answer certain commands. They can be predefined, user defined, or contributed by some other user. Recipes are simple Markdown files with the following structure:

    ---
    tags: "#dualrecipe"
    pattern: "What is the answer to the ultimate question of life, the universe, and everything?"
    ---
    
    42, naturally.
    

    If the user has this recipe in their vault as a note, then whenever they ask their Dual that question, they'll get the contents of the note as an answer.

    The pattern field of a recipe is a regex pattern. It can also house groups, which can then be referenced in the content.

    ---
    tags: "#dualrecipe"
    pattern: "My name is (.*)"
    ---
    
    Hi there, \1!
    

    With this recipe, if the user tells their Dual My name is John, it'll reply with Hi there, John!.
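The pattern-and-groups mechanism described above can be sketched with Python's standard re module. This mirrors the semantics of the proposal, not actual Dual source; recipe_pattern and recipe_body are taken from the example recipe.

```python
# Sketch of how a recipe's regex pattern and backreferences could be resolved.
import re

recipe_pattern = r"My name is (.*)"
recipe_body = r"Hi there, \1!"

def answer(query):
    m = re.fullmatch(recipe_pattern, query)
    if m is None:
        return None               # recipe doesn't trigger on this query
    return m.expand(recipe_body)  # substitute captured groups into the body

assert answer("My name is John") == "Hi there, John!"
assert answer("Unrelated question?") is None
```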

    All this is cute, but not all that useful or interesting. Among the recipes there's also this predefined recipe:

    ---
    tags: "#dualrecipe"
    pattern: "Find a note which (.*)"
    ---
    
    '''dual
    GET "/extract/This text \1"
    '''
    

    Now, this is good old descriptive search, expressed as a recipe which makes use of the /extract endpoint. When asked Find a note which describes a metaphor between machine learning and sociology, it'll answer with a list of results based on the GET HTTP call made behind the scenes to the endpoint.

    But if you wanted to customize the command triggers even for this predefined command, you could just wrap a new recipe around it, or change the original one. Here's a wrapper recipe:

    ---
    tags: "#dualrecipe"
    pattern: "Yo show me a thing which (.*)"
    ---
    
    Here ya go:
    
    '''dual
    ASK "Find a note which \1"
    '''
    

    Cool, you just made your Dual a bit edgier.

    So this is how you can express good old descriptive search and fluid search as recipes. What about good old open dialogue?

    ---
    tags: "#dualrecipe"
    pattern: "^(([Ww]hy|[Ww]hat|[Ww]hen|[Ww]here|[Ww]ho|[Hh]ow).*)"
    ---
    
    '''dual
    GET "/extract/This text is about \1"
    '''
    
    Q: \1
    A:
    
    '''dual
    GET "/generate/"
    '''
    

    Now, when you ask it a question with that structure, Dual assembles the relevant notes in there, composes the prompt further with your query, and then generates the response. Good old open dialogue, but expressed as a recipe. Every command becomes a customizable recipe.

    Now, say you want to teach your Dual to come up with writing prompts; you create this recipe:

    ---
    tags: "#dualrecipe"
    pattern: "^[Cc]ome up with a writing prompt\.?"
    ---
    
    prompt: A sentient being has landed on your planet and your civilization's military has confronted it at the landing site of its ship. You are sent closer as a mediator and encounter a mass of energy that has no form but communicates with you in your language.
    
    prompt: Your spaceship has landed on an unknown planet and there is data showing lifeforms who have created artistic structures. There is an artist in your group who wants to make first contact with the beings through art.
    
    prompt: We discover that beneath its seemingly uninhabitable appearance, Mars has an entire race of subterranean alien lifeforms living on it. You are part of the team sent to explore this civilization.
    
    prompt: 
    
    '''dual
    GET "/generate/"
    '''
    

    You ask it Come up with a writing prompt and you get one in return.

    Sure, there are technicalities:

    • The note contents up to the generate call should be piped into it as the prompt.
    • The endpoints are shorthand for localhost:5000/..., but you could perhaps point them at a hosted instance in the future. You could make calls to other people's instances through recipes, or tap into any API through a recipe, turning Dual into a sort of conversational hub.
    • Regex groups have to be entered when making calls.
    • URLs have to be encoded properly because they contain text.
    • Extract calls should know whether to supply filenames or contents, probably through parameters.
    • What should a recipe return, the entire contents or the result of the last call? Perhaps a metadata setting.

    A bunch of things still to settle on.
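The URL-encoding technicality can be sketched with the standard library: captured text has to be percent-encoded before it is spliced into a GET path such as /extract/. The endpoint shape follows the proposal above; the helper name is an assumption.

```python
# Sketch: percent-encode captured recipe text before building the GET URL.
from urllib.parse import quote

BASE = "http://localhost:5000"  # the shorthand endpoint from the proposal

def extract_url(description):
    return f"{BASE}/extract/{quote(description)}"

url = extract_url("This text is about machine learning & sociology?")
assert url == ("http://localhost:5000/extract/"
               "This%20text%20is%20about%20machine%20learning%20%26%20sociology%3F")
```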

  • 9

    CrossEncoder.py IndexError: list index out of range

    I have this error

    [2021-07-16 20:33:05,137] ERROR in app: Exception on /query/Hello [GET]
    Traceback (most recent call last):
      File "*site-packages/flask/app.py", line 2447, in wsgi_app
        response = self.full_dispatch_request()
      File "*site-packages/flask/app.py", line 1952, in full_dispatch_request
        rv = self.handle_user_exception(e)
      File "*site-packages/flask_cors/extension.py", line 165, in wrapped_function
        return cors_after_request(app.make_response(f(*args, **kwargs)))
      File "*site-packages/flask/app.py", line 1821, in handle_user_exception
        reraise(exc_type, exc_value, tb)
      File "*site-packages/flask/_compat.py", line 39, in reraise
        raise value
      File "*site-packages/flask/app.py", line 1950, in full_dispatch_request
        rv = self.dispatch_request()
      File "*site-packages/flask/app.py", line 1936, in dispatch_request
        return self.view_functions[rule.endpoint](**req.view_args)
      File "*site-packages/flask_cors/decorator.py", line 128, in wrapped_function
        resp = make_response(f(*args, **kwargs))
      File "/vault/.obsidian/plugins/Dual/skeleton/server.py", line 17, in respond_query
        return cw.respond(query)
      File "/vault/.obsidian/plugins/Dual/skeleton/conversational_wrapper.py", line 45, in respond
        'output': self.core.fluid_search(query)
      File "/vault/.obsidian/plugins/Dual/skeleton/core.py", line 42, in fluid_search
        cross_scores = self.pair_encoder.predict([[query, self.entry_contents[hit['corpus_id']]] for hit in hits])
      File "*site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 240, in predict
        if isinstance(sentences[0], str):  # Cast an individual sentence to a list with length 1
    IndexError: list index out of range
    

    While debugging, I found this:

    >>> from core import Core
    >>> c = Core('/vault-path/')
    >>> print(len(c.entry_contents), len(c.entry_filenames))
    0 0

    The Alignment in Colab works fine and I have my essence folder in Dual path

    Do you know what fails here? Maybe my structure? I have 20k md files.

    I love the use case that Dual brings to Obsidian!
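The "0 0" debug output suggests fluid_search is being run over an empty corpus, which would let the cross-encoder index into an empty hit list. A hypothetical guard, mirroring the shape of the traceback rather than the actual core.py:

```python
# Hypothetical guard for the empty-corpus case: return no hits instead of
# letting the ranker index into an empty list of entries.
def fluid_search(query, entry_contents, ranker):
    if not entry_contents:
        return []  # nothing cached yet; avoids IndexError downstream
    return ranker(query, entry_contents)

# With an empty corpus the ranker is never called:
assert fluid_search("Hello", [], ranker=None) == []
```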

  • 10

    Switch models to GPT-Neo versions

    Similar models but with higher performance as they've been trained on more data. Hopefully they're still fine-tunable in a Colab notebook, at least the medium one.

  • 11

    Bundle skeleton in a self-contained binary

    Not sure what's the best way to go.

    • Docker + PyInstaller + Wine spitting out clean binaries for Linux/Windows sounds somewhat doable.
    • Or somehow turning Docker containers into binaries themselves? Those would be huge.
    • Several users contributing their binaries using PyInstaller on their own OS?
  • 12

    Future of this project?

    Hi @paulbricman,

    Great project - this is exactly what I was looking for in Obsidian 👏 thanks for all your work so far!

    Was curious whether you plan on further developing Dual? E.g., creating an official Obsidian plugin so that other Obsidian developers can help to extend, maintain and further improve Dual.

    Cheers and hope you're well!

  • 13

    No responses generated by Dual other than "typing". No error in the server, even in debug mode.

    Dual looks fantastic! I'd love to try it out. So far, I have been unable to get it to work. I followed the instructions in the readme and made it all the way to starting the server. However, I'm unable to get any response to my prompts. I don't see any error messages, so I'm not sure what to do.

    It looks like there is an error related to a missing header for CORS. I'm not sure what that means exactly, but I hope it's at least somewhat helpful.

    Maybe this page is helpful?

    Any advice?

  • 14

    RuntimeError: CUDA out of memory

    Hey @paulbricman, super cool project! I ran it per instructions, but had issues with the training model:


    RuntimeError                              Traceback (most recent call last)
    
    <ipython-input-8-3fe4c666eb07> in <module>()
    ----> 1 output = trainer.train()
    
    10 frames
    
    /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
       2822     if size_average is not None or reduce is not None:
       2823         reduction = _Reduction.legacy_get_string(size_average, reduce)
    -> 2824     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
       2825 
       2826 
    
    RuntimeError: CUDA out of memory. Tried to allocate 920.00 MiB (GPU 0; 11.17 GiB total capacity; 8.44 GiB already allocated; 377.81 MiB free; 10.30 GiB reserved in total by PyTorch)
    

    I set the settings to the medium model (8+ GB RAM) and my vault size is listed as "ideal".

    Any thoughts on what I can tweak?
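A common workaround for CUDA OOM during fine-tuning is shrinking the per-device batch size and compensating with gradient accumulation, so the effective batch size (and thus the optimization behaviour) stays roughly the same. The arithmetic, with illustrative numbers rather than the notebook's actual settings:

```python
# Sketch: effective batch size under gradient accumulation.
def effective_batch(per_device, accumulation_steps, n_gpus=1):
    # Gradients from several small forward/backward passes are summed before
    # one optimizer step, emulating a larger batch in less GPU memory.
    return per_device * accumulation_steps * n_gpus

# e.g. replacing batch_size=8 with batch_size=2 plus 4 accumulation steps:
assert effective_batch(8, 1) == effective_batch(2, 4) == 8
```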

  • 15

    Decoding error

    While running the skeleton with the command python3 server.py --path "/Users/ophan/Documents/KTCB/KTCB Research/", I get the error:

    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 383: character maps to <undefined>

    I strongly suspect it's due to the fact I have Spanish characters in my notes. I really don't want to remove them, nor do I particularly want to go back through the previous steps. Is there a way I can get it to decode my Spanish characters? Characters used: ¿¡áéíóúñÁÉÍÓÚ
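There's no need to remove the Spanish characters: UTF-8 round-trips them losslessly, and the error only appears because a legacy 8-bit codec is being used as the default decoder. A small demonstration (pure illustration, not Dual code):

```python
# UTF-8 handles the Spanish characters fine; a legacy codec does not.
text = "¿¡áéíóúñÁÉÍÓÚ"
raw = text.encode("utf-8")

# Round-trip through UTF-8 is lossless:
assert raw.decode("utf-8") == text

# Decoding the same bytes as cp1252 mangles them instead
# (errors="replace" because some byte values are undefined in cp1252):
assert raw.decode("cp1252", errors="replace") != text
```

Opening the notes with encoding="utf8" (as described in an earlier comment about skeleton/util.py) should keep them intact.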

  • 16

    ModuleNotFoundError: No module named 'flask'

    Hey there, when I run the command python3 server.py --path [path to skeleton], it comes up with the error:

    Traceback (most recent call last):
      File "C:\Users\Owner\iCloudDrive\iCloud~md~obsidian\Orangeo.obsidian\plugins\Dual\skeleton\server.py", line 1, in <module>
        from flask import Flask
    ModuleNotFoundError: No module named 'flask'

    Anyone know how to fix this?
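ModuleNotFoundError for flask just means the package isn't installed for the interpreter running server.py; installing the skeleton's dependencies with python3 -m pip install -r requirements.txt (requirements.txt ships with the skeleton, per the file tree above) is the usual fix. A small sketch of a dependency pre-check, with an illustrative helper name:

```python
# Sketch: check whether a module is importable for the current interpreter
# without actually importing it.
import importlib.util

def is_installed(module_name):
    return importlib.util.find_spec(module_name) is not None

# A stdlib module is always present; a made-up name never is.
assert is_installed("json") is True
assert is_installed("definitely_not_a_real_module_xyz") is False
```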

  • 17

    Unable to "Start Alignment"

    Hi! I just saw this Obsidian plugin in a Twitter thread and wanted to give it a try. I correctly downloaded/extracted and copied the plugin into .obsidian/my_vault_name/plugin, and I can see the Dual settings in my Obsidian options panel. I downloaded/installed Python as per the instructions. I created and copied the snapshot of my vault (a not-so-big test vault). But as I try to "Start Alignment" I can't open the webpage linked to the button. Google says I don't have the permissions...

    Running macOS 11.5.1 on a MacBook Air 2020 (not M1). I tried with Brave browser, Google Chrome and Safari, all logged in with my Gmail account: no way. I even tried with a Chrome private session: same result.

    What am I doing wrong? Thanks!!!