Countless tools exist to create a google doc from markdown. A few tools exist to edit google docs - but they are very limited. Either they replace the entire google doc - losing comments, formatting, images, colours and anything else you added that markdown doesn't support - or they only allow find-and-replace and perhaps a few other narrow operations.
The core problem is that google docs provides only a single API for edits, called batchUpdate. It is really painful to use because the underlying model addresses content by index, and every edit shifts the indexes of everything after it, so you have to keep track of indexes as you update the google doc.
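To illustrate the index problem, here is a minimal sketch that simulates index-based inserts on a plain Python string rather than calling the real API. The `apply_inserts` helper and the sample edits are hypothetical; the point is only that each insert shifts every later index, so edits computed against the original text have to be applied from the highest index down (or have their offsets adjusted).

```python
def apply_inserts(text, inserts):
    """Apply (index, string) inserts in the order given, like a batch of index-based requests."""
    for index, s in inserts:
        text = text[:index] + s + text[index:]
    return text


original = "Hello world"
# Both indexes were computed against the ORIGINAL text.
edits = [(0, ">> "), (5, ",")]

# Naive: the first insert shifts everything after it by 3 characters,
# so the comma lands in the wrong place.
naive = apply_inserts(original, edits)

# Safer: apply from the highest index down so the remaining indexes stay valid.
ordered = apply_inserts(original, sorted(edits, key=lambda e: e[0], reverse=True))

print(naive)    # >> He,llo world
print(ordered)  # >> Hello, world
```

With two edits this is easy to get right by hand; with dozens of inserts, deletes and style changes in one batch, the bookkeeping is where agents tend to go wrong.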
Tools like gws and gogcli technically support batchUpdate. But because the underlying API is so painful, the agent / LLM cannot do anything more than basic updates.
My team uses google docs extensively. And these documents have their own life. Team members leave comments / feedback. Others perhaps apply formatting. Different people contribute to the document - that is the whole point of using google docs. In such a scenario, the agent must be able to play along well - it cannot mess up the document for others.
`extrasuite docs pull <url> <folder>` downloads the google doc and saves it as markdown files, one per tab in the google doc. It also saves a `comments.xml` so that the agent can address your comments / feedback.
The agent then edits the markdown files. No special instructions - it knows how to edit markdown files. These can be simple edits, complex rewrites, adding new files, creating tables, adding images or anything that github flavoured markdown allows.
`extrasuite docs push <folder>` then figures out what the agent changed by comparing the markdown files locally. This is a markdown-to-markdown diff. It then figures out how to apply those exact changes to the google doc, and ultimately reconciles the google doc to match the markdown.
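As a rough sketch of the "markdown to markdown diff" step, the snippet below uses Python's standard `difflib` to list only the regions that changed between the pulled and the edited text. This is not ExtraSuite's actual implementation; `changed_blocks` and the sample documents are made up for illustration.

```python
import difflib


def changed_blocks(before_md, after_md):
    """Return (op, old_lines, new_lines) for each changed region, skipping unchanged text."""
    old = before_md.splitlines()
    new = after_md.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


pulled = "# Plan\n\nShip in March.\n\n- Item one\n"
edited = "# Plan\n\nShip in April.\n\n- Item one\n- Item two\n"

for op in changed_blocks(pulled, edited):
    print(op)
```

Everything reported as unchanged never needs to be sent to the API at all, which is what keeps the rest of the document safe.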
ExtraSuite makes a few guarantees:
- It won't mess up formatting, images, headers/footers, styles, colours etc. They will continue to work properly
- In general, anything that cannot be represented in markdown won't be touched in the write process.
ExtraSuite has several other features:
- Leave comments in the document and ask your agent to work on them.
- Full support for tables, images, code blocks, and github flavoured markdown.
If you have either of gws or gogcli installed, you can directly start using extrasuite without any setup.
Our teams have been using this for several months now, and I'm happy to answer any questions about it. Would love your feedback!
It handles it gracefully. The tool cannot edit them, but it will ensure it does not mess them up.
Some examples:
- Table of contents can't be updated via the API. But the tool will ensure that the ToC doesn't disappear
- Styling can't be represented in markdown. That's fine - new content in the same paragraph will inherit the styles. The tool won't change the styles.
The core approach - don't mess with anything that isn't represented in markdown.
The way it works: we take the original google document and only apply the specific changes that the author intended to make. So this ensures we don't change anything else.
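A toy model of that idea, under loud assumptions: the document is a list of paragraphs, each with a `style` string standing in for everything markdown cannot express, and the diff works at paragraph granularity. The `Paragraph` class and `reconcile` function are hypothetical and much simpler than what a real tool has to do.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    style: str  # stands in for anything markdown cannot express (colour, font, ...)


def reconcile(doc, edited_texts):
    """Apply only the text changes to doc, leaving untouched paragraphs and their styles alone."""
    original_texts = [p.text for p in doc]
    matcher = difflib.SequenceMatcher(a=original_texts, b=edited_texts, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged in markdown: keep the original objects, style included.
            result.extend(doc[i1:i2])
            continue
        # Changed text inherits the style of the paragraph it replaces; a pure
        # insert inherits from the nearest preceding paragraph.
        if i1 < i2:
            inherited = doc[i1].style
        elif result:
            inherited = result[-1].style
        else:
            inherited = "normal"
        result.extend(Paragraph(text, inherited) for text in edited_texts[j1:j2])
    return result


doc = [
    Paragraph("Title", "red-heading"),
    Paragraph("Old intro", "blue"),
    Paragraph("Footer", "small"),
]
edited = ["Title", "New intro", "Extra line", "Footer"]

for p in reconcile(doc, edited):
    print(p)
```

The title and footer come back as the very same objects, so their styles cannot be disturbed by the edit to the intro.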
Related. We have several third party web apps in use. These apps don't expose a public api, but they are all single page web apps. We wanted to connect claude code to these web apps for our limited use case.
We opened chrome, navigated the entire website, then downloaded the network tab as a HAR file. Then we asked claude to analyze it and document the APIs as an OpenAPI json. Worked amazingly well.
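For anyone who wants to pre-digest the capture before handing it to the model, here is a small sketch that lists the distinct endpoints in a HAR file. It relies only on the standard HAR layout (`log.entries[].request` and `.response`); the `summarize_har` name and the sample capture are invented for the example.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def summarize_har(har):
    """Map each (method, path) seen in a HAR capture to the set of response status codes."""
    endpoints = defaultdict(set)
    for entry in har["log"]["entries"]:
        request = entry["request"]
        path = urlsplit(request["url"]).path  # drop host and query string
        endpoints[(request["method"], path)].add(entry["response"]["status"])
    return dict(endpoints)


# A tiny hand-written capture in HAR shape; a real file would be read with json.load().
sample = json.loads("""
{"log": {"entries": [
  {"request": {"method": "GET", "url": "https://app.example.com/api/items?page=1"},
   "response": {"status": 200}},
  {"request": {"method": "GET", "url": "https://app.example.com/api/items?page=2"},
   "response": {"status": 200}},
  {"request": {"method": "POST", "url": "https://app.example.com/api/items"},
   "response": {"status": 201}}
]}}
""")

for (method, path), statuses in sorted(summarize_har(sample).items()):
    print(method, path, sorted(statuses))
```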
Next step - we wrote a small python script. On one side, this script implements stdio MCP. On the other side, it calls the internal APIs exposed by the 3rd party app. The only thing missing is the auth headers.
This is the best part. When claude connects to the MCP, the MCP launches a playwright controlled browser and opens the target web application. It detects if the user is logged in. Then it extracts the auth credentials using playwright, saves them to a local cache file and closes the browser. Then it accesses the APIs directly - no browser needed thereafter.
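The caching half of that flow can be sketched without a browser. In the snippet below, `fetch_from_browser` is a hypothetical callable standing in for the Playwright step, and the cache file layout and `max_age_seconds` default are assumptions, not what our script actually does.

```python
import json
import tempfile
import time
from pathlib import Path


def load_credentials(cache_path, fetch_from_browser, max_age_seconds=3600, now=time.time):
    """Return cached auth headers if they are fresh, otherwise fetch once and cache them."""
    path = Path(cache_path)
    if path.exists():
        cached = json.loads(path.read_text())
        if now() - cached["saved_at"] < max_age_seconds:
            return cached["headers"]
    # Cache miss or stale entry: this is where a Playwright-driven browser would
    # open the web app, wait for the user to be logged in, and read the tokens.
    headers = fetch_from_browser()
    path.write_text(json.dumps({"saved_at": now(), "headers": headers}))
    return headers


# Demo with a fake fetcher standing in for the browser step.
calls = []


def fake_fetch():
    calls.append("browser launched")
    return {"Authorization": "Bearer demo-token"}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "creds.json"
    first = load_credentials(cache, fake_fetch)
    second = load_credentials(cache, fake_fetch)  # served from the cache file

print(first == second, len(calls))
```

A real version would also want to restrict the cache file's permissions and re-launch the browser when the API starts returning auth errors.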
In about an hour's worth of tokens with claude, we get an MCP server that works locally with each user's credentials in a fairly reliable manner. We have been able to get this working in otherwise locked down corporate environments.
Super cool. I think this is where most automation is heading. Would be curious if you could one-shot the auth flow using Kampala and completely ditch the browser. Also FWIW you can import HAR into Kampala and we have a few nice tools (like being able to a/b test payloads and replay requests) that meaningfully reduce integration time.
5 years ago I used a similar approach for one of GCP's internal APIs (I think they've since released a public API that covers the use case I had). Was a bit of a pain to do manually, so it's cool to see how trivial this has become for models now.
I am working on a way to edit google docs using markdown. Many tools exist to convert google docs to markdown and to import markdown to google docs - but none of them make in place edits.
The core logic is to convert the google docs to md. The user then edits the md. Then diff the markdown files, and apply the changes back to the source google docs. This way, features not represented in markdown do not get overwritten.
Lots of effort has gone into testing against real world docs. It's beta quality right now.
I am working on a declarative CLI for google docs/sheets/slides etc. The general idea is a "pull" command that converts the proprietary document into local files like tsv or xml. The agent (claude code) then simply edits these files in place and calls "push". The library then figures out the diff and applies only the diff, taking care to preserve formatting and comments.
The hypothesis is that llms are better off getting the "big picture" by reading local files. They can then spend tokens to edit the document as per the business needs rather than spending tokens to figure out how to edit the document.
Another aspect is the security model. Extrasuite assigns a permission-less service account per employee. The agent gets this service account to make API calls. This means the agent only gets access to documents explicitly shared with it, and any changes it makes show up in version history separate from the user's changes.
It provides a git like pull/push workflow to edit sheets/docs/slides. `pull` converts the google file into a local folder with agent friendly files. For example, a google sheet becomes a folder with a .tsv, a formula.json and so on. The agent simply edits these files and `push`es the changes. Similarly, a google doc becomes an XML file that is pure content. The agent edits it and calls push - the tool figures out the right batchUpdate API calls to bring the document in sync.
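For the sheets case, the diff can be pictured as a cell-by-cell comparison of two TSV snapshots. The sketch below is an assumption about how such a diff might look, not the tool's real logic; `changed_cells` and `column_letter` are hypothetical names, and cleared or deleted cells are deliberately ignored to keep it short.

```python
import csv
import io


def column_letter(index):
    """Convert a zero-based column index to A1-style letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def changed_cells(before_tsv, after_tsv):
    """Return {A1 address: new value} for every cell whose value differs after the edit."""
    before = list(csv.reader(io.StringIO(before_tsv), delimiter="\t"))
    after = list(csv.reader(io.StringIO(after_tsv), delimiter="\t"))
    changes = {}
    for r, row in enumerate(after):
        for c, value in enumerate(row):
            old = before[r][c] if r < len(before) and c < len(before[r]) else ""
            if value != old:
                changes[f"{column_letter(c)}{r + 1}"] = value
    return changes


pulled = "name\tqty\napple\t3\n"
edited = "name\tqty\napple\t5\npear\t2\n"

print(changed_cells(pulled, edited))
```

Only the three touched cells would need to be written back; the header row and the untouched `apple` cell are left alone.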
None of the existing tools allow you to edit documents. Invoking batchUpdate directly is error prone and token inefficient. Extrasuite solves these issues.
In addition, Extrasuite also uses a unique service token that is 1:1 mapped to the user. This means that edits show up as "Alice's agent" in google drive version history. This is secure - agents can only access the specific files or folders you explicitly share with the agent.
This is still very much alpha - but we have been using this internally for our 100 member team. Google sheets, docs, forms and app scripts work great - all using the same pull/push metaphor. Google slides needs some work.
We have been using something similar for editing Confluence pages. Download XML, edit, upload. It is very effective, much better than direct edit commands. It’s a great pattern.
You can use the Copilot CLI with the atlassian mcp to super easily edit/create confluence pages. After having the agent complete a meaningful amount of work, I have it go create a confluence page documenting what has been done. Super useful.
I'm afraid I can't easily share this, as we have embedded a lot of company-specific information in our setup, particularly for cross-linking between confluence/jira/zendesk and other systems. I can try to explain it though, and then Claude Code is great at implementing these simple CLI tools and writing the skills.
We wrote CLIs for Confluence, Jira, and Zendesk, with skills to match. We use a simple OAuth flow for users to login (e.g., they would run jira login). Then confluence/jira/zendesk each have REST APIs to query pages/issues/tickets and submit changes, which is what our CLIs would use. Claude Code was exceptional at finding the documentation for these and implementing them. Only took a couple days to set these up and Claude Code is now remarkably good at loading the skills and using the CLIs. We use the skills to embed a lot of domain-specific information about projects, organisation of pages, conventions, standard workflows, etc.
Being able to embed company-specific links between services has been remarkably useful. For example, we look for specific patterns in pages like AIT-553 or zd124132 and then can provide richer cross-links to Jira or Zendesk that help agents navigate between services. This has made agents really efficient at finding information, and it makes them much more likely to actually read from multiple systems. Before we made changes like this, they would often rabbit-hole only looking at confluence pages, or only looking at jira issues, even when there was a lot of very relevant information in other systems.
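A minimal sketch of that pattern matching, assuming IDs shaped like the two examples above. The base URLs are placeholders and the regular expressions are guesses at the conventions, so treat `LINKERS` and `find_cross_links` as illustrative only.

```python
import re

# Hypothetical base URLs; the ID shapes follow the AIT-553 and zd124132 examples.
LINKERS = [
    (re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b"), "https://jira.example.com/browse/{key}"),
    (re.compile(r"\bzd(\d+)\b"), "https://zendesk.example.com/tickets/{key}"),
]


def find_cross_links(text):
    """Return (identifier, url) pairs for every Jira- or Zendesk-style ID found in text."""
    links = []
    for pattern, template in LINKERS:
        for match in pattern.finditer(text):
            # group(0) is the ID as written; group(1) is the part used in the URL.
            links.append((match.group(0), template.format(key=match.group(1))))
    return links


print(find_cross_links("See AIT-553 and the customer report in zd124132."))
```

A CLI can append these links to whatever page or issue text it returns, which gives the agent an obvious next hop into the other system.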
My favourite is the confluence integration though, as I like to record a lot of worklog-style information in there that I would previously write down as markdown files. It's nicer to have these in Confluence as then they are accessible no matter what repo I am working in, what region I am working in, or what branch or feature I'm working on. I've been meaning to try to set something similar up for my personal projects using the new Obsidian CLI.
We have been doing something similar but it sounds like you have come further along this way of working. We (with help from Claude) have built a tool similar to the one you describe to interface with our task- and project management system, and use it together with the Gitlab and Github CLI tools to allow agents to read tickets, formulate a plan, create solutions and open MRs/PRs against the relevant repos. For most of our knowledge base we use Markdown but some of it is tied up in Confluence, that's why I have an interest in that part. And some of our workflows are even in Google Docs, which makes the OP tool interesting as well -- currently our tool outputs Markdown and we just "paste from markdown" into Gdocs. We might be able to revise and improve that too.
Thank you! Sounds like a fantastic setup. Are the claude code agents acting autonomously from any trigger conditions or is this all manual work with them? And how do you manage write permissions for documents amongst team members/agents, presumably multiple people have access to this system?
(Not OP, but have been looking into setting up a system for a similar use case)
This is all manual, so people ask their agent to load Jira issues, edit Confluence pages, etc. Users sign in with their own accounts using the CLIs, so the agents inherit their own permissions. Then we have the permissions in Claude Code set up so any write commands are in Ask, so it always prompts the user if it wants to run them.
Excellent project! I see that the agent modifies the google docs using an interesting technique: convert doc to html, AI operates over the HTML and then diff the original html with ai-modified html, send the diff as batchUpdate to gdocs.
IMO, this is a better approach than the one used by Anthropic docx editing skill.
1. Did you compare this one with other document editing agents? Did you have any other ideas on how to make AI see and make edits to documents?
2. What happens if the document is a big book? How do you manage context when loading big documents?
PS: I'm working on an AI agent for Zoho Writer (gdocs alternative) and I've landed on a similar html based approach. The difference is I ask the AI to use my minimal commands (addnode, replacenode, removenode) to operate over the HTML and convert them into ops.
re. comparing with other editing agents - actually, I didn't find any that could work with google docs. Many workflows were basically "replace the whole document" - and that was a non-starter.
re. what happens if it's a big book - each "tab" in the google doc is a folder with its own document.xml. A top-level index.xml captures the table of contents across tabs. The agent reads index.xml and then decides what else to read. I am now improving this by giving it xpath expressions so it can directly pick the specific sections of interest.
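To show what picking a section by xpath could look like, here is a sketch using the limited XPath support in Python's `xml.etree.ElementTree`. The `<section id=...>` layout is entirely hypothetical; the real document.xml schema may look nothing like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.xml layout; the real ExtraSuite schema may differ.
DOCUMENT_XML = """
<document>
  <section id="intro"><h1>Introduction</h1><p>Why this project exists.</p></section>
  <section id="risks"><h1>Risks</h1><p>Vendor lock-in.</p><p>Tight deadline.</p></section>
</document>
"""


def read_section(xml_text, section_id):
    """Return the paragraph texts of one section, selected with an XPath expression."""
    root = ET.fromstring(xml_text)
    section = root.find(f".//section[@id='{section_id}']")
    if section is None:
        return []
    return [p.text for p in section.findall("p")]


print(read_section(DOCUMENT_XML, "risks"))
```

The agent only pays tokens for the section it asked for, instead of loading the whole tab.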
Philosophically, we wanted "declarative" instead of "imperative". Our key design - the agent needs to "think" in terms of the business, and not worry about how to edit the document. We move all the reconciliation logic into the library, and free the agent from worrying about the google doc. Same approach in other libraries as well.
Struggling with the same issues with junior developers. I've been asking for an implementation plan and iterating on it. Typical workflow is to commit the implementation plan and review it as part of a pr. It takes 2-3 iterations to get right. Then the developer asks claude code to implement it based on the markdown plan. I've seen good results with this.
Another thing I do is ask for the claude session log file. The inputs and thoughts they provided to claude give me a lot more insight than the output of claude. Quite often I am able to correct the thought process when I know how they are thinking. I've found junior developers treat claude like SMS - small ambiguous messages with very little context, hoping it will perform magic. By reviewing the claude session file, I try to fix this superficial prompting behaviour.
And third, I've realized claude works best if the code itself is structured well and has tests, tools to debug and documentation. So I spend more time on tooling so that claude can use these tools to investigate issues, write tests and iterate faster.
Still a long way to go, but this seems promising right now.
Take a look at JinjaSQL (https://github.com/hashedin/jinjasql). Supports conditional where clauses as well as in statements. It uses Jinja templates, so you get a complete template language to create the queries.
To solve the google docs editing problems described above, I have been working on a CLI tool - ExtraSuite. See https://github.com/think41/extrasuite
The core workflow is just two commands, `pull` and `push`.