init

2024-10-30 11:59:30 -04:00
commit 17031d8be8
8 changed files with 342 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,136 @@
+# code-tokenizer-md
+
+Process git repository files into markdown with token counting and sensitive data redaction.
+
+## Overview
+
+`code-tokenizer-md` is a Node.js tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts.
+
+```mermaid
+graph TD
+   Start[Start] -->|Read| Git[Git Files]
+   Git -->|Clean| TC[TokenCleaner]
+   TC -->|Redact| Clean[Clean Code]
+   Clean -->|Generate| MD[Markdown]
+   MD -->|Count| Results[Token Counts]
+   style Start fill:#000000,stroke:#FFFFFF,stroke-width:4px,color:#ffffff
+   style Git fill:#222222,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
+   style TC fill:#333333,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
+   style Clean fill:#444444,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
+   style MD fill:#555555,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
+   style Results fill:#666666,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
+```
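+
+The flow in the diagram can be sketched as a chain of plain functions. This is a minimal illustration, not the package's implementation. `runPipeline` and its `clean`, `redact`, `countTokens` and `render` parameters are hypothetical names, and reading files from git is omitted. Counting tokens on each cleaned file, rather than on the final markdown, is also an assumption.
+
+```javascript
+// Hypothetical pipeline: every stage is injected, so the sketch runs without
+// git, TokenCleaner or a real tokenizer.
+function runPipeline(files, { clean, redact, countTokens, render }) {
+  const processed = files.map(({ path, content }) => {
+    const text = redact(clean(content)); // Clean, then Redact
+    return { path, text, tokens: countTokens(text) }; // per-file count
+  });
+  const totalTokens = processed.reduce((sum, f) => sum + f.tokens, 0);
+  return { markdown: render(processed), files: processed, totalTokens };
+}
+
+// Stand-in stages for illustration only; whitespace splitting is NOT how
+// llama3-tokenizer-js counts tokens.
+const result = runPipeline(
+  [{ path: "a.js", content: "  let secret = 1;  " }],
+  {
+    clean: (s) => s.trim(),
+    redact: (s) => s.replace(/secret/g, "[REDACTED]"),
+    countTokens: (s) => s.split(/\s+/).filter(Boolean).length,
+    render: (fs) => fs.map((f) => `## ${f.path}`).join("\n"),
+  }
+);
+```
+
+With the real tokenizer, `countTokens` might be `(text) => llama3Tokenizer.encode(text).length`. This assumes the default export of `llama3-tokenizer-js` exposes an `encode` method that returns an array of token ids.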
+
+## Features
+
+### Data Processing
+- Reads files from git repository
+- Removes comments and unnecessary whitespace
+- Redacts sensitive information (API keys, tokens, etc.)
+- Counts tokens using `llama3-tokenizer-js`
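+
+Cleaning and redaction amount to ordered regex replacements. The sketch below is an assumption about how such a pass could look. The pattern lists and the `cleanAndRedact` helper are hypothetical, not `TokenCleaner`'s actual rules. Regex-based comment stripping is naive: it also removes `//` and `/* */` sequences that appear inside string literals, such as URLs.
+
+```javascript
+// Illustrative patterns only; TokenCleaner's real rules may differ.
+const cleaningPatterns = [
+  { regex: /\/\*[\s\S]*?\*\//g, replacement: "" }, // block comments
+  { regex: /\/\/.*$/gm, replacement: "" },         // line comments
+  { regex: /[ \t]+$/gm, replacement: "" },         // trailing whitespace
+  { regex: /\n{3,}/g, replacement: "\n\n" },       // runs of blank lines
+];
+
+const secretPatterns = [
+  // name = "value" assignments for common secret names; keeps the name
+  // and the quotes, replaces only the value.
+  {
+    regex: /\b(api[_-]?key|token|secret|password)(\s*[:=]\s*)(["'])[^"']*\3/gi,
+    replacement: "$1$2$3[REDACTED]$3",
+  },
+];
+
+function cleanAndRedact(code) {
+  const apply = (text, patterns) =>
+    patterns.reduce((out, { regex, replacement }) => out.replace(regex, replacement), text);
+  // Clean first, then redact what remains.
+  return apply(apply(code, cleaningPatterns), secretPatterns).trim();
+}
+
+const sample = 'const apiKey = "abc123"; // inline comment\n/* block\n   comment */\nlet token = \'xyz\';\n';
+console.log(cleanAndRedact(sample));
+```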
+
+### Analysis Types
+- Token counting per file
+- Total token usage
+- File content analysis
+- Sensitive data detection
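+
+Detection can be kept separate from redaction by reporting where secret-like patterns occur instead of rewriting them. This is a hypothetical sketch. The `detectSecrets` helper and its two example patterns (the shape of an AWS access key ID and a PEM private-key header) are illustrative assumptions, not the tool's rule set.
+
+```javascript
+// Example detectors. They have no "g" flag, so RegExp.prototype.test keeps
+// no lastIndex state between lines.
+const detectors = [
+  { name: "aws-access-key-id", regex: /\bAKIA[0-9A-Z]{16}\b/ },
+  { name: "private-key-header", regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
+];
+
+// Returns one finding per (pattern, line) match, with 1-based line numbers.
+function detectSecrets(text, patterns = detectors) {
+  const findings = [];
+  text.split("\n").forEach((line, index) => {
+    for (const { name, regex } of patterns) {
+      if (regex.test(line)) findings.push({ name, line: index + 1 });
+    }
+  });
+  return findings;
+}
+
+// AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
+const findings = detectSecrets("ok\nkey = AKIAIOSFODNN7EXAMPLE\n-----BEGIN RSA PRIVATE KEY-----");
+console.log(findings);
+```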
+
+### Data Presentation
+- Markdown formatted output
+- Code block formatting
+- Token count summaries
+- File organization hierarchy
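+
+One way to produce this kind of output is a summary followed by one section per file. In this hypothetical sketch, `renderMarkdown`, `fenceFor` and the exact layout (headings, "Tokens:" lines) are assumptions rather than `MarkdownGenerator`'s real format. The fence is made longer than any backtick run in the file. That way, files that contain fenced blocks themselves, such as a README, cannot end the code block early.
+
+```javascript
+// Pick a backtick fence longer than the longest backtick run in the content.
+function fenceFor(content) {
+  const runs = content.match(/`+/g) || [];
+  const longest = Math.max(0, ...runs.map((run) => run.length));
+  return "`".repeat(Math.max(3, longest + 1));
+}
+
+// files: [{ path, text, tokens }] -> a single markdown string
+function renderMarkdown(files) {
+  const total = files.reduce((sum, f) => sum + f.tokens, 0);
+  const sections = files.map(({ path, text, tokens }) => {
+    const fence = fenceFor(text);
+    return `## ${path}\n\nTokens: ${tokens}\n\n${fence}\n${text}\n${fence}`;
+  });
+  return [`# Token summary\n\nTotal tokens: ${total}`, ...sections].join("\n\n");
+}
+
+console.log(renderMarkdown([{ path: "src/cli.js", text: "run();", tokens: 3 }]));
+```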
+
+## Requirements
+
+- Node.js (>=14.0.0)
+- Git repository
+- npm or npx
+
+## Installation
+
+```shell
+npm install -g code-tokenizer-md
+```
+
+## Usage
+
+### Quick Start
+
+```shell
+npx code-tokenizer-md
+```
+
+### Programmatic Usage
+
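+For programmatic use, install the package locally (`npm install code-tokenizer-md`); Node.js does not resolve `import` specifiers against globally installed packages.
+
+```javascript
+// Top-level await requires an ES module (an .mjs file or "type": "module" in package.json).
+import { MarkdownGenerator } from 'code-tokenizer-md';
+
+const generator = new MarkdownGenerator({
+  dir: './project',             // directory to process
+  outputFilePath: './output.md' // where the markdown is written
+});
+
+const result = await generator.createMarkdownDocument();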
+```
+
+## Project Structure
+
+```
+src/
+├── index.js              # Main exports
+├── TokenCleaner.js       # Code cleaning and redaction
+├── MarkdownGenerator.js  # Markdown generation logic
+└── cli.js                # CLI implementation
+```
+
+## Dependencies
+
+```json
+{
+  "dependencies": {
+    "llama3-tokenizer-js": "^1.0.0"
+  },
+  "peerDependencies": {
+    "node": ">=14.0.0"
+  }
+}
+```
+
+## Extending
+
+### Adding Custom Patterns
+
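+```javascript
+import { MarkdownGenerator } from 'code-tokenizer-md';
+
+const generator = new MarkdownGenerator({
+  // extra patterns applied when cleaning code
+  customPatterns: [
+    { regex: /TODO:/g, replacement: '' }
+  ],
+  // extra patterns applied when redacting secrets
+  customSecretPatterns: [
+    { regex: /mySecret/g, replacement: '[REDACTED]' }
+  ]
+});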
+```
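+
+How `customPatterns` and `customSecretPatterns` are combined with the built-in rules is not documented here. A plausible sketch, assuming built-in patterns run first and custom entries after, could look like this; `buildPatterns`, `ensureGlobal` and `applyPatterns` are hypothetical helpers. `String.prototype.replace` replaces only the first match unless the regex has the `g` flag, which is why the examples above use `/g`. The sketch adds the flag when it is missing.
+
+```javascript
+// Add the "g" flag if absent so every occurrence is replaced.
+function ensureGlobal(regex) {
+  return regex.global ? regex : new RegExp(regex.source, regex.flags + "g");
+}
+
+// Assumed order: built-in patterns first, then user-supplied ones.
+function buildPatterns(builtIn, custom = []) {
+  return [...builtIn, ...custom].map(({ regex, replacement }) => ({
+    regex: ensureGlobal(regex),
+    replacement,
+  }));
+}
+
+function applyPatterns(text, patterns) {
+  return patterns.reduce((out, { regex, replacement }) => out.replace(regex, replacement), text);
+}
+
+const patterns = buildPatterns(
+  [{ regex: /TODO:/g, replacement: "" }],
+  [{ regex: /mySecret/, replacement: "[REDACTED]" }] // no "g": fixed by ensureGlobal
+);
+console.log(applyPatterns("TODO: pass mySecret twice: mySecret", patterns));
+```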
+
+## Contributing
+
+1. Fork the repository
+2. Create a feature branch
+3. Commit your changes
+4. Push to the branch
+5. Open a Pull Request
+
+### Contribution Guidelines
+
+- Follow Node.js best practices
+- Include appropriate error handling
+- Add documentation for new features
+- Include tests for new functionality (the project does not have a test suite yet)
+- Update the README for significant changes
+
+## License
+MIT © 2024 Geoff Seemueller
+
+## Note
+
+This tool requires a git repository to function properly.