init
This commit is contained in:
136
README.md
Normal file
136
README.md
Normal file
@@ -0,0 +1,136 @@
|
||||
# code-tokenizer-md
|
||||
|
||||
Process git repository files into markdown with token counting and sensitive data redaction.
|
||||
|
||||
## Overview
|
||||
|
||||
`code-tokenizer-md` is a Node.js tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts.
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
Start[Start] -->|Read| Git[Git Files]
|
||||
Git -->|Clean| TC[TokenCleaner]
|
||||
TC -->|Redact| Clean[Clean Code]
|
||||
Clean -->|Generate| MD[Markdown]
|
||||
MD -->|Count| Results[Token Counts]
|
||||
style Start fill:#000000,stroke:#FFFFFF,stroke-width:4px,color:#ffffff
|
||||
style Git fill:#222222,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
|
||||
style TC fill:#333333,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
|
||||
style Clean fill:#444444,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
|
||||
style MD fill:#555555,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
|
||||
style Results fill:#666666,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
|
||||
```
|
||||
|
||||
## Features
|
||||
|
||||
### Data Processing
|
||||
- Reads files from git repository
|
||||
- Removes comments and unnecessary whitespace
|
||||
- Redacts sensitive information (API keys, tokens, etc.)
|
||||
- Counts tokens using llama3-tokenizer
|
||||
|
||||
### Analysis Types
|
||||
- Token counting per file
|
||||
- Total token usage
|
||||
- File content analysis
|
||||
- Sensitive data detection
|
||||
|
||||
### Data Presentation
|
||||
- Markdown formatted output
|
||||
- Code block formatting
|
||||
- Token count summaries
|
||||
- File organization hierarchy
|
||||
|
||||
## Requirements
|
||||
|
||||
- Node.js (>=14.0.0)
|
||||
- Git repository
|
||||
- npm or npx
|
||||
|
||||
## Installation
|
||||
|
||||
```shell
|
||||
npm install -g code-tokenizer-md
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Quick Start
|
||||
|
||||
```shell
|
||||
npx code-tokenizer-md
|
||||
```
|
||||
|
||||
### Programmatic Usage
|
||||
|
||||
```javascript
|
||||
import { MarkdownGenerator } from 'code-tokenizer-md';
|
||||
|
||||
const generator = new MarkdownGenerator({
|
||||
dir: './project',
|
||||
outputFilePath: './output.md'
|
||||
});
|
||||
|
||||
const result = await generator.createMarkdownDocument();
|
||||
```
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
src/
|
||||
├── index.js # Main exports
|
||||
├── TokenCleaner.js # Code cleaning and redaction
|
||||
├── MarkdownGenerator.js # Markdown generation logic
|
||||
└── cli.js # CLI implementation
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
|
||||
```json
|
||||
{
|
||||
"dependencies": {
|
||||
"llama3-tokenizer-js": "^1.0.0"
|
||||
},
|
||||
"peerDependencies": {
|
||||
"node": ">=14.0.0"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Extending
|
||||
|
||||
### Adding Custom Patterns
|
||||
|
||||
```javascript
|
||||
const generator = new MarkdownGenerator({
|
||||
customPatterns: [
|
||||
{ regex: /TODO:/g, replacement: '' }
|
||||
],
|
||||
customSecretPatterns: [
|
||||
{ regex: /mySecret/g, replacement: '[REDACTED]' }
|
||||
]
|
||||
});
|
||||
```
|
||||
|
||||
## Contributing
|
||||
|
||||
1. Fork the repository
|
||||
2. Create a feature branch
|
||||
3. Commit your changes
|
||||
4. Push to the branch
|
||||
5. Open a Pull Request
|
||||
|
||||
### Contribution Guidelines
|
||||
|
||||
- Follow Node.js best practices
|
||||
- Include appropriate error handling
|
||||
- Add documentation for new features
|
||||
- Include tests for new functionality (this project needs a suite)
|
||||
- Update the README for significant changes
|
||||
|
||||
## License
|
||||
MIT © 2024 Geoff Seemueller
|
||||
|
||||
## Note
|
||||
|
||||
This tool requires a git repository to function properly.
|
Reference in New Issue
Block a user