afei A 84273a765e feat: Add multi-threaded concurrent download support
- Add ThreadPoolExecutor for parallel attachment downloads
- Add --max-workers parameter to control concurrency (default: 5)
- Implement thread-safe logging with Lock mechanism
- Refactor _do_download to use concurrent.futures
- Add _download_single_file and _download_single_link helper functions
- Update CLAUDE.md with multi-threading documentation

Performance improvements:
- File attachments (OData) now download in parallel
- Link attachments (Scrapling) now download in parallel
- Configurable worker threads for different network conditions
2026-03-12 13:01:13 +08:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is an SAP C4C (Cloud for Customer) attachment downloader toolkit. It retrieves attachments from ServiceRequest tickets and can optionally upload them to a Synology DSM NAS. The project consists of:
- **Python script** (`sap-c4c-AttachmentFolder.py`): Core downloader using OData APIs and web scraping
- **Java wrapper** (`C4CAttachmentDownloader.java`): Java interface that calls the Python script via ProcessBuilder
- **DSM upload script** (`dsm-upload.py`): Standalone Synology NAS upload utility
## Architecture
### Python Script (`sap-c4c-AttachmentFolder.py`)
**Core functionality:**
1. Authenticates to SAP C4C using Basic Auth
2. Fetches ServiceRequest attachments via OData endpoints:
   - `/sap/c4c/odata/v1/c4codata` - Standard C4C OData API
   - `/sap/c4c/odata/cust/v1/custticketapi` - Custom ticket API
3. Downloads two types of attachments using **multi-threaded concurrent downloads**:
   - **File attachments** (CategoryCode=2): Downloaded via OData `$value` endpoint
   - **Link attachments** (CategoryCode=3): External Salesforce links scraped using Scrapling + Playwright
4. Handles XIssueItem-level attachments via `BO_XSRIssueItemAttachmentFolder`
5. Optionally uploads downloaded files to Synology DSM via FileStation API
**Key dependencies:**
- `requests` - HTTP client for OData/REST APIs
- `scrapling[all]` - Web scraping framework with stealth capabilities
- `playwright` - Browser automation for downloading Salesforce attachments
**Performance features:**
- Multi-threaded concurrent downloads (default: 5 threads, configurable via `--max-workers`)
- Thread-safe output logging, with writes guarded by a lock
- Parallel processing of both file and link attachments
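The pattern behind these features can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `download_all`, `download_one`, and `log` are hypothetical names, and the real script may structure its helpers differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

# A single lock shared by all worker threads (hypothetical name).
_print_lock = threading.Lock()


def log(message):
    """Print one message at a time so output from different threads does not interleave."""
    with _print_lock:
        print(message)


def download_all(items, download_one, max_workers=5):
    """Run download_one(item) for every item on a thread pool.

    Returns a list of (item, result, error) tuples in completion order.
    download_one is a caller-supplied function (an assumption of this sketch).
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results.append((item, future.result(), None))
                log(f"done: {item}")
            except Exception as exc:  # one failed download should not stop the others
                results.append((item, None, exc))
                log(f"failed: {item}: {exc}")
    return results
```

Collecting errors per item, rather than letting the first exception propagate, keeps one broken attachment from aborting the remaining downloads.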
**Output modes:**
- Human-readable console output (default)
- JSON mode (`--json`) for programmatic consumption
### Java Wrapper (`C4CAttachmentDownloader.java`)
Provides a type-safe Java API that:
- Invokes the Python script via `ProcessBuilder`
- Passes credentials via environment variables (more secure than CLI args)
- Parses JSON output into strongly-typed Java objects
- Supports timeout configuration (default: 30 minutes)
**Key classes:**
- `Result` - Top-level response containing all attachment metadata
- `Attachment` - Individual attachment metadata (UUID, filename, MIME type, category)
- `IssueItem` - XIssueItem with nested attachments
- `DownloadedFile` - Download result with local path and error info
- `DsmUploadEntry` - DSM upload result per file
### DSM Upload (`dsm-upload.py`)
Standalone script demonstrating Synology FileStation API usage:
1. Login via `SYNO.API.Auth` to obtain SID
2. Upload files via `SYNO.FileStation.Upload` with SID cookie
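The two steps above might look roughly like the sketch below. It is an illustration under assumptions, not the contents of `dsm-upload.py`: the endpoint paths (`/webapi/auth.cgi`, `/webapi/entry.cgi`), the API version numbers, and the `id` cookie name should be checked against the Synology documentation for your DSM release, and `dsm_login` / `dsm_upload` are hypothetical names. The `session` argument is expected to behave like a `requests.Session()`.

```python
import os


def dsm_login(session, base_url, username, password):
    """Step 1: log in via SYNO.API.Auth and return the session ID (SID)."""
    response = session.get(
        f"{base_url}/webapi/auth.cgi",  # assumed endpoint path
        params={
            "api": "SYNO.API.Auth",
            "version": "3",  # assumed version
            "method": "login",
            "account": username,
            "passwd": password,
            "session": "FileStation",
            "format": "sid",
        },
    )
    payload = response.json()
    if not payload.get("success"):
        raise RuntimeError(f"DSM login failed: {payload.get('error')}")
    return payload["data"]["sid"]


def dsm_upload(session, base_url, sid, local_path, remote_dir):
    """Step 2: upload one file via SYNO.FileStation.Upload, sending the SID as a cookie."""
    with open(local_path, "rb") as handle:
        response = session.post(
            f"{base_url}/webapi/entry.cgi",  # assumed endpoint path
            params={"api": "SYNO.FileStation.Upload", "version": "2", "method": "upload"},
            cookies={"id": sid},  # assumed cookie name for the SID
            data={"path": remote_dir, "create_parents": "true"},
            files={"file": (os.path.basename(local_path), handle)},
        )
    payload = response.json()
    if not payload.get("success"):
        raise RuntimeError(f"DSM upload failed: {payload.get('error')}")
    return payload
```

With `requests` installed, a call would pass `requests.Session()` as `session`, for example `sid = dsm_login(requests.Session(), "http://10.0.10.235:5000", "PLM", "...")`.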
## Common Commands
### Python Script
```bash
# Install dependencies (quote the extras spec so shells such as zsh do not expand the brackets)
pip install requests "scrapling[all]" playwright
python -m playwright install chromium

# Download attachments (credentials via CLI)
python sap-c4c-AttachmentFolder.py \
  --tenant https://xxx.c4c.saphybriscloud.cn \
  --user admin \
  --password xxx \
  --ticket 24588

# Download with custom thread count (default: 5)
python sap-c4c-AttachmentFolder.py \
  --tenant https://xxx.c4c.saphybriscloud.cn \
  --user admin \
  --password xxx \
  --ticket 24588 \
  --max-workers 10

# Download with DSM upload
python sap-c4c-AttachmentFolder.py \
  --tenant https://xxx.c4c.saphybriscloud.cn \
  --user admin \
  --password xxx \
  --ticket 24588 \
  --dsm-url http://10.0.10.235:5000 \
  --dsm-user PLM \
  --dsm-password 123456 \
  --dsm-path /Newgonow/AU-SPFJ

# JSON mode (for Java/programmatic use)
python sap-c4c-AttachmentFolder.py --ticket 24588 --json

# List attachments only (no download)
python sap-c4c-AttachmentFolder.py --ticket 24588 --list-only

# Using environment variables for credentials
export C4C_TENANT=https://xxx.c4c.saphybriscloud.cn
export C4C_USERNAME=admin
export C4C_PASSWORD=xxx
export DSM_URL=http://10.0.10.235:5000
export DSM_USERNAME=PLM
export DSM_PASSWORD=123456
export DSM_PATH=/Newgonow/AU-SPFJ
python sap-c4c-AttachmentFolder.py --ticket 24588 --json
```
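One common way to support both CLI arguments and environment variables is to use the environment value as the argparse default, so that an explicit flag wins. The sketch below assumes that precedence; the script's real argument handling may differ, and `build_parser` is a hypothetical name. The flag and variable names are the ones used in the commands above.

```python
import argparse
import os


def build_parser():
    """Build a parser whose credential flags fall back to environment variables."""
    parser = argparse.ArgumentParser(description="C4C attachment downloader (sketch)")
    # Environment values are read when the parser is built; a CLI flag overrides them.
    parser.add_argument("--tenant", default=os.environ.get("C4C_TENANT"))
    parser.add_argument("--user", default=os.environ.get("C4C_USERNAME"))
    parser.add_argument("--password", default=os.environ.get("C4C_PASSWORD"))
    parser.add_argument("--ticket", required=True)
    parser.add_argument("--max-workers", type=int, default=5)
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--list-only", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.ticket, args.max_workers)
```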
### Java Wrapper
```java
// Compile from a shell first (requires Jackson for JSON parsing):
//   javac -cp jackson-databind.jar:jackson-core.jar:jackson-annotations.jar C4CAttachmentDownloader.java

// Basic usage
C4CAttachmentDownloader downloader = new C4CAttachmentDownloader(
    "/path/to/sap-c4c-AttachmentFolder.py",
    "https://xxx.c4c.saphybriscloud.cn",
    "admin",
    "password"
);

// List attachments only
C4CAttachmentDownloader.Result listed = downloader.listAttachments("24588");

// Download to default directory
C4CAttachmentDownloader.Result downloaded = downloader.download("24588");

// Download to specific directory
C4CAttachmentDownloader.Result downloadedToDir = downloader.download("24588", "/tmp/ticket_24588");

// Download with DSM upload
downloader.setDsmConfig("http://10.0.10.235:5000", "PLM", "123456", "/Newgonow/AU-SPFJ");
C4CAttachmentDownloader.Result uploaded = downloader.download("24588", "/tmp/ticket_24588");
```
## Key Implementation Details
### Attachment Categories
SAP C4C uses `CategoryCode` to distinguish attachment types:
- **"2"** = File attachment (binary content stored in C4C, downloaded via OData `$value`)
- **"3"** = Link attachment (external URL, typically Salesforce links requiring web scraping)
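As an illustration, attachment metadata could be routed by `CategoryCode` along these lines. This sketch assumes each attachment is a dict carrying a `CategoryCode` key; `split_by_category` and the constant names are hypothetical.

```python
FILE_CATEGORY = "2"  # binary content stored in C4C
LINK_CATEGORY = "3"  # external URL


def split_by_category(attachments):
    """Split attachment dicts into (files, links, other) by their CategoryCode."""
    files, links, other = [], [], []
    for attachment in attachments:
        # Normalise to a string so that 2 and "2" are treated the same.
        code = str(attachment.get("CategoryCode", "")).strip()
        if code == FILE_CATEGORY:
            files.append(attachment)
        elif code == LINK_CATEGORY:
            links.append(attachment)
        else:
            other.append(attachment)
    return files, links, other
```

Keeping an `other` bucket makes unexpected category codes visible instead of silently dropping them.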
### OData Navigation Paths
**ServiceRequest attachments:**
```
/ServiceRequestCollection('{ObjectID}')/ServiceRequestAttachmentFolder
```
**XIssueItem attachments (two-step navigation):**
```
1. /BO_XSRIssueItemAttachmentCollection?$filter=XIssueItemUUID eq guid'{uuid}'
2. /BO_XSRIssueItemAttachmentCollection('{ObjectID}')/BO_XSRIssueItemAttachmentFolder
```
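The paths above could be assembled with small helpers such as the following. The function names are hypothetical, and the resulting paths are relative to whichever OData service root the script uses; only the entity and navigation names come from this document.

```python
from urllib.parse import quote


def service_request_attachments_path(object_id):
    """Path to the attachment folder of one ServiceRequest."""
    key = quote(object_id, safe="")
    return f"/ServiceRequestCollection('{key}')/ServiceRequestAttachmentFolder"


def issue_item_lookup_path(issue_item_uuid):
    """Step 1: find the attachment container for an XIssueItem by UUID."""
    filter_expr = f"XIssueItemUUID eq guid'{issue_item_uuid}'"
    # Percent-encode spaces but keep the single quotes of the guid literal.
    return "/BO_XSRIssueItemAttachmentCollection?$filter=" + quote(filter_expr, safe="'")


def issue_item_attachments_path(object_id):
    """Step 2: navigate from the container's ObjectID to its attachment folder."""
    key = quote(object_id, safe="")
    return f"/BO_XSRIssueItemAttachmentCollection('{key}')/BO_XSRIssueItemAttachmentFolder"
```

The `ObjectID` passed to `issue_item_attachments_path` would come from the result of the step 1 query.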
### Scrapling Download Strategy
For CategoryCode=3 (link attachments), the script:
1. Opens the Salesforce link in a headless Chromium browser
2. Waits for `button.downloadbutton[title='Download']` selector
3. Clicks the button and captures the download
4. Saves with original or suggested filename
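The four steps can be approximated with Playwright's synchronous API, as in the sketch below. Note the substitution: this uses plain Playwright rather than Scrapling, so it is not the script's `download_link_via_scrapling()` implementation. `download_link`, `resolve_filename`, the fallback name, and the timeout value are assumptions for illustration.

```python
from pathlib import Path

DOWNLOAD_SELECTOR = "button.downloadbutton[title='Download']"


def resolve_filename(suggested, fallback):
    """Use the browser-suggested name if present (directories stripped), else the fallback."""
    name = Path(suggested or "").name
    return name or fallback


def download_link(url, out_dir, fallback_name="attachment.bin", timeout_ms=60000):
    """Open the link headlessly, click the download button, and save the file."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            page.wait_for_selector(DOWNLOAD_SELECTOR, timeout=timeout_ms)
            # expect_download captures the download triggered by the click.
            with page.expect_download(timeout=timeout_ms) as download_info:
                page.click(DOWNLOAD_SELECTOR)
            download = download_info.value
            target = out_dir / resolve_filename(download.suggested_filename, fallback_name)
            download.save_as(str(target))
            return target
        finally:
            browser.close()
```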
### Security Considerations
- Java wrapper passes credentials via **environment variables** (not CLI args) to avoid exposure in process lists
- Python script supports both CLI args and environment variables
- DSM API uses session-based authentication (SID cookie)
- SSL certificate verification is disabled (`verify=False`); enable it in production
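The environment-variable approach used by the Java wrapper has a direct Python analogue, sketched below with `subprocess`. `run_with_credentials` is a hypothetical helper; the 1800-second default mirrors the wrapper's 30-minute timeout.

```python
import os
import subprocess
import sys


def run_with_credentials(command, credentials, timeout_seconds=1800):
    """Run a child process with secrets in its environment instead of its argument list.

    Command-line arguments are visible in process listings; environment
    variables of another user's process generally are not.
    """
    env = {**os.environ, **credentials}
    return subprocess.run(
        command,
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=False,
    )


if __name__ == "__main__":
    # The child reads the secret from its environment; it never appears in argv.
    result = run_with_credentials(
        [sys.executable, "-c", "import os; print(len(os.environ['C4C_PASSWORD']))"],
        {"C4C_PASSWORD": "example-only"},
    )
    print(result.stdout.strip())
```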
## File Structure
```
.
├── C4CAttachmentDownloader.java   # Java wrapper with typed API
├── sap-c4c-AttachmentFolder.py    # Core Python downloader
├── dsm-upload.py                  # Standalone DSM upload example
└── downloads/                     # Default output directory
```
## Troubleshooting
**Playwright not installed:**
```bash
python -m playwright install chromium
```
**Timeout errors:** Increase timeout in Java wrapper constructor (default 30 minutes) or adjust Scrapling timeout parameters.
**DSM upload fails:** Verify the DSM URL and credentials, and check that the target path exists or that `create_parents=true` is set.
**Link download fails:** Check that Salesforce page structure matches expected selector (`button.downloadbutton[title='Download']`). Update `download_link_via_scrapling()` if page structure changes.