CSV/Excel Data Wrangler Skill
Clean, filter, join, pivot, and export CSV/XLSX data reliably with reproducible steps. Transform messy spreadsheets into production-ready datasets with pandas. Handle encoding issues, data type conversion, missing values, duplicates, and complex merges.
Open the source and read safety notes before installing.
Prerequisites
- Python 3.11+
- pandas
- openpyxl
- pyarrow (optional for Parquet)
- File system read/write access for input CSV/Excel files and output processed files
- Sufficient memory for data processing (minimum 2GB RAM recommended for large files, use chunked processing for files >100MB)
Schema details
- Install type
- package
- Reading time
- 4 min
- Difficulty score
- 94
- Troubleshooting
- Yes
- Breaking changes
- No
- Download URL
- /downloads/skills/csv-excel-data-wrangler.zip
- Package verified
- Yes
- SHA-256
- 7a075dd66f661d1f928dff7924159040bd6ab2c898f782cbb2862ae79658178e
- Skill type
- general
- Skill level
- advanced
- Verification
- draft
- Verified at
- 2025-10-15
| Platform | Support | Install path |
|---|---|---|
| claude-code | Native | .claude/skills/<skill-name>/SKILL.md |
| codex | Native | .agents/skills/<skill-name>/SKILL.md |
| windsurf | Native | .windsurf/skills/<skill-name>/SKILL.md |
| gemini | Native | .gemini/skills/<skill-name>/SKILL.md or .agents/skills/<skill-name>/SKILL.md |
| cursor | Adapter | .cursor/rules/<skill-name>.mdc |
| cli | Manual | AGENTS.md or tool-specific context file |
Full copyable content
import pandas as pd
customers = pd.read_csv('customers.csv', dtype=str)
orders = pd.read_excel('orders.xlsx')
# Normalize and dedupe
customers['email'] = customers['email'].str.strip().str.lower()
customers = customers.drop_duplicates(subset=['email'])
# Join and summarize
df = orders.merge(customers, on='customer_id', how='left')
sales_by_region = df.groupby('region', dropna=False)['total'].sum().reset_index()
sales_by_region.to_excel('sales_by_region.xlsx', index=False)About this resource
What This Skill Enables
Claude can clean, transform, analyze, and merge CSV and Excel files with pandas. Upload messy spreadsheets and get production-ready data pipelines, statistical summaries, and formatted exports.
Compatibility
Native
- Claude Code / Claude: native skill usage via
SKILL.md. - Codex/OpenAI workflows: compatible with Agent Skills-style
SKILL.mdcontent as reusable workflow instructions.
Manual Adaptation
- Gemini CLI: native skill usage via
.gemini/skills/<skill-name>/SKILL.mdor.agents/skills/<skill-name>/SKILL.mdwhere supported. - Cursor: use the generated
.cursor/rules/*.mdcadapter for project rules. - OpenClaw and similar agents: use the same skill content as a reusable prompt/workflow file when native skill import is unavailable.
Prerequisites
Required:
- Claude Pro subscription
- Code Interpreter feature enabled
- CSV or Excel file uploaded to conversation
What Claude handles:
- Installing pandas, openpyxl, and data processing libraries
- Detecting file encodings and formats
- Type inference and conversion
- Memory-efficient processing of large files
How to Use This Skill
Quick Data Cleaning
Prompt: "Clean this CSV file: remove duplicates, fix missing values, standardize column names, and export as clean.csv"
Claude will:
- Load and analyze the file structure
- Identify data quality issues
- Apply cleaning transformations
- Export cleaned version
Data Merging & Joining
Prompt: "Merge customers.csv and orders.csv on customer_id. Show me the combined data and export as customer_orders.xlsx"
Claude will:
- Load both files
- Detect join keys
- Perform the merge (inner/left/right/outer)
- Validate results
- Export formatted Excel file
Data Analysis & Summaries
Prompt: "Analyze this sales data: show me summary statistics, identify top products, calculate monthly trends, and create a pivot table by region."
Claude will:
- Generate descriptive statistics
- Perform aggregations
- Create pivot tables
- Calculate trends
- Present insights
Format Conversion
Prompt: "Convert this Excel workbook to CSV files, one per sheet, with UTF-8 encoding."
Claude will:
- Read all Excel sheets
- Export each as separate CSV
- Handle encoding properly
- Preserve data types where possible
Common Workflows
CRM Data Cleanup
"Clean this customer export:
1. Remove duplicate emails (keep most recent)
2. Standardize phone numbers to (XXX) XXX-XXXX format
3. Fill missing company names with 'Unknown'
4. Split full_name into first_name and last_name
5. Export as customers_clean.xlsx"
Sales Report Generation
"Analyze this sales data:
1. Calculate total revenue by product category
2. Identify top 10 customers by revenue
3. Show month-over-month growth
4. Create a pivot table: rows=salesperson, columns=month, values=revenue
5. Export summary as sales_report.xlsx with formatted numbers"
Data Validation
"Validate this CSV:
1. Check for duplicate IDs
2. Identify rows with missing required fields (name, email, phone)
3. Flag invalid email formats
4. Report data quality issues
5. Export clean rows and error rows separately"
Multi-File Consolidation
"Combine all CSV files I upload into one master file:
1. Ensure columns match (add missing ones)
2. Add a 'source_file' column
3. Remove duplicates across all files
4. Sort by date column
5. Export as consolidated_data.csv"
Tips for Best Results
- Be Specific About Columns: Name the exact columns you want to work with
- Describe Your Data: Mention what each column represents for better context
- Specify Output Format: Tell Claude exactly how you want the result formatted
- Handle Missing Data: Be explicit about how to handle nulls (drop, fill with value, forward-fill, etc.)
- Large Files: For files >100MB, ask Claude to process in chunks or sample first
- Date Formats: Specify your expected date format (MM/DD/YYYY vs DD/MM/YYYY)
- Encoding Issues: If you see garbled text, ask Claude to try different encodings (UTF-8, latin-1, etc.)
Advanced Operations
Complex Transformations
- Unpivoting (melt) wide data to long format
- Creating calculated columns with business logic
- Grouping and aggregating with custom functions
- Handling multi-index data
- Time series resampling and rolling windows
Data Quality Checks
- Outlier detection and reporting
- Referential integrity validation
- Format consistency checks
- Statistical anomaly detection
Troubleshooting
Issue: File encoding errors or garbled characters Solution: Ask Claude to detect encoding or try: "Read this with UTF-8-SIG encoding" or "Try latin-1 encoding"
Issue: Memory errors on large files Solution: "Process this file in 10,000 row chunks" or "Sample 10% of rows first to test"
Issue: Wrong data types (dates as strings, numbers as text) Solution: Be explicit: "Convert created_at column to datetime" or "Cast price to float"
Issue: Merge produces unexpected results Solution: Ask Claude to show sample rows before/after merge and explain the join type used
Issue: Excel export loses formatting Solution: "Export with formatted numbers, bold headers, and auto-column-width"
Learn More
- Pandas Documentation - Comprehensive data manipulation guide
- Excel to Pandas Mapping - Translate Excel operations
- Data Cleaning Best Practices - Common data issues and solutions
- Claude Code Interpreter Guide - How Claude processes data
Features
- Import/export with explicit schema control
- Deduplicate and null-safe transformations
- Join/merge/pivot with predictable results
- Encoding-aware IO with UTF-8/UTF-8-SIG handling
- Parquet round-trips for performance
- Memory-efficient processing for large files (chunksize, Parquet)
- Data validation and quality checks (outlier detection, referential integrity)
- Multi-file consolidation and batch processing
Use Cases
- Clean messy CRM exports
- Join sales and marketing datasets
- Generate analyst-ready summary tables
- Data migration and format conversion workflows
- ETL pipelines for business intelligence
- Automated data quality reporting and validation
Source citations
Signals
Loading live community signals…
A short, calm digest of reviewed Claude resources. Unsubscribe any time.