Productivity

clean-data-xls

anthropics/financial-services-plugins · updated Apr 8, 2026

$npx skills add https://github.com/anthropics/financial-services-plugins --skill clean-data-xls
summary

Clean messy data in the active sheet or a specified range.

skill.md

Clean Data

Clean messy data in the active sheet or a specified range.

Environment

  • If running inside Excel (Office Add-in / Office JS): Use Office JS directly (Excel.run(async (context) => {...})). Read via range.values, write helper-column formulas via range.formulas = [["=TRIM(A2)"]]. The in-place vs helper-column decision still applies.
  • If operating on a standalone .xlsx file: Use Python/openpyxl.

Workflow

Step 1: Scope

  • If a range is given (e.g. A1:F200), use it
  • Otherwise use the full used range of the active sheet
  • Profile each column: detect its dominant type (text / number / date) and identify outliers

Step 2: Detect issues

Issue What to look for
Whitespace leading/trailing spaces, double spaces
Casing inconsistent casing in categorical columns (usa / USA / Usa)
Number-as-text numeric values stored as text; stray $, ,, % in number cells
Dates mixed formats in the same column (3/8/26, 2026-03-08, March 8 2026)
Duplicates exact-duplicate rows and near-duplicates (case/whitespace differences)
Blanks empty cells in otherwise-populated columns
Mixed types a column that's 98% numbers but has 3 text entries
Encoding mojibake (é, ’), non-printing characters
Errors #REF!, #N/A, #VALUE!, #DIV/0!

Step 3: Propose fixes

Show a summary table before changing anything:

Column Issue Count Proposed Fix

Step 4: Apply

  • Prefer formulas over hardcoded cleaned values — where the cleaned output can be expressed as a formula (e.g. =TRIM(A2), =VALUE(SUBSTITUTE(B2,"$","")), =UPPER(C2), =DATEVALUE(D2)), write the formula in an adjacent helper column rather than computing the result in Python and overwriting the original. This keeps the transformation transparent and auditable.
  • Only overwrite in place with computed values when the user explicitly asks for it, or when no sensible formula equivalent exists (e.g. encoding/mojibake repair)
  • For destructive operations (removing duplicates, filling blanks, overwriting originals), confirm with the user first
  • After each category of fix (whitespace → casing → number conversion → dates → dedup), show the user a sample of what changed and get confirmation before moving to the next category
  • Report a before/after summary of what changed
general reviews

Ratings

4.510 reviews
  • Shikha Mishra· Oct 10, 2024

    clean-data-xls is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Piyush G· Sep 9, 2024

    Keeps context tight: clean-data-xls is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Chaitanya Patil· Aug 8, 2024

    Registry listing for clean-data-xls matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Sakshi Patil· Jul 7, 2024

    clean-data-xls reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Ganesh Mohane· Jun 6, 2024

    I recommend clean-data-xls for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Oshnikdeep· May 5, 2024

    Useful defaults in clean-data-xls — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Dhruvi Jain· Apr 4, 2024

    clean-data-xls has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Rahul Santra· Mar 3, 2024

    Solid pick for teams standardizing on skills: clean-data-xls is focused, and the summary matches what you get after install.

  • Pratham Ware· Feb 2, 2024

    We added clean-data-xls from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Yash Thakker· Jan 1, 2024

    clean-data-xls fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.