xlsx
Utilice esta habilidad siempre que un archivo de hoja de cálculo sea la entrada o salida principal. Esto significa cualquier tarea en la que el usuario quiera: abrir, leer, editar o arreglar un archivo.xlsx,.xlsm,.csv o.tsv existente (por ejemplo, agregar columnas, calcular fórmulas, formatear, crear gráficos, limpiar datos desordenados); crear una nueva hoja de cálculo desde cero o a partir de otras fuentes de datos; o convertir entre formatos de archivos tabulares. Se activa especialmente cuando el usuario hace referencia a un archivo de hoja de cálculo por nombre o ruta, incluso de manera casual (como "el xlsx en mis descargas"), y quiere que se le haga algo o se produzca algo a partir de él. También permite limpiar o reestructurar archivos de datos tabulares desordenados (filas mal formadas, encabezados mal colocados, datos basura) en hojas de cálculo adecuadas. El entregable debe ser un archivo de hoja de cálculo. NO active cuando el entregable principal sea un documento de Word, un informe HTML, un script de Python independiente, una canalización de base de datos o la integración de la API de Google Sheets, incluso si se trata de datos tabulares.
Fuente: Contenido adaptado de anthropics/skills (MIT).
Todos los archivos de Excel
Fuente profesional
- Utilice una fuente uniforme y profesional (por ejemplo, Arial, Times New Roman) para todos los entregables, a menos que el usuario indique lo contrario.
Errores de fórmula cero
- Cada modelo de Excel DEBE entregarse con CERO errores de fórmula (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
Conservar plantillas existentes (al actualizar plantillas)
- Estudie y combine EXACTAMENTE el formato, el estilo y las convenciones existentes al modificar archivos
- Nunca impongas formatos estandarizados en archivos con patrones establecidos
- Las convenciones de plantillas existentes SIEMPRE anulan estas pautas
Modelos financieros
Estándares de codificación de colores
A menos que el usuario o la plantilla existente indiquen lo contrario
Convenciones de color estándar de la industria
- Texto azul (RGB: 0,0,255): entradas codificadas y números que los usuarios cambiarán para los escenarios
- Texto negro (RGB: 0,0,0): TODAS las fórmulas y cálculos
- Texto verde (RGB: 0,128,0): enlaces extraídos de otras hojas de trabajo dentro del mismo libro
- Texto rojo (RGB: 255,0,0): Enlaces externos a otros archivos
- Fondo amarillo (RGB: 255,255,0): Suposiciones clave que requieren atención o celdas que deben actualizarse
Estándares de formato numérico
Reglas de formato requeridas
- Años: formato como cadenas de texto (p. ej., "2024", no "2024")
- Moneda: Utilice el formato $#,##0; SIEMPRE especifique las unidades en los encabezados ("Ingresos ($mm)")
- Ceros: utilice el formato numérico para convertir todos los ceros en "-", incluidos los porcentajes (por ejemplo, "$#,##0;($#,##0);-")
- Porcentajes: formato predeterminado de 0,0% (un decimal)
- Múltiplos: formato 0,0x para múltiplos de valoración (EV/EBITDA, P/E)
- Números negativos: utilice paréntesis (123) y no menos -123
Reglas de construcción de fórmulas.
Colocación de supuestos
- Coloque TODOS los supuestos (tasas de crecimiento, márgenes, múltiplos, etc.) en celdas de supuestos separadas.
- Utilice referencias de celda en lugar de valores codificados en fórmulas
- Ejemplo: Utilice =B5*(1+$B$6) en lugar de =B5*1.05
Prevención de errores de fórmula
- Verifique que todas las referencias de celda sean correctas
- Compruebe si hay errores uno por uno en rangos
- Garantizar fórmulas consistentes en todos los períodos de proyección
- Prueba con casos extremos (valores cero, números negativos)
- Verifique que no haya referencias circulares no deseadas
Requisitos de documentación para códigos duros
- Comente o en las celdas al lado (si está al final de la tabla). Formato: "Fuente: [Sistema/Documento], [Fecha], [Referencia específica], [URL si corresponde]"
- Ejemplos:
- "Fuente: Compañía 10-K, año fiscal 2024, página 45, nota de ingresos, [URL de SEC EDGAR]"
- "Fuente: Compañía 10-Q, segundo trimestre de 2025, Anexo 99.1, [SEC EDGAR URL]"
- "Fuente: Bloomberg Terminal, 15/8/2025, AAPL US Equity"
- "Fuente: FactSet, 20/8/2025, pantalla de estimaciones de consenso"
Creación, edición y análisis de XLSX
Descripción general
Un usuario puede pedirle que cree, edite o analice el contenido de un archivo.xlsx. Tienes diferentes herramientas y flujos de trabajo disponibles para diferentes tareas.
Requisitos importantes
Se requiere LibreOffice para volver a calcular las fórmulas: puede asumir que LibreOffice está instalado para volver a calcular los valores de las fórmulas utilizando el scriptscripts/recalc.py. El script configura automáticamente LibreOffice en la primera ejecución, incluso en entornos aislados donde los sockets Unix están restringidos (gestionado porscripts/office/soffice.py)
Leer y analizar datos.
Análisis de datos con pandas.
Para análisis de datos, visualización y operaciones básicas, utilice pandas que proporciona potentes capacidades de manipulación de datos:
import pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Analyze
df.head() # Preview data
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)Flujos de trabajo de archivos de Excel
CRÍTICO: Utilice fórmulas, no valores codificados
Utilice siempre fórmulas de Excel en lugar de calcular valores en Python y codificarlos. Esto garantiza que la hoja de cálculo siga siendo dinámica y actualizable.
INCORRECTO: codificación de valores calculados
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5CORRECTO: uso de fórmulas de Excel
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'Esto se aplica a TODOS los cálculos: totales, porcentajes, proporciones, diferencias, etc. La hoja de cálculo debería poder volver a calcular cuando cambian los datos de origen.
Flujo de trabajo común
- Elija herramienta: pandas para datos, openpyxl para fórmulas/formato
- Crear/Cargar: crea un nuevo libro de trabajo o carga un archivo existente
- Modificar: agregar/editar datos, fórmulas y formato
- Guardar: escribir en el archivo
- Recalcular fórmulas (OBLIGATORIO SI SE UTILIZA FÓRMULAS): Utilice el script scripts/recalc.py
python scripts/recalc.py output.xlsx - Verifique y corrija cualquier error:
- El script devuelve JSON con detalles del error.
- Si
statuseserrors_found, verifiqueerror_summarypara conocer ubicaciones y tipos de errores específicos. - Corrija los errores identificados y vuelva a calcular
- Errores comunes para corregir:
#REF!: Referencias de celda no válidas#DIV/0!: División por cero#VALUE!: Tipo de datos incorrecto en la fórmula#NAME?: nombre de fórmula no reconocido
Creando nuevos archivos de Excel
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')Editar archivos de Excel existentes
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheet
# Working with multiple sheets
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
print(f"Sheet: {sheet_name}")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')Recalcular fórmulas
Los archivos de Excel creados o modificados por openpyxl contienen fórmulas como cadenas pero no valores calculados. Utilice el scriptscripts/recalc.pyproporcionado para volver a calcular fórmulas:
python scripts/recalc.py <excel_file> [timeout_seconds]Ejemplo:
python scripts/recalc.py output.xlsx 30El guión:
- Configura automáticamente la macro de LibreOffice en la primera ejecución
- Vuelve a calcular todas las fórmulas en todas las hojas.
- Escanea TODAS las celdas en busca de errores de Excel (#REF!, #DIV/0!, etc.)
- Devuelve JSON con recuentos y ubicaciones de errores detallados
- Funciona tanto en Linux como en macOS
Lista de verificación de verificación de fórmula
Comprobaciones rápidas para garantizar que las fórmulas funcionen correctamente:
Verificación esencial
- [] Prueba 2-3 referencias de muestra: Verifique que obtengan los valores correctos antes de construir el modelo completo
- [] Asignación de columnas: Confirme que las columnas de Excel coincidan (por ejemplo, columna 64 = BL, no BK)
- [] Desplazamiento de fila: recuerde que las filas de Excel están indexadas en 1 (fila 5 del marco de datos = fila 6 de Excel)
Errores comunes
- [] Manejo de NaN: Verifique valores nulos con
pd.notna() - [] Columnas de extrema derecha: datos del año fiscal a menudo en columnas 50+
- [] Múltiples coincidencias: busque todas las apariciones, no solo la primera
- División por cero: Verifique los denominadores antes de usar
/en fórmulas (#DIV/0!) - [] Referencias incorrectas: Verifique que todas las referencias de celda apunten a las celdas deseadas (#¡REF!)
- [] Referencias entre hojas: use el formato correcto (Hoja1!A1) para vincular hojas
Estrategia de prueba de fórmulas
- Empiece poco a poco: pruebe las fórmulas en 2 o 3 celdas antes de aplicarlas de forma amplia
- [] Verificar dependencias: comprobar que existen todas las celdas a las que se hace referencia en las fórmulas
- [] Casos extremos de prueba: incluya valores cero, negativos y muy grandes
Interpretación de scripts/salida recalc.py
El script devuelve JSON con detalles del error:
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}Mejores prácticas
Selección de biblioteca
- pandas: Ideal para análisis de datos, operaciones masivas y exportación de datos simples
- openpyxl: Ideal para formatos complejos, fórmulas y funciones específicas de Excel
Trabajando con openpyxl
- Los índices de celda están basados en 1 (fila=1, columna=1 se refiere a la celda A1)
- Utilice
data_only=Truepara leer los valores calculados:load_workbook('file.xlsx', data_only=True) - Advertencia: Si se abre con
data_only=Truey se guarda, las fórmulas se reemplazan con valores y se pierden permanentemente. - Para archivos grandes: utilice
read_only=Truepara leer owrite_only=Truepara escribir. - Las fórmulas se conservan pero no se evalúan: use scripts/recalc.py para actualizar los valores
Trabajando con pandas
- Especifique tipos de datos para evitar problemas de inferencia:
pd.read_excel('file.xlsx', dtype={'id': str}) - Para archivos grandes, lea columnas específicas:
pd.read_excel('file.xlsx', usecols=['A', 'C', 'E']) - Manejar las fechas correctamente:
pd.read_excel('file.xlsx', parse_dates=['date_column'])
Pautas de estilo de código
IMPORTANTE: Al generar código Python para operaciones de Excel:
- Escriba código Python mínimo y conciso sin comentarios innecesarios
- Evite nombres de variables detallados y operaciones redundantes
- Evite declaraciones impresas innecesarias
Para archivos de Excel:
- Agregue comentarios a celdas con fórmulas complejas o suposiciones importantes
- Documentar fuentes de datos para valores codificados
- Incluir notas para cálculos clave y secciones del modelo.
Archivos de recursos
LICENCIA.txt
Recurso binario
scripts/office/helpers/init.py
Descargar scripts/office/helpers/init.py
Recurso binario
scripts/office/helpers/merge_runs.py
Descargar scripts/office/helpers/merge_runs.py
"""Merge adjacent runs with identical formatting in DOCX.
Merges adjacent <w:r> elements that have identical <w:rPr> properties.
Works on runs in paragraphs and inside tracked changes (<w:ins>, <w:del>).
Also:
- Removes rsid attributes from runs (revision metadata that doesn't affect rendering)
- Removes proofErr elements (spell/grammar markers that block merging)
"""
from pathlib import Path
import defusedxml.minidom
def merge_runs(input_dir: str) -> tuple[int, str]:
doc_xml = Path(input_dir) / "word" / "document.xml"
if not doc_xml.exists():
return 0, f"Error: {doc_xml} not found"
try:
dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
root = dom.documentElement
_remove_elements(root, "proofErr")
_strip_run_rsid_attrs(root)
containers = {run.parentNode for run in _find_elements(root, "r")}
merge_count = 0
for container in containers:
merge_count += _merge_runs_in(container)
doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
return merge_count, f"Merged {merge_count} runs"
except Exception as e:
return 0, f"Error: {e}"
def _find_elements(root, tag: str) -> list:
results = []
def traverse(node):
if node.nodeType == node.ELEMENT_NODE:
name = node.localName or node.tagName
if name == tag or name.endswith(f":{tag}"):
results.append(node)
for child in node.childNodes:
traverse(child)
traverse(root)
return results
def _get_child(parent, tag: str):
for child in parent.childNodes:
if child.nodeType == child.ELEMENT_NODE:
name = child.localName or child.tagName
if name == tag or name.endswith(f":{tag}"):
return child
return None
def _get_children(parent, tag: str) -> list:
results = []
for child in parent.childNodes:
if child.nodeType == child.ELEMENT_NODE:
name = child.localName or child.tagName
if name == tag or name.endswith(f":{tag}"):
results.append(child)
return results
def _is_adjacent(elem1, elem2) -> bool:
node = elem1.nextSibling
while node:
if node == elem2:
return True
if node.nodeType == node.ELEMENT_NODE:
return False
if node.nodeType == node.TEXT_NODE and node.data.strip():
return False
node = node.nextSibling
return False
def _remove_elements(root, tag: str):
for elem in _find_elements(root, tag):
if elem.parentNode:
elem.parentNode.removeChild(elem)
def _strip_run_rsid_attrs(root):
for run in _find_elements(root, "r"):
for attr in list(run.attributes.values()):
if "rsid" in attr.name.lower():
run.removeAttribute(attr.name)
def _merge_runs_in(container) -> int:
merge_count = 0
run = _first_child_run(container)
while run:
while True:
next_elem = _next_element_sibling(run)
if next_elem and _is_run(next_elem) and _can_merge(run, next_elem):
_merge_run_content(run, next_elem)
container.removeChild(next_elem)
merge_count += 1
else:
break
_consolidate_text(run)
run = _next_sibling_run(run)
return merge_count
def _first_child_run(container):
for child in container.childNodes:
if child.nodeType == child.ELEMENT_NODE and _is_run(child):
return child
return None
def _next_element_sibling(node):
sibling = node.nextSibling
while sibling:
if sibling.nodeType == sibling.ELEMENT_NODE:
return sibling
sibling = sibling.nextSibling
return None
def _next_sibling_run(node):
sibling = node.nextSibling
while sibling:
if sibling.nodeType == sibling.ELEMENT_NODE:
if _is_run(sibling):
return sibling
sibling = sibling.nextSibling
return None
def _is_run(node) -> bool:
name = node.localName or node.tagName
return name == "r" or name.endswith(":r")
def _can_merge(run1, run2) -> bool:
rpr1 = _get_child(run1, "rPr")
rpr2 = _get_child(run2, "rPr")
if (rpr1 is None) != (rpr2 is None):
return False
if rpr1 is None:
return True
return rpr1.toxml() == rpr2.toxml()
def _merge_run_content(target, source):
for child in list(source.childNodes):
if child.nodeType == child.ELEMENT_NODE:
name = child.localName or child.tagName
if name != "rPr" and not name.endswith(":rPr"):
target.appendChild(child)
def _consolidate_text(run):
t_elements = _get_children(run, "t")
for i in range(len(t_elements) - 1, 0, -1):
curr, prev = t_elements[i], t_elements[i - 1]
if _is_adjacent(prev, curr):
prev_text = prev.firstChild.data if prev.firstChild else ""
curr_text = curr.firstChild.data if curr.firstChild else ""
merged = prev_text + curr_text
if prev.firstChild:
prev.firstChild.data = merged
else:
prev.appendChild(run.ownerDocument.createTextNode(merged))
if merged.startswith(" ") or merged.endswith(" "):
prev.setAttribute("xml:space", "preserve")
elif prev.hasAttribute("xml:space"):
prev.removeAttribute("xml:space")
run.removeChild(curr)scripts/office/helpers/simplify_redlines.py
Descargar scripts/office/helpers/simplify_redlines.py
"""Simplify tracked changes by merging adjacent w:ins or w:del elements.
Merges adjacent <w:ins> elements from the same author into a single element.
Same for <w:del> elements. This makes heavily-redlined documents easier to
work with by reducing the number of tracked change wrappers.
Rules:
- Only merges w:ins with w:ins, w:del with w:del (same element type)
- Only merges if same author (ignores timestamp differences)
- Only merges if truly adjacent (only whitespace between them)
"""
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
import defusedxml.minidom
WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
def simplify_redlines(input_dir: str) -> tuple[int, str]:
doc_xml = Path(input_dir) / "word" / "document.xml"
if not doc_xml.exists():
return 0, f"Error: {doc_xml} not found"
try:
dom = defusedxml.minidom.parseString(doc_xml.read_text(encoding="utf-8"))
root = dom.documentElement
merge_count = 0
containers = _find_elements(root, "p") + _find_elements(root, "tc")
for container in containers:
merge_count += _merge_tracked_changes_in(container, "ins")
merge_count += _merge_tracked_changes_in(container, "del")
doc_xml.write_bytes(dom.toxml(encoding="UTF-8"))
return merge_count, f"Simplified {merge_count} tracked changes"
except Exception as e:
return 0, f"Error: {e}"
def _merge_tracked_changes_in(container, tag: str) -> int:
merge_count = 0
tracked = [
child
for child in container.childNodes
if child.nodeType == child.ELEMENT_NODE and _is_element(child, tag)
]
if len(tracked) < 2:
return 0
i = 0
while i < len(tracked) - 1:
curr = tracked[i]
next_elem = tracked[i + 1]
if _can_merge_tracked(curr, next_elem):
_merge_tracked_content(curr, next_elem)
container.removeChild(next_elem)
tracked.pop(i + 1)
merge_count += 1
else:
i += 1
return merge_count
def _is_element(node, tag: str) -> bool:
name = node.localName or node.tagName
return name == tag or name.endswith(f":{tag}")
def _get_author(elem) -> str:
author = elem.getAttribute("w:author")
if not author:
for attr in elem.attributes.values():
if attr.localName == "author" or attr.name.endswith(":author"):
return attr.value
return author
def _can_merge_tracked(elem1, elem2) -> bool:
if _get_author(elem1) != _get_author(elem2):
return False
node = elem1.nextSibling
while node and node != elem2:
if node.nodeType == node.ELEMENT_NODE:
return False
if node.nodeType == node.TEXT_NODE and node.data.strip():
return False
node = node.nextSibling
return True
def _merge_tracked_content(target, source):
while source.firstChild:
child = source.firstChild
source.removeChild(child)
target.appendChild(child)
def _find_elements(root, tag: str) -> list:
results = []
def traverse(node):
if node.nodeType == node.ELEMENT_NODE:
name = node.localName or node.tagName
if name == tag or name.endswith(f":{tag}"):
results.append(node)
for child in node.childNodes:
traverse(child)
traverse(root)
return results
def get_tracked_change_authors(doc_xml_path: Path) -> dict[str, int]:
if not doc_xml_path.exists():
return {}
try:
tree = ET.parse(doc_xml_path)
root = tree.getroot()
except ET.ParseError:
return {}
namespaces = {"w": WORD_NS}
author_attr = f"{{{WORD_NS}}}author"
authors: dict[str, int] = {}
for tag in ["ins", "del"]:
for elem in root.findall(f".//w:{tag}", namespaces):
author = elem.get(author_attr)
if author:
authors[author] = authors.get(author, 0) + 1
return authors
def _get_authors_from_docx(docx_path: Path) -> dict[str, int]:
try:
with zipfile.ZipFile(docx_path, "r") as zf:
if "word/document.xml" not in zf.namelist():
return {}
with zf.open("word/document.xml") as f:
tree = ET.parse(f)
root = tree.getroot()
namespaces = {"w": WORD_NS}
author_attr = f"{{{WORD_NS}}}author"
authors: dict[str, int] = {}
for tag in ["ins", "del"]:
for elem in root.findall(f".//w:{tag}", namespaces):
author = elem.get(author_attr)
if author:
authors[author] = authors.get(author, 0) + 1
return authors
except (zipfile.BadZipFile, ET.ParseError):
return {}
def infer_author(modified_dir: Path, original_docx: Path, default: str = "Claude") -> str:
modified_xml = modified_dir / "word" / "document.xml"
modified_authors = get_tracked_change_authors(modified_xml)
if not modified_authors:
return default
original_authors = _get_authors_from_docx(original_docx)
new_changes: dict[str, int] = {}
for author, count in modified_authors.items():
original_count = original_authors.get(author, 0)
diff = count - original_count
if diff > 0:
new_changes[author] = diff
if not new_changes:
return default
if len(new_changes) == 1:
return next(iter(new_changes))
raise ValueError(
f"Multiple authors added new changes: {new_changes}. "
"Cannot infer which author to validate."
)scripts/office/pack.py
Descargar scripts/office/pack.py
"""Pack a directory into a DOCX, PPTX, or XLSX file.
Validates with auto-repair, condenses XML formatting, and creates the Office file.
Usage:
python pack.py <input_directory> <output_file> [--original <file>] [--validate true|false]
Examples:
python pack.py unpacked/ output.docx --original input.docx
python pack.py unpacked/ output.pptx --validate false
"""
import argparse
import sys
import shutil
import tempfile
import zipfile
from pathlib import Path
import defusedxml.minidom
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
def pack(
input_directory: str,
output_file: str,
original_file: str | None = None,
validate: bool = True,
infer_author_func=None,
) -> tuple[None, str]:
input_dir = Path(input_directory)
output_path = Path(output_file)
suffix = output_path.suffix.lower()
if not input_dir.is_dir():
return None, f"Error: {input_dir} is not a directory"
if suffix not in {".docx", ".pptx", ".xlsx"}:
return None, f"Error: {output_file} must be a .docx, .pptx, or .xlsx file"
if validate and original_file:
original_path = Path(original_file)
if original_path.exists():
success, output = _run_validation(
input_dir, original_path, suffix, infer_author_func
)
if output:
print(output)
if not success:
return None, f"Error: Validation failed for {input_dir}"
with tempfile.TemporaryDirectory() as temp_dir:
temp_content_dir = Path(temp_dir) / "content"
shutil.copytree(input_dir, temp_content_dir)
for pattern in ["*.xml", "*.rels"]:
for xml_file in temp_content_dir.rglob(pattern):
_condense_xml(xml_file)
output_path.parent.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
for f in temp_content_dir.rglob("*"):
if f.is_file():
zf.write(f, f.relative_to(temp_content_dir))
return None, f"Successfully packed {input_dir} to {output_file}"
def _run_validation(
unpacked_dir: Path,
original_file: Path,
suffix: str,
infer_author_func=None,
) -> tuple[bool, str | None]:
output_lines = []
validators = []
if suffix == ".docx":
author = "Claude"
if infer_author_func:
try:
author = infer_author_func(unpacked_dir, original_file)
except ValueError as e:
print(f"Warning: {e} Using default author 'Claude'.", file=sys.stderr)
validators = [
DOCXSchemaValidator(unpacked_dir, original_file),
RedliningValidator(unpacked_dir, original_file, author=author),
]
elif suffix == ".pptx":
validators = [PPTXSchemaValidator(unpacked_dir, original_file)]
if not validators:
return True, None
total_repairs = sum(v.repair() for v in validators)
if total_repairs:
output_lines.append(f"Auto-repaired {total_repairs} issue(s)")
success = all(v.validate() for v in validators)
if success:
output_lines.append("All validations PASSED!")
return success, "\n".join(output_lines) if output_lines else None
def _condense_xml(xml_file: Path) -> None:
try:
with open(xml_file, encoding="utf-8") as f:
dom = defusedxml.minidom.parse(f)
for element in dom.getElementsByTagName("*"):
if element.tagName.endswith(":t"):
continue
for child in list(element.childNodes):
if (
child.nodeType == child.TEXT_NODE
and child.nodeValue
and child.nodeValue.strip() == ""
) or child.nodeType == child.COMMENT_NODE:
element.removeChild(child)
xml_file.write_bytes(dom.toxml(encoding="UTF-8"))
except Exception as e:
print(f"ERROR: Failed to parse {xml_file.name}: {e}", file=sys.stderr)
raise
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Pack a directory into a DOCX, PPTX, or XLSX file"
)
parser.add_argument("input_directory", help="Unpacked Office document directory")
parser.add_argument("output_file", help="Output Office file (.docx/.pptx/.xlsx)")
parser.add_argument(
"--original",
help="Original file for validation comparison",
)
parser.add_argument(
"--validate",
type=lambda x: x.lower() == "true",
default=True,
metavar="true|false",
help="Run validation with auto-repair (default: true)",
)
args = parser.parse_args()
_, message = pack(
args.input_directory,
args.output_file,
original_file=args.original,
validate=args.validate,
)
print(message)
if "Error" in message:
sys.exit(1)scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-chart.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-chartDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-diagram.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-lockedCanvas.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-main.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-picture.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-spreadsheetDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/dml-wordprocessingDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/pml.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-additionalCharacteristics.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-bibliography.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-commonSimpleTypes.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlDataProperties.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-customXmlSchemaProperties.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesCustom.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesExtended.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-documentPropertiesVariantTypes.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-math.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/shared-relationshipReference.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/sml.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/vml-main.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/vml-officeDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/vml-presentationDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/vml-spreadsheetDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/vml-wordprocessingDrawing.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/wml.xsd
Recurso binario
scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd
Descargar scripts/office/schemas/ISO-IEC29500-4_2016/xml.xsd
Recurso binario
scripts/office/schemas/ecma/cuarta edición/opc-contentTypes.xsd
Descargar scripts/office/schemas/ecma/fouth-edition/opc-contentTypes.xsd
Recurso binario
scripts/office/schemas/ecma/cuarta-edición/opc-coreProperties.xsd
Descargar scripts/office/schemas/ecma/fouth-edition/opc-coreProperties.xsd
Recurso binario
scripts/office/schemas/ecma/cuarta-edición/opc-digSig.xsd
Descargar scripts/office/schemas/ecma/fouth-edition/opc-digSig.xsd
Recurso binario
scripts/office/schemas/ecma/cuarta-edición/opc-relationships.xsd
Descargar scripts/office/schemas/ecma/fouth-edition/opc-relationships.xsd
Recurso binario
scripts/office/schemas/mce/mc.xsd
Descargar scripts/office/schemas/mce/mc.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-2010.xsd
Descargar scripts/office/schemas/microsoft/wml-2010.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-2012.xsd
Descargar scripts/office/schemas/microsoft/wml-2012.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-2018.xsd
Descargar scripts/office/schemas/microsoft/wml-2018.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-cex-2018.xsd
Descargar scripts/office/schemas/microsoft/wml-cex-2018.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-cid-2016.xsd
Descargar scripts/office/schemas/microsoft/wml-cid-2016.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd
Descargar scripts/office/schemas/microsoft/wml-sdtdatahash-2020.xsd
Recurso binario
scripts/office/schemas/microsoft/wml-symex-2015.xsd
Descargar scripts/office/schemas/microsoft/wml-symex-2015.xsd
Recurso binario
scripts/office/soffice.py
Descargar scripts/office/soffice.py
"""
Helper for running LibreOffice (soffice) in environments where AF_UNIX
sockets may be blocked (e.g., sandboxed VMs). Detects the restriction
at runtime and applies an LD_PRELOAD shim if needed.
Usage:
from office.soffice import run_soffice, get_soffice_env
# Option 1 – run soffice directly
result = run_soffice(["--headless", "--convert-to", "pdf", "input.docx"])
# Option 2 – get env dict for your own subprocess calls
env = get_soffice_env()
subprocess.run(["soffice", ...], env=env)
"""
import os
import socket
import subprocess
import tempfile
from pathlib import Path
def get_soffice_env() -> dict:
env = os.environ.copy()
env["SAL_USE_VCLPLUGIN"] = "svp"
if _needs_shim():
shim = _ensure_shim()
env["LD_PRELOAD"] = str(shim)
return env
def run_soffice(args: list[str], **kwargs) -> subprocess.CompletedProcess:
env = get_soffice_env()
return subprocess.run(["soffice"] + args, env=env, **kwargs)
_SHIM_SO = Path(tempfile.gettempdir()) / "lo_socket_shim.so"
def _needs_shim() -> bool:
try:
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.close()
return False
except OSError:
return True
def _ensure_shim() -> Path:
if _SHIM_SO.exists():
return _SHIM_SO
src = Path(tempfile.gettempdir()) / "lo_socket_shim.c"
src.write_text(_SHIM_SOURCE)
subprocess.run(
["gcc", "-shared", "-fPIC", "-o", str(_SHIM_SO), str(src), "-ldl"],
check=True,
capture_output=True,
)
src.unlink()
return _SHIM_SO
_SHIM_SOURCE = r"""
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>
static int (*real_socket)(int, int, int);
static int (*real_socketpair)(int, int, int, int[2]);
static int (*real_listen)(int, int);
static int (*real_accept)(int, struct sockaddr *, socklen_t *);
static int (*real_close)(int);
static int (*real_read)(int, void *, size_t);
/* Per-FD bookkeeping (FDs >= 1024 are passed through unshimmed). */
static int is_shimmed[1024];
static int peer_of[1024];
static int wake_r[1024]; /* accept() blocks reading this */
static int wake_w[1024]; /* close() writes to this */
static int listener_fd = -1; /* FD that received listen() */
__attribute__((constructor))
static void init(void) {
real_socket = dlsym(RTLD_NEXT, "socket");
real_socketpair = dlsym(RTLD_NEXT, "socketpair");
real_listen = dlsym(RTLD_NEXT, "listen");
real_accept = dlsym(RTLD_NEXT, "accept");
real_close = dlsym(RTLD_NEXT, "close");
real_read = dlsym(RTLD_NEXT, "read");
for (int i = 0; i < 1024; i++) {
peer_of[i] = -1;
wake_r[i] = -1;
wake_w[i] = -1;
}
}
/* ---- socket ---------------------------------------------------------- */
int socket(int domain, int type, int protocol) {
if (domain == AF_UNIX) {
int fd = real_socket(domain, type, protocol);
if (fd >= 0) return fd;
/* socket(AF_UNIX) blocked – fall back to socketpair(). */
int sv[2];
if (real_socketpair(domain, type, protocol, sv) == 0) {
if (sv[0] >= 0 && sv[0] < 1024) {
is_shimmed[sv[0]] = 1;
peer_of[sv[0]] = sv[1];
int wp[2];
if (pipe(wp) == 0) {
wake_r[sv[0]] = wp[0];
wake_w[sv[0]] = wp[1];
}
}
return sv[0];
}
errno = EPERM;
return -1;
}
return real_socket(domain, type, protocol);
}
/* ---- listen ---------------------------------------------------------- */
int listen(int sockfd, int backlog) {
if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
listener_fd = sockfd;
return 0;
}
return real_listen(sockfd, backlog);
}
/* ---- accept ---------------------------------------------------------- */
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen) {
if (sockfd >= 0 && sockfd < 1024 && is_shimmed[sockfd]) {
/* Block until close() writes to the wake pipe. */
if (wake_r[sockfd] >= 0) {
char buf;
real_read(wake_r[sockfd], &buf, 1);
}
errno = ECONNABORTED;
return -1;
}
return real_accept(sockfd, addr, addrlen);
}
/* ---- close ----------------------------------------------------------- */
int close(int fd) {
if (fd >= 0 && fd < 1024 && is_shimmed[fd]) {
int was_listener = (fd == listener_fd);
is_shimmed[fd] = 0;
if (wake_w[fd] >= 0) { /* unblock accept() */
char c = 0;
write(wake_w[fd], &c, 1);
real_close(wake_w[fd]);
wake_w[fd] = -1;
}
if (wake_r[fd] >= 0) { real_close(wake_r[fd]); wake_r[fd] = -1; }
if (peer_of[fd] >= 0) { real_close(peer_of[fd]); peer_of[fd] = -1; }
if (was_listener)
_exit(0); /* conversion done – exit */
}
return real_close(fd);
}
"""
if __name__ == "__main__":
import sys
result = run_soffice(sys.argv[1:])
sys.exit(result.returncode)scripts/office/unpack.py
Descargar scripts/office/unpack.py
"""Unpack Office files (DOCX, PPTX, XLSX) for editing.
Extracts the ZIP archive, pretty-prints XML files, and optionally:
- Merges adjacent runs with identical formatting (DOCX only)
- Simplifies adjacent tracked changes from same author (DOCX only)
Usage:
python unpack.py <office_file> <output_dir> [options]
Examples:
python unpack.py document.docx unpacked/
python unpack.py presentation.pptx unpacked/
python unpack.py document.docx unpacked/ --merge-runs false
"""
import argparse
import sys
import zipfile
from pathlib import Path
import defusedxml.minidom
from helpers.merge_runs import merge_runs as do_merge_runs
from helpers.simplify_redlines import simplify_redlines as do_simplify_redlines
SMART_QUOTE_REPLACEMENTS = {
"\u201c": "“",
"\u201d": "”",
"\u2018": "‘",
"\u2019": "’",
}
def unpack(
input_file: str,
output_directory: str,
merge_runs: bool = True,
simplify_redlines: bool = True,
) -> tuple[None, str]:
input_path = Path(input_file)
output_path = Path(output_directory)
suffix = input_path.suffix.lower()
if not input_path.exists():
return None, f"Error: {input_file} does not exist"
if suffix not in {".docx", ".pptx", ".xlsx"}:
return None, f"Error: {input_file} must be a .docx, .pptx, or .xlsx file"
try:
output_path.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(input_path, "r") as zf:
zf.extractall(output_path)
xml_files = list(output_path.rglob("*.xml")) + list(output_path.rglob("*.rels"))
for xml_file in xml_files:
_pretty_print_xml(xml_file)
message = f"Unpacked {input_file} ({len(xml_files)} XML files)"
if suffix == ".docx":
if simplify_redlines:
simplify_count, _ = do_simplify_redlines(str(output_path))
message += f", simplified {simplify_count} tracked changes"
if merge_runs:
merge_count, _ = do_merge_runs(str(output_path))
message += f", merged {merge_count} runs"
for xml_file in xml_files:
_escape_smart_quotes(xml_file)
return None, message
except zipfile.BadZipFile:
return None, f"Error: {input_file} is not a valid Office file"
except Exception as e:
return None, f"Error unpacking: {e}"
def _pretty_print_xml(xml_file: Path) -> None:
try:
content = xml_file.read_text(encoding="utf-8")
dom = defusedxml.minidom.parseString(content)
xml_file.write_bytes(dom.toprettyxml(indent=" ", encoding="utf-8"))
except Exception:
pass
def _escape_smart_quotes(xml_file: Path) -> None:
try:
content = xml_file.read_text(encoding="utf-8")
for char, entity in SMART_QUOTE_REPLACEMENTS.items():
content = content.replace(char, entity)
xml_file.write_text(content, encoding="utf-8")
except Exception:
pass
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Unpack an Office file (DOCX, PPTX, XLSX) for editing"
)
parser.add_argument("input_file", help="Office file to unpack")
parser.add_argument("output_directory", help="Output directory")
parser.add_argument(
"--merge-runs",
type=lambda x: x.lower() == "true",
default=True,
metavar="true|false",
help="Merge adjacent runs with identical formatting (DOCX only, default: true)",
)
parser.add_argument(
"--simplify-redlines",
type=lambda x: x.lower() == "true",
default=True,
metavar="true|false",
help="Merge adjacent tracked changes from same author (DOCX only, default: true)",
)
args = parser.parse_args()
_, message = unpack(
args.input_file,
args.output_directory,
merge_runs=args.merge_runs,
simplify_redlines=args.simplify_redlines,
)
print(message)
if "Error" in message:
sys.exit(1)scripts/office/validate.py
Descargar scripts/office/validate.py
"""
Command line tool to validate Office document XML files against XSD schemas and tracked changes.
Usage:
python validate.py <path> [--original <original_file>] [--auto-repair] [--author NAME]
The first argument can be either:
- An unpacked directory containing the Office document XML files
- A packed Office file (.docx/.pptx/.xlsx) which will be unpacked to a temp directory
Auto-repair fixes:
- paraId/durableId values that exceed OOXML limits
- Missing xml:space="preserve" on w:t elements with whitespace
"""
import argparse
import sys
import tempfile
import zipfile
from pathlib import Path
from validators import DOCXSchemaValidator, PPTXSchemaValidator, RedliningValidator
def main():
parser = argparse.ArgumentParser(description="Validate Office document XML files")
parser.add_argument(
"path",
help="Path to unpacked directory or packed Office file (.docx/.pptx/.xlsx)",
)
parser.add_argument(
"--original",
required=False,
default=None,
help="Path to original file (.docx/.pptx/.xlsx). If omitted, all XSD errors are reported and redlining validation is skipped.",
)
parser.add_argument(
"-v",
"--verbose",
action="store_true",
help="Enable verbose output",
)
parser.add_argument(
"--auto-repair",
action="store_true",
help="Automatically repair common issues (hex IDs, whitespace preservation)",
)
parser.add_argument(
"--author",
default="Claude",
help="Author name for redlining validation (default: Claude)",
)
args = parser.parse_args()
path = Path(args.path)
assert path.exists(), f"Error: {path} does not exist"
original_file = None
if args.original:
original_file = Path(args.original)
assert original_file.is_file(), f"Error: {original_file} is not a file"
assert original_file.suffix.lower() in [".docx", ".pptx", ".xlsx"], (
f"Error: {original_file} must be a .docx, .pptx, or .xlsx file"
)
file_extension = (original_file or path).suffix.lower()
assert file_extension in [".docx", ".pptx", ".xlsx"], (
f"Error: Cannot determine file type from {path}. Use --original or provide a .docx/.pptx/.xlsx file."
)
if path.is_file() and path.suffix.lower() in [".docx", ".pptx", ".xlsx"]:
temp_dir = tempfile.mkdtemp()
with zipfile.ZipFile(path, "r") as zf:
zf.extractall(temp_dir)
unpacked_dir = Path(temp_dir)
else:
assert path.is_dir(), f"Error: {path} is not a directory or Office file"
unpacked_dir = path
match file_extension:
case ".docx":
validators = [
DOCXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
]
if original_file:
validators.append(
RedliningValidator(unpacked_dir, original_file, verbose=args.verbose, author=args.author)
)
case ".pptx":
validators = [
PPTXSchemaValidator(unpacked_dir, original_file, verbose=args.verbose),
]
case _:
print(f"Error: Validation not supported for file type {file_extension}")
sys.exit(1)
if args.auto_repair:
total_repairs = sum(v.repair() for v in validators)
if total_repairs:
print(f"Auto-repaired {total_repairs} issue(s)")
success = all(v.validate() for v in validators)
if success:
print("All validations PASSED!")
sys.exit(0 if success else 1)
if __name__ == "__main__":
main()scripts/office/validators/init.py
Descargar scripts/office/validators/init.py
"""
Validation modules for Word document processing.
"""
from .base import BaseSchemaValidator
from .docx import DOCXSchemaValidator
from .pptx import PPTXSchemaValidator
from .redlining import RedliningValidator
__all__ = [
"BaseSchemaValidator",
"DOCXSchemaValidator",
"PPTXSchemaValidator",
"RedliningValidator",
]scripts/office/validadores/base.py
Descargar scripts/office/validators/base.py
Recurso binario
scripts/office/validadores/docx.py
Descargar scripts/office/validators/docx.py
Recurso binario
scripts/office/validadores/pptx.py
Descargar scripts/office/validators/pptx.py
Recurso binario
scripts/office/validators/redlining.py
Descargar scripts/office/validators/redlining.py
Recurso binario
scripts/recalc.py
"""
Excel Formula Recalculation Script
Recalculates all formulas in an Excel file using LibreOffice
"""
import json
import os
import platform
import subprocess
import sys
from pathlib import Path
from office.soffice import get_soffice_env
from openpyxl import load_workbook
MACRO_DIR_MACOS = "~/Library/Application Support/LibreOffice/4/user/basic/Standard"
MACRO_DIR_LINUX = "~/.config/libreoffice/4/user/basic/Standard"
MACRO_FILENAME = "Module1.xba"
RECALCULATE_MACRO = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Module1" script:language="StarBasic">
Sub RecalculateAndSave()
ThisComponent.calculateAll()
ThisComponent.store()
ThisComponent.close(True)
End Sub
</script:module>"""
def has_gtimeout():
try:
subprocess.run(
["gtimeout", "--version"], capture_output=True, timeout=1, check=False
)
return True
except (FileNotFoundError, subprocess.TimeoutExpired):
return False
def setup_libreoffice_macro():
macro_dir = os.path.expanduser(
MACRO_DIR_MACOS if platform.system() == "Darwin" else MACRO_DIR_LINUX
)
macro_file = os.path.join(macro_dir, MACRO_FILENAME)
if (
os.path.exists(macro_file)
and "RecalculateAndSave" in Path(macro_file).read_text()
):
return True
if not os.path.exists(macro_dir):
subprocess.run(
["soffice", "--headless", "--terminate_after_init"],
capture_output=True,
timeout=10,
env=get_soffice_env(),
)
os.makedirs(macro_dir, exist_ok=True)
try:
Path(macro_file).write_text(RECALCULATE_MACRO)
return True
except Exception:
return False
def recalc(filename, timeout=30):
if not Path(filename).exists():
return {"error": f"File {filename} does not exist"}
abs_path = str(Path(filename).absolute())
if not setup_libreoffice_macro():
return {"error": "Failed to setup LibreOffice macro"}
cmd = [
"soffice",
"--headless",
"--norestore",
"vnd.sun.star.script:Standard.Module1.RecalculateAndSave?language=Basic&location=application",
abs_path,
]
if platform.system() == "Linux":
cmd = ["timeout", str(timeout)] + cmd
elif platform.system() == "Darwin" and has_gtimeout():
cmd = ["gtimeout", str(timeout)] + cmd
result = subprocess.run(cmd, capture_output=True, text=True, env=get_soffice_env())
if result.returncode != 0 and result.returncode != 124:
error_msg = result.stderr or "Unknown error during recalculation"
if "Module1" in error_msg or "RecalculateAndSave" not in error_msg:
return {"error": "LibreOffice macro not configured properly"}
return {"error": error_msg}
try:
wb = load_workbook(filename, data_only=True)
excel_errors = [
"#VALUE!",
"#DIV/0!",
"#REF!",
"#NAME?",
"#NULL!",
"#NUM!",
"#N/A",
]
error_details = {err: [] for err in excel_errors}
total_errors = 0
for sheet_name in wb.sheetnames:
ws = wb[sheet_name]
for row in ws.iter_rows():
for cell in row:
if cell.value is not None and isinstance(cell.value, str):
for err in excel_errors:
if err in cell.value:
location = f"{sheet_name}!{cell.coordinate}"
error_details[err].append(location)
total_errors += 1
break
wb.close()
result = {
"status": "success" if total_errors == 0 else "errors_found",
"total_errors": total_errors,
"error_summary": {},
}
for err_type, locations in error_details.items():
if locations:
result["error_summary"][err_type] = {
"count": len(locations),
"locations": locations[:20],
}
wb_formulas = load_workbook(filename, data_only=False)
formula_count = 0
for sheet_name in wb_formulas.sheetnames:
ws = wb_formulas[sheet_name]
for row in ws.iter_rows():
for cell in row:
if (
cell.value
and isinstance(cell.value, str)
and cell.value.startswith("=")
):
formula_count += 1
wb_formulas.close()
result["total_formulas"] = formula_count
return result
except Exception as e:
return {"error": str(e)}
def main():
if len(sys.argv) < 2:
print("Usage: python recalc.py <excel_file> [timeout_seconds]")
print("\nRecalculates all formulas in an Excel file using LibreOffice")
print("\nReturns JSON with error details:")
print(" - status: 'success' or 'errors_found'")
print(" - total_errors: Total number of Excel errors found")
print(" - total_formulas: Number of formulas in the file")
print(" - error_summary: Breakdown by error type with locations")
print(" - #VALUE!, #DIV/0!, #REF!, #NAME?, #NULL!, #NUM!, #N/A")
sys.exit(1)
filename = sys.argv[1]
timeout = int(sys.argv[2]) if len(sys.argv) > 2 else 30
result = recalc(filename, timeout)
print(json.dumps(result, indent=2))
if __name__ == "__main__":
main()Ver en GitHub
Pptx
Utilice esta habilidad cada vez que un archivo.pptx esté involucrado de alguna manera: como entrada, salida o ambas. Esto incluye: crear conjuntos de diapositivas, presentaciones o presentaciones; leer, analizar o extraer texto de cualquier archivo.pptx (incluso si el contenido extraído se utilizará en otro lugar, como en un correo electrónico o un resumen); editar, modificar o actualizar presentaciones existentes; combinar o dividir archivos de diapositivas; trabajar con plantillas, diseños, notas del orador o comentarios. Se activa cada vez que el usuario menciona "plataforma", "diapositivas", "presentación" o hace referencia a un nombre de archivo.pptx, independientemente de lo que planee hacer con el contenido después. Si es necesario abrir, crear o tocar un archivo.pptx, utilice esta habilidad.
Pautas de marca
Manual de habilidades del agente para ensamblar la voz de la marca, reglas visuales y activos reutilizables para que Claude Skills produzca entregas creativas consistentes.
claudeskills Docs