I have an HTML file containing thousands of <div class='date'></div><ul>…</ul> blocks, like this:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
</body>
</html>
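Each <div> and its corresponding <ul> element belong to a specific date. The <div class='date'></div><ul>…</ul> blocks are sorted in ascending order, i.e. newer dates are at the bottom of the file. I want to arrange them in descending order so that newer dates are at the top of the file, like this: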
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class='date'>Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class='date'>Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
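I'm not sure what the right tool is. A shell script? awk? Python? Or is there something else that would be faster and more convenient?
Solution:
An extended Python solution:
The sort_html_by_date.py script: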
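from bs4 import BeautifulSoup
from datetime import datetime

with open('input.html') as html_doc:  # replace with your actual HTML file name
    soup = BeautifulSoup(html_doc, 'lxml')

# Map each parsed date to the markup of its <div> plus the <ul> that follows it.
divs = {}
for div in soup.find_all('div', 'date'):
    # %b matches abbreviated month names ("May", "Jun"); %B would require full names.
    date = datetime.strptime(div.string.strip(), '%a %b %d %Y')
    divs[date] = str(div) + '\n' + div.find_next_sibling('ul').prettify()

# Empty <body>, then re-append the blocks newest first.
soup.body.clear()
for el in sorted(divs, reverse=True):
    soup.body.append(divs[el])

# formatter=None keeps the re-appended markup strings from being escaped.
print(soup.prettify(formatter=None))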
Usage:
python sort_html_by_date.py
Output:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div class="date">Fri May 25 2018</div>
<ul>
<li>
Modify the website according to GDPR
</li>
<li>
Watch YouTube
</li>
</ul>
<div class="date">Thu May 24 2018</div>
<ul>
<li>
Solve the world's hunger problem
<ul>
<li>
Don't tell anyone
</li>
</ul>
</li>
<li>
Get something to wear
</li>
</ul>
<div class="date">Wed May 23 2018</div>
<ul>
<li>
Do laundry
<ul>
<li>
Get coins
</li>
</ul>
</li>
<li>
Wash the dishes
</li>
</ul>
</body>
</html>
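With thousands of blocks, checking the result by eye is impractical. The following is a minimal sketch of an automated check, using only the standard library's html.parser. The names DateCollector and is_newest_first are hypothetical helpers, not part of the answer above, and the sketch assumes every date reads like "Wed May 23 2018".

```python
from datetime import datetime
from html.parser import HTMLParser


class DateCollector(HTMLParser):
    """Collects the text of every <div class="date"> element, in document order."""

    def __init__(self):
        super().__init__()
        self.dates = []
        self._in_date_div = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed.
        if tag == "div" and ("class", "date") in attrs:
            self._in_date_div = True
            self.dates.append("")

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_date_div = False

    def handle_data(self, data):
        if self._in_date_div:
            self.dates[-1] += data


def is_newest_first(html, date_format="%a %b %d %Y"):
    """Return True if the date divs in html appear in descending order."""
    collector = DateCollector()
    collector.feed(html)
    collector.close()
    parsed = [datetime.strptime(text.strip(), date_format) for text in collector.dates]
    # Every date must be on or after the one that follows it.
    return all(a >= b for a, b in zip(parsed, parsed[1:]))
```

Passing the script's output to is_newest_first should return True if the sort worked. A ValueError from strptime would point to a date that does not match the assumed format.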
Modules used:
beautifulsoup – https://www.crummy.com/software/BeautifulSoup/bs4/doc/
datetime – https://docs.python.org/3.3/library/datetime.html#module-datetime
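If installing third-party modules is not an option, the same reordering can be sketched with the standard library alone. This is a rough alternative, not the approach above: it treats the file as text and assumes that every block starts with a <div class='date'> line, that the blocks sit directly inside <body>, and that nothing but those blocks sits between the first date div and </body>. The name sort_blocks_desc is hypothetical.

```python
import re
from datetime import datetime

# Matches each date div, with either quote style, and captures its text.
DATE_DIV = re.compile(r"""<div\s+class\s*=\s*['"]date['"]\s*>(.*?)</div>""", re.S)


def sort_blocks_desc(html, date_format="%a %b %d %Y"):
    """Return html with its date blocks reordered newest first."""
    matches = list(DATE_DIV.finditer(html))
    if not matches:
        return html

    # Everything from the first date div up to </body> is treated as blocks.
    body_end = html.rfind("</body>")
    if body_end == -1:
        body_end = len(html)
    head = html[:matches[0].start()]
    tail = html[body_end:]

    blocks = []
    for i, match in enumerate(matches):
        # A block runs from its date div to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else body_end
        date = datetime.strptime(match.group(1).strip(), date_format)
        blocks.append((date, html[match.start():end]))

    # Sort on the parsed date only; blocks sharing a date keep their order.
    blocks.sort(key=lambda pair: pair[0], reverse=True)
    return head + "".join(block for _, block in blocks) + tail
```

Unlike the Beautiful Soup version, this leaves the markup inside each block byte-for-byte unchanged, including the original quote style. It also keeps blocks that share a date rather than overwriting them. The trade-off is that a regular expression is far less tolerant of irregular HTML than a real parser.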
