作者: Jet L

  • 【Python】对本地网页进行元素提取并输出Excel

    一些网页通过加载Js来保护页面元素,当我们突破Js得到本地页面时,可以使用BS4库对页面进行分析,提取对应的元素来综合有价值的内容。

    示例代码:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    # 发送请求并获取网页内容
    url = 'your_local_or_online_page_url'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # 定义一个空的列表来存储提取的数据
    data = []
    
    # 遍历页面中的项目列表,假设项目数据都在某个父元素中
    projects = soup.find_all('tr', class_='project-id')  # 根据实际情况修改选择器
    
    # 提取每个项目的各项数据
    for project in projects:
        # 获取项目ID
        project_id = project.find('td', class_='project-id-class').get_text(strip=True)  # 修改为实际选择器
        
        # 将提取的数据添加到列表中
        data.append([project_id]) # 按实际修改
    
    # 创建 DataFrame 并保存为 Excel
    df = pd.DataFrame(data, columns=['ID']) # 按实际修改
    df.to_excel('projects_data.xlsx', index=False)
    
    print("Data has been successfully extracted and saved to 'projects_data.xlsx'.")

    主要用到了BS4库。

    示意代码:

    from bs4 import BeautifulSoup
    
    # 假设有一个HTML文档
    html_doc = """
    <html>
      <head><title>Example Page</title></head>
      <body>
        <p class="title"><b>Sample Page</b></p>
        <p class="story">This is a test story. <a href="http://example.com/1" class="link">link1</a> <a href="http://example.com/2" class="link">link2</a></p>
        <p class="story">Another test story.</p>
      </body>
    </html>
    """
    
    # 使用 BeautifulSoup 解析 HTML 文档
    soup = BeautifulSoup(html_doc, 'html.parser')
    
    # 提取<title>标签的内容
    title = soup.title.string
    print(f"Title: {title}")
    
    # 提取所有的链接(<a> 标签)
    links = soup.find_all('a')
    for link in links:
        print(f"Link text: {link.string}, URL: {link['href']}")
    
    # 查找特定类的<p>标签
    story_paragraphs = soup.find_all('p', class_='story')
    for p in story_paragraphs:
        print(f"Story paragraph: {p.get_text()}")

  • 【HTML】iframe小工具——提取嵌入链接并重设参数

    一般类似YouTube、Bilibili的分享链接,都设置了各自网站的相应参数,为了快速提取其src内容并自定义部分参数,可以使用该小工具进行快速设置。

    操作区
    预览展示区
    查看代码

    <style>
            textarea, input, select {
                width: 100%;
                margin-bottom: 10px;
                padding: 5px;
                font-size: 14px;
            }
            button {
                font-size: 16px;
                margin-bottom: 10px;
                padding: 5px 10px;
            }
            .link-container {
                margin-top: 10px;
            }
            .link-item {
                margin-bottom: 5px;
            }
            .iframe-preview {
                margin-top: 20px;
                padding: 10px;
                border: 1px solid #ddd;
                background: #f9f9f9;
            }
            .iframe-preview pre {
                font-size: 14px;
                background: #e9e9e9;
                padding: 10px;
                border-radius: 5px;
            }
            .row {
                display: flex;
                flex-wrap: wrap;
                gap: 10px;
            }
            .col {
                flex: 1 1 20%;
            }
            .col input, .col select {
                width: 100%;
            }
            .unit-select {
                width: 10px; /*缩小单位选择框宽度*/
            }
            .empty-option {
                font-size: 14px;
            }
            #iframePreviewContainer {
                margin-top: 20px;
            }
        </style>
    </head>
    <body>
        <!-- 输入框 -->
        <textarea id="iframeInput" placeholder="在此输入多个 iframe 代码"></textarea>
        <button id="extractButton">提取链接</button>
    
        <!-- 链接展示区 -->
        <div id="result" class="link-container"></div>
    
        <!-- 单个链接操作区 -->
        <h2>操作区</h2>
        <input id="selectedLink" type="text" placeholder="点击复制按钮后,链接将填入此处" readonly>
        
        <!-- iframe 属性设置 -->
        <div class="row">
            <div class="col">
                <label for="iframeTitle">标题:</label>
                <input id="iframeTitle" type="text" placeholder="请输入 iframe 标题">
            </div>
            <div class="col">
                <label for="iframeWidth">宽度:</label>
                <input id="iframeWidth" type="text" placeholder="例如 560">
            </div>
            <div class="col">
                <label for="iframeWidthUnit">宽度单位:</label>
                <select id="iframeWidthUnit" class="unit-select">
                    <option value="px">px</option>
                    <option value="%">%</option>
                    <option value="vw">vw</option>
                </select>
            </div>
            <div class="col">
                <label for="iframeHeight">高度:</label>
                <input id="iframeHeight" type="text" placeholder="例如 315">
            </div>
            <div class="col">
                <label for="iframeHeightUnit">高度单位:</label>
                <select id="iframeHeightUnit" class="unit-select">
                    <option value="px">px</option>
                    <option value="%">%</option>
                    <option value="vh">vh</option>
                </select>
            </div>
        </div>
    
        <div class="row">
            <div class="col">
                <label for="iframeFullscreen">允许全屏:</label>
                <select id="iframeFullscreen">
                    <option value="allowfullscreen">是</option>
                    <option value="">否</option>
                </select>
            </div>
            <div class="col">
                <label for="iframeReferrer">Referrer Policy:</label>
                <select id="iframeReferrer">
                    <option value="no-referrer">不发送</option>
                    <option value="no-referrer-when-downgrade">仅同源</option>
                    <option value="origin">仅发送源</option>
                    <option value="origin-when-cross-origin">跨源时发送源</option>
                    <option value="same-origin">同源发送完整路径</option>
                    <option value="strict-origin">严格同源发送源</option>
                    <option value="strict-origin-when-cross-origin">默认(严格同源)</option>
                    <option value="unsafe-url">发送完整 URL</option>
                    <option value="">保持空值</option>
                </select>
            </div>
            <div class="col">
                <label for="iframeLoading">加载方式:</label>
                <select id="iframeLoading">
                    <option value="eager">立即加载</option>
                    <option value="lazy">懒加载</option>
                    <option value="">保持空值</option>
                </select>
            </div>
            <div class="col">
                <label for="iframeAutoplay">自动播放:</label>
                <select id="iframeAutoplay">
                    <option value="autoplay">是</option>
                    <option value="">否</option>
                </select>
            </div>
        </div>
    
        <div class="row">
            <div class="col">
                <label for="iframeEncrypted">加密媒体:</label>
                <select id="iframeEncrypted">
                    <option value="encrypted-media">是</option>
                    <option value="">否</option>
                </select>
            </div>
            <div class="col">
                <label for="iframePictureInPicture">画中画:</label>
                <select id="iframePictureInPicture">
                    <option value="picture-in-picture">是</option>
                    <option value="">否</option>
                </select>
            </div>
            <div class="col">
                <label for="iframeWebShare">Web分享:</label>
                <select id="iframeWebShare">
                    <option value="web-share">是</option>
                    <option value="">否</option>
                </select>
            </div>
        </div>
    
        <!-- 生成 iframe 和复制按钮 -->
        <div>
            <button id="generateIframeButton">生成 iframe 嵌入代码</button>
            <button id="copyIframeButton">复制生成代码</button>
        </div>
    
        <!-- iframe 代码展示 -->
        <div id="generatedIframe" class="iframe-preview">
            <textarea id="iframeCodeText" readonly rows="10"></textarea>
        </div>
    
        <!-- iframe 预览展示区 -->
        <div id="iframePreviewContainer" class="iframe-preview">
            <h2>预览展示区</h2>
            <iframe id="iframePreview" src="" width="560" height="315" style="border: none;"></iframe>
        </div>
    
        <script>
            // 缓存常用的 DOM 元素
            const iframeWidthInput = document.getElementById('iframeWidth');
            const iframeHeightInput = document.getElementById('iframeHeight');
            const iframeWidthUnit = document.getElementById('iframeWidthUnit');
            const iframeHeightUnit = document.getElementById('iframeHeightUnit');
            const selectedLinkInput = document.getElementById('selectedLink');
            const iframeFullscreenSelect = document.getElementById('iframeFullscreen');
            const iframeReferrerSelect = document.getElementById('iframeReferrer');
            const iframeLoadingSelect = document.getElementById('iframeLoading');
            const iframeAutoplaySelect = document.getElementById('iframeAutoplay');
            const iframeEncryptedSelect = document.getElementById('iframeEncrypted');
            const iframePictureInPictureSelect = document.getElementById('iframePictureInPicture');
            const iframeWebShareSelect = document.getElementById('iframeWebShare');
            const iframeTitleInput = document.getElementById('iframeTitle');
            const generateIframeButton = document.getElementById('generateIframeButton');
            const copyIframeButton = document.getElementById('copyIframeButton');
            const iframeCodeText = document.getElementById('iframeCodeText');
            const iframePreview = document.getElementById('iframePreview');
            const resultDiv = document.getElementById('result');
    
            // 提取 iframe src 链接
            function extractSrc() {
                const input = document.getElementById('iframeInput').value;
                resultDiv.innerHTML = ''; // 清空之前的结果
    
                const srcMatches = [...input.matchAll(/src="([^"]+)"/g)];
                if (srcMatches.length > 0) {
                    const srcLinks = srcMatches.map(match => match[1]);
    
                    srcLinks.forEach(link => {
                        const linkItem = createLinkItem(link);
                        resultDiv.appendChild(linkItem);
                    });
                } else {
                    resultDiv.textContent = '没有找到有效的 iframe 链接';
                }
            }
    
            // 创建链接项
            function createLinkItem(link) {
                const div = document.createElement('div');
                div.classList.add('link-item');
    
                const textNode = document.createTextNode(link);
                const copyButton = document.createElement('button');
                copyButton.textContent = '复制';
                copyButton.onclick = function() {
                    selectedLinkInput.value = link;
                };
    
                div.appendChild(textNode);
                div.appendChild(copyButton);
                return div;
            }
    
            // 生成 iframe 代码
            function generateIframeCode() {
                const width = iframeWidthInput.value;
                const height = iframeHeightInput.value;
                const widthUnit = iframeWidthUnit.value;
                const heightUnit = iframeHeightUnit.value;
                const title = iframeTitleInput.value;
                const src = selectedLinkInput.value;
                const fullscreen = iframeFullscreenSelect.value;
                const referrer = iframeReferrerSelect.value;
                const loading = iframeLoadingSelect.value;
                const autoplay = iframeAutoplaySelect.value;
                const encrypted = iframeEncryptedSelect.value;
                const pictureInPicture = iframePictureInPictureSelect.value;
                const webShare = iframeWebShareSelect.value;
    
                let iframeCode = `<iframe src="${src}"`;
    
                if (title) {
                    iframeCode += ` title="${title}"`;
                }
                iframeCode += ` width="${width}${widthUnit}" height="${height}${heightUnit}"`;
    
                if (fullscreen) {
                    iframeCode += ` ${fullscreen}`;
                }
    
                if (referrer) {
                    iframeCode += ` referrerpolicy="${referrer}"`;
                }
    
                if (loading) {
                    iframeCode += ` loading="${loading}"`;
                }
    
                if (autoplay) {
                    iframeCode += ` ${autoplay}`;
                }
    
                if (encrypted) {
                    iframeCode += ` ${encrypted}`;
                }
    
                if (pictureInPicture) {
                    iframeCode += ` ${pictureInPicture}`;
                }
    
                if (webShare) {
                    iframeCode += ` ${webShare}`;
                }
    
                iframeCode += '></iframe>';
    
                iframeCodeText.value = iframeCode;
                iframePreview.src = src;
            }
    
            // 复制代码到剪贴板
            function copyIframeCode() {
                iframeCodeText.select();
                document.execCommand('copy');
            }
    
            // 事件监听
            document.getElementById('extractButton').addEventListener('click', extractSrc);
            generateIframeButton.addEventListener('click', generateIframeCode);
            copyIframeButton.addEventListener('click', copyIframeCode);
        </script>
    </body>
  • 【摄影特辑】青岛2023

    【摄影特辑】青岛2023

    镜头:Panasonic 30mm Macro

    相机:Panasonic G9M2

    后期:Snapseed

    很荣幸,第二张图片还被松下影像翻了牌子