JavaMail in Practice: Content Parsing, or How to Strip Quoted History and Keep Only the Current Message - 高飞网

2016-11-29 15:09:07

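This problem is hard to describe in the abstract, so let me set the scene.

Consider an email that is a reply to an earlier message. The mail client appends the content of the earlier message to the body of the reply. If we want only the content written in this reply, what should we do?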


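I could not find a good solution in either the JavaMail API or the mail protocols; readers who know the mail protocols well are welcome to correct me. This article instead analyzes the content and structure of reply bodies and distills a set of heuristics from them. The approach is not perfect.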

1. The original content is wrapped in a blockquote tag

This case is handled by remove1 in the code.
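
A minimal sketch of this case using only String operations. The class name BlockquoteStripper and the sample input in the usage note are illustrative assumptions, not part of the original code. The sketch searches for the closing tag `</blockquote>` explicitly and requires it to come after the opening tag.

```java
// Illustrative sketch; BlockquoteStripper is a hypothetical helper class.
class BlockquoteStripper {
    private static final String OPEN = "<blockquote";
    private static final String CLOSE = "</blockquote>";

    // Removes everything from the first <blockquote ...> to the last </blockquote>.
    static String strip(String html) {
        int start = html.indexOf(OPEN);
        int end = html.lastIndexOf(CLOSE);
        if (start == -1 || end < start) {
            return html; // no complete blockquote element: leave the content alone
        }
        return html.substring(0, start) + html.substring(end + CLOSE.length());
    }
}
```

For example, `BlockquoteStripper.strip("<div>Sounds good.</div><blockquote type=\"cite\"><div>Shall we meet on Friday?</div></blockquote>")` returns `<div>Sounds good.</div>`. Cutting from the first opening tag to the last closing tag also removes nested quotes, at the cost of dropping any text the author wrote between two quoted blocks.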

2. The original content is wrapped in an includetail tag


This case is handled by remove2 in the code.
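
The same idea can be written with a regular expression instead of index arithmetic. This is a sketch under assumptions: the class name IncludetailStripper is hypothetical, and the pattern assumes the quoted part is delimited by an `<includetail ...>` opening tag and a `</includetail>` closing tag, as this case describes.

```java
import java.util.regex.Pattern;

// Illustrative sketch; IncludetailStripper is a hypothetical helper class.
class IncludetailStripper {
    // DOTALL lets '.' match line breaks; the greedy '.*' extends the match
    // to the LAST closing tag, so nested includetail elements go too.
    private static final Pattern INCLUDETAIL = Pattern.compile(
            "<includetail\\b.*</includetail\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the input without the includetail element, or unchanged if none is found.
    static String strip(String html) {
        return INCLUDETAIL.matcher(html).replaceFirst("");
    }
}
```

Note that only the element itself is removed. A wrapper such as `<div><includetail>...</includetail></div>` leaves an empty `<div></div>` behind, which is usually harmless when the result is later converted to text.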

3. The message begins with plain text that contains no HTML tags

This case is handled by remove0 in the code.
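
A minimal sketch of this case: when the body starts with plain text, keep only the text before the first tag. The class and method names are hypothetical. The sketch also covers a body that contains no `<` at all, in which case `indexOf` returns -1 and there is nothing to cut.

```java
// Illustrative sketch; PlainTextPrefix is a hypothetical helper class.
class PlainTextPrefix {
    // Keeps the leading plain text and drops everything from the first '<' on.
    static String keepLeadingText(String content) {
        if (content.trim().startsWith("<")) {
            return content; // body starts with markup: this heuristic does not apply
        }
        int firstTag = content.indexOf('<');
        if (firstTag == -1) {
            return content; // pure plain text, nothing quoted as HTML
        }
        return content.substring(0, firstTag);
    }
}
```

This heuristic is crude: a literal `<` in the reply text (for example "a < b") is treated as the start of the quoted HTML and truncates the reply there.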


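4. Keywords that mark the join point with the original content

A keyword such as “发件人” ("From") marks the point where the quoted message begins. Locate that join point and remove everything after it. This case is handled by remove3 in the code.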
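
The keyword idea can be illustrated on plain text without an HTML parser. This is a simplified sketch, not the approach of remove3 (which walks `<div>` nodes): it assumes the quoted header appears at the start of a line as "From:" or "发件人:" (ASCII or full-width colon). The class name is hypothetical, and the Chinese keyword is written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; HeaderKeywordCutter is a hypothetical helper class.
class HeaderKeywordCutter {
    // \u53d1\u4ef6\u4eba is "发件人"; \uff1a is the full-width colon.
    // MULTILINE makes '^' match at the start of every line.
    private static final Pattern HEADER = Pattern.compile(
            "^[ \\t]*(\u53d1\u4ef6\u4eba|From)[ \\t]*[:\uff1a]",
            Pattern.MULTILINE);

    // Cuts the text at the first quoted-header line, if there is one.
    static String cutAtQuotedHeader(String text) {
        Matcher m = HEADER.matcher(text);
        // A header at offset 0 belongs to the message itself, so it is kept.
        if (m.find() && m.start() > 0) {
            return text.substring(0, m.start());
        }
        return text;
    }
}
```

Like every keyword heuristic, this can misfire: a reply whose own text contains a line beginning with "From:" is cut at that line.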


The complete code is as follows:

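    // The original post does not show its imports. The parser types used in
    // remove3 match the HTML Parser library (org.htmlparser):
    import org.htmlparser.Node;
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;
    import org.htmlparser.util.SimpleNodeIterator;

    // The methods below belong to a mail wrapper class that holds a bodyText
    // field and a logger; neither declaration is shown in the original post.

    /** Returns the body with quoted history removed, or null if there is no body. */
    public String getSimpleBodyText() {
        if (this.bodyText != null) {
            return remove(bodyText);
        }
        return bodyText;
    }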

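    /** Applies the four heuristics in turn and returns the stripped content. */
    public static String remove(final String content) {
        String content0 = content;
        content0 = remove1(content0); // case 1: blockquote
        content0 = remove2(content0); // case 2: includetail
        // Try the plain-text heuristic only if neither tag-based rule changed anything.
        if (content.equals(content0)) {
            content0 = remove0(content0); // case 3: leading plain text
        }
        content0 = remove3(content0); // case 4: "发件人" / "From" keyword
        return content0;
    }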

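    /** Case 1: drops everything from the first <blockquote to the last </blockquote>. */
    public static String remove1(String content) {
        final String closeTag = "</blockquote>";
        int index1 = content.indexOf("<blockquote");
        // Match the closing tag explicitly: "blockquote>" alone also matches an opening <blockquote>.
        int index2 = content.lastIndexOf(closeTag);
        if (index1 != -1 && index2 > index1) {
            logger.debug("remove1-blockquote:" + index1 + "," + index2);
            return content.substring(0, index1) + content.substring(index2 + closeTag.length());
        }
        return content;
    }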

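    /** Case 3: if the body starts with plain text, keeps only the text before the first tag. */
    public static String remove0(String content) {
        int firstTag = content.indexOf("<");
        // firstTag == -1 means there is no markup at all; without this check
        // substring(0, -1) would throw StringIndexOutOfBoundsException.
        if (firstTag != -1 && !content.trim().startsWith("<")) {
            logger.debug("remove0:");
            return content.substring(0, firstTag);
        }
        return content;
    }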

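    /** Case 2: drops everything from the first <includetail to the last </includetail>. */
    public static String remove2(String content) {
        final String closeTag = "</includetail>";
        int index1 = content.indexOf("<includetail");
        // Match the closing tag explicitly: "includetail>" alone also matches an opening <includetail>.
        int index2 = content.lastIndexOf(closeTag);
        if (index1 != -1 && index2 > index1) {
            logger.debug("remove2-includetail:" + index1 + "," + index2);
            return content.substring(0, index1) + content.substring(index2 + closeTag.length());
        }
        return content;
    }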

    /**
     * Case 4: finds the first div that contains the quoted header keyword
     * ("发件人" or "From") and cuts from there to the end of its container.
     */
    public static String remove3(String content) {
        int index1 = -1; // start of the quoted part
        int index2 = -1; // end of the quoted part
        try {
            Parser parser = new Parser(content);
            NodeFilter divFilter = new TagNameFilter("div");
            NodeList nodeList = parser.parse(divFilter);
            SimpleNodeIterator elements = nodeList.elements();
            while (elements.hasMoreNodes()) {
                Node node = elements.nextNode();
                String html = node.toHtml();
                String text = node.toString();
                // Known container markers: remember where the container ends.
                // "Section1" also covers "WordSection1".
                if (text.contains("Section1") || text.contains("mailContentContainer")) {
                    index2 = node.getStartPosition() + html.length();
                    continue;
                }
                if (html.contains("发件人") || html.contains("From")) {
                    if (node.getStartPosition() > 0) {
                        index1 = node.getStartPosition();
                        // No known container seen: cut up to the end of the parent's last child.
                        if (index2 == -1) {
                            if (node.getParent() != null && node.getParent().getLastChild() != null) {
                                Node lastChild = node.getParent().getLastChild();
                                index2 = lastChild.getStartPosition() + lastChild.toHtml().length();
                            }
                        }
                        break;
                    }
                }
            }
        } catch (ParserException e) {
            // Parsing failed: log it and fall through, returning the content unchanged.
            logger.debug("remove3-parse failed:" + e.getMessage());
        }
        // Require index2 >= index1; otherwise the two substrings would overlap
        // and part of the content would be duplicated.
        if (index1 != -1 && index2 >= index1) {
            logger.debug("remove3-发件人/From:" + index1 + "," + index2);
            return content.substring(0, index1) + content.substring(index2);
        }
        return content;
    }

