biblioshiny: the shiny app for bibliometrix
Biblioshiny 5.0 now includes Biblio AI – a powerful AI assistant for your science mapping analyses.
biblioshiny and bibliometrix are open-source and freely available for use, distributed under the MIT license.
When they are used in a publication, we ask that authors cite the following reference:
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.
Failure to properly cite the software is considered a violation of the license.
For an introduction and live examples, visit the bibliometrix website.
SAAS Workflow
Search - Appraisal - Analysis - Synthesis
About the SAAS Workflow
The SAAS workflow represents the comprehensive process of bibliometrix and biblioshiny for conducting scientific bibliometric analysis. Each phase is designed to ensure methodological rigor and reliable results:
- Search: Systematic collection of bibliographic data from academic databases
- Appraisal: Quality assessment and filtering of collected data
- Analysis: Application of advanced bibliometric techniques and AI
- Synthesis: Results synthesis and scientific report generation
The iterative cycle allows continuous refinement of the analysis by returning to previous phases based on obtained results.
SAAS Workflow developed by:
Massimo Aria & Corrado Cuccurullo
University of Naples Federico II, Italy
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959-975.
🧠 Biblio AI: AI-Powered Bibliometric Analysis
Starting from version 5.0, Biblioshiny introduces Biblio AI, a new suite of features powered by Google's Gemini models. This integration allows users to receive automatic interpretations, critical insights, and narrative summaries of their bibliometric results – directly within the platform.
Note: Biblio AI requires a Chrome-based browser (such as Google Chrome or Microsoft Edge) installed on your computer to work correctly.
✨ What does Biblio AI do?
Biblio AI enhances the core analytical modules of Biblioshiny by providing contextual, AI-generated commentary on several results, such as:
- Overview: High-level summaries of key bibliometric indicators and collection features.
- Three-Field Plot: Interpretation of the connections among sources, authors, and keywords.
- Authors' Production over Time: Insights on temporal dynamics and productivity patterns of key authors.
- Corresponding Author's Countries Collaboration: Discussion of international scientific collaboration patterns.
- Most Local Cited Documents: Evaluation of the most influential documents within the dataset.
- Reference Publication Year Spectroscopy: Identification and interpretation of historical citation peaks.
- Trend Topics: Explanation of thematic evolution and detection of emerging research trends.
- Knowledge Structures: Analysis of conceptual maps and networks such as co-citation and co-word analysis.
- Country Collaboration World Map: AI-assisted reading of global co-authorship and geographical patterns.
In each of these sections, users can activate the Biblio AI panel to access dynamic text explanations, perfect for use in scientific writing, presentations, or reporting.
🔧 How to enable Biblio AI?
To enable Biblio AI, follow these simple steps:
- Register at Google AI Studio (free access available).
- Generate an API Key enabled for Gemini model access (Free Tier supported).
- Enter your API Key in the Settings section of Biblioshiny.
The interface will guide you through the secure and local setup. Your API key is used only on your device to interact with the AI model.
🎯 Why use Biblio AI?
- Reduces time spent interpreting complex outputs.
- Supports scientific writing and research reporting.
- Helps users better understand bibliometric patterns and dynamics.
- Delivers explanations in natural language, accessible to both experts and newcomers.
📚 Supported Bibliographic Databases and Suggested File Formats
Biblioshiny imports and analyzes collections exported from the following bibliographic databases:
Web of Science, Scopus, and OpenAlex allow users to export the complete set of metadata, making it possible to perform all analyses implemented in Biblioshiny.
Some other databases, such as Dimensions, PubMed, and Cochrane Library, provide only a limited set of metadata. This may impose restrictions on the range of analyses that can be conducted using those datasets.
The following table (not included here) reports, for each supported database:
- The file formats supported by the export interface
- The types of metadata contained in each export option
- The suggested file format to use with Biblioshiny
📖 Main Authors' References (Bibliometrics)
- Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
- Aria, M., Cuccurullo, C., D'Aniello, L., Misuraca, M., & Spano, M. (2024). Comparative science mapping: a novel conceptual structure analysis with metadata. Scientometrics. https://doi.org/10.1007/s11192-024-05161-6
- Aria, M., Le, T., Cuccurullo, C., Belfiore, A., & Choe, J. (2023). openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex. R Journal, 15(4). https://doi.org/10.32614/rj-2023-089
- Aria, M., Misuraca, M., & Spano, M. (2020). Mapping the evolution of social research and data science on 30 years of Social Indicators Research. Social Indicators Research. https://doi.org/10.1007/s11205-020-02281-3
- Aria, M., Cuccurullo, C., D'Aniello, L., Misuraca, M., & Spano, M. (2022). Thematic Analysis as a New Culturomic Tool: The Social Media Coverage on COVID-19 Pandemic in Italy. Sustainability, 14(6), 3643. https://doi.org/10.3390/su14063643
- Aria, M., Alterisio, A., Scandurra, A., Pinelli, C., & D'Aniello, B. (2021). The scholar's best friend: research trends in dog cognitive and behavioural studies. Animal Cognition. https://doi.org/10.1007/s10071-020-01448-2
- Cuccurullo, C., Aria, M., & Sarto, F. (2016). Foundations and trends in performance management: A twenty-five years bibliometric analysis in business and public administration domains. Scientometrics. https://doi.org/10.1007/s11192-016-1948-8
- Cuccurullo, C., Aria, M., & Sarto, F. (2015). Twenty years of research on performance management in business and public administration domains. Presented at CARME 2015. Link
- Sarto, F., Cuccurullo, C., & Aria, M. (2014). Exploring healthcare governance literature: systematic review and paths for future research. Mecosan. Link
- Cuccurullo, C., Aria, M., & Sarto, F. (2013). Twenty years of research on performance management in business and public administration domains. Academy of Management Proceedings, Vol. 2013, No. 1, p. 14270. https://doi.org/10.5465/AMBPP.2013.14270abstract
- Belfiore, A., Salatino, A., & Osborne, F. (2022). Characterising Research Areas in the field of AI. arXiv preprint. https://doi.org/10.48550/arXiv.2205.13471
- Belfiore, A., Cuccurullo, C., & Aria, M. (2022). IoT in healthcare: A scientometric analysis. Technological Forecasting and Social Change, 184, 122001. https://doi.org/10.1016/j.techfore.2022.122001
- D'Aniello, L., Spano, M., Cuccurullo, C., & Aria, M. (2022). Academic Health Centers' configurations, scientific productivity, and impact: insights from the Italian setting. Health Policy. https://doi.org/10.1016/j.healthpol.2022.09.007
- Belfiore, A., Scaletti, A., Lavorato, D., & Cuccurullo, C. (2022). The long process by which HTA became a paradigm: A longitudinal conceptual structure analysis. Health Policy. https://doi.org/10.1016/j.healthpol.2022.12.006
Converting data to Bibliometrix format
Import or Load
The use of bibliometric approaches in business and management disciplines.
Dataset 'Management'
Period: 1985 - 2020, Source WoS.
Export Collection
📥 Import or Load: Building Your Bibliometric Collection
The Import or Load module is the starting point for any bibliometric analysis in Biblioshiny. This section allows users to build their bibliographic collection by either importing raw files from supported databases or loading pre-processed bibliometrix files saved in previous sessions.
📂 Three Import Options
Biblioshiny offers three flexible ways to create or load a bibliographic collection:
1. Import Raw File(s)
Import bibliographic data directly from supported databases in their native export formats.
Supported Databases:
- Web of Science (.txt, .bib format)
- Scopus (.bib, .csv format)
- OpenAlex (via API integration or pre-downloaded files)
- Dimensions (.csv, .xlsx format)
- Lens (.csv format)
- PubMed (.txt format)
- Cochrane Library (.txt format)
Import Process:
- Click Browse to select one or more raw export files from your computer
- Biblioshiny automatically detects the database format and parses the metadata
- The system converts the raw data into a standardized bibliometrix data frame
- A Conversion Results summary displays the number of documents successfully imported
- View a preview table showing key metadata fields (DOI, Authors, Title, Journal, etc.)
Important Notes:
- Files from different databases can be merged later using the Merge Collections module
- For best results, export the full record with cited references from the source database
- Some databases (e.g., Web of Science, Scopus) have export limits—download data in batches if necessary
- Always check the file format requirements in the Info section before exporting from databases
2. Load Bibliometrix File(s)
Resume work on a previously processed collection by loading .rdata or .xlsx files generated by Biblioshiny or the bibliometrix R package.
Use Cases:
- Continue analysis from a previous session
- Load collections pre-processed using the bibliometrix R package
- Share standardized datasets with collaborators
- Work with large collections that have already undergone data cleaning and filtering
Supported Formats:
- .rdata: R Data Serialization format (preserves full metadata and structure)
- .xlsx: Excel format (compatible with bibliometrix exports)
3. Use a Sample Collection
Perfect for testing and learning Biblioshiny's features without preparing your own data.
- Select from pre-loaded example datasets covering various research domains
- Ideal for exploring the platform's analytical capabilities
- No file upload required—start analyzing immediately
🔍 Post-Import Features
After successfully importing or loading a collection, you can:
- View Collection Metadata: Preview document details in a sortable, filterable table
- Add Brief Description: Write a custom description of your collection for documentation purposes
- Export Collection: Save your processed collection as .rdata, .xlsx, or .csv for backup or sharing
- Start Analysis: Click the blue Start button to proceed to filtering and analysis modules
💾 Exporting Collections
Once your collection is loaded, you can export it in multiple formats:
- .rdata: Recommended for preserving all metadata and R-specific structures
- .xlsx: Excel-compatible format for sharing with non-R users
⚠️ Best Practices
- Always save your processed collections after importing raw files to avoid re-conversion
- Use descriptive filenames when exporting (e.g., management_wos_1990-2020.rdata)
- Check conversion results carefully—some database exports may have formatting issues that require manual correction
- For large collections (>5,000 documents), consider applying filters early to improve performance
📚 References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
OpenAlex Data Collection
Date Range Filter
PubMed Data Collection
Date Range Filter
Load Collections
Merge collections in Excel or R format coming from different DBs
Export Collection
🔀 Merge Collections: Combining Data from Multiple Sources
The Merge Collections module allows users to combine bibliographic datasets from different databases (Web of Science, Scopus, OpenAlex, PubMed, etc.) into a single unified collection. This functionality is essential for comprehensive literature reviews, cross-database validation, and maximizing metadata coverage by leveraging the strengths of multiple sources.
🎯 Why Merge Collections?
- Broader Coverage: Different databases index different journals and document types—merging increases the comprehensiveness of your dataset
- Complementary Metadata: Scopus may provide detailed affiliation data, while Web of Science offers comprehensive citation links—combining them enriches your analysis
- Validation: Cross-referencing records from multiple sources improves data quality and identifies discrepancies
- Deduplication: Automatically removes duplicate records that appear in multiple databases
🔧 How to Merge Collections
The merge process in Biblioshiny is straightforward:
- Navigate to Merge Collections: Select Data > Merge Collections from the main menu
- Select Collection Files: Click Browse and select two or more bibliometrix files to merge:
  - Supported formats: .rdata, .xlsx
  - Files can originate from different databases (e.g., wos_collection.rdata + scopus_collection.xlsx)
  - Files must be valid bibliometrix data frames (created via Import or Load, or the R package)
- Configure Merge Options:
  - Remove Duplicates: Enable (recommended) to automatically detect and remove duplicate records
  - Verbose Output: Enable to display detailed information about the merge process and duplicates removed
- Click Start: The merge algorithm combines the collections, standardizes metadata fields, and removes duplicates
- Review Results: A summary displays the total number of documents and how many duplicates were removed
- Export Merged Collection: Save the unified dataset for future analysis
🔬 Merge Algorithm Overview
The merge process follows a sophisticated multi-stage algorithm implemented by the mergeDbSources() function:
Stage 1: Database Identification and Ordering
- Each collection is tagged with its source database (DB field: ISI, SCOPUS, OPENALEX, LENS, DIMENSIONS, PUBMED, COCHRANE)
- Collections are ordered by database priority to preserve the most reliable metadata when conflicts arise
- Order: Web of Science (ISI) > Scopus > OpenAlex > Lens > Dimensions > PubMed > Cochrane
Stage 2: Field Alignment
- Common metadata fields are identified and aligned across databases (e.g., TI = Title, AU = Authors, DI = DOI)
- Database-specific fields are preserved when possible
- Missing fields in one database are filled from another when duplicates are detected
- A unified KW_Merged field is created by combining keywords from all sources
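The construction of a merged keyword field can be sketched as an ordered, case-insensitive union of the keyword strings from each source. This is an illustration of the described behavior, not the actual bibliometrix code:

```python
def merge_keywords(*keyword_fields, sep="; "):
    """Combine semicolon-separated keyword strings from several sources
    into one field, dropping duplicates case-insensitively while
    preserving first-seen order."""
    seen, merged = set(), []
    for field in keyword_fields:
        for kw in (field or "").split(";"):
            kw = kw.strip()
            if kw and kw.lower() not in seen:
                seen.add(kw.lower())
                merged.append(kw.upper())
    return sep.join(merged)
```

For example, merging "Bibliometrics; Science Mapping" (Scopus) with "science mapping; R" (OpenAlex) yields a single deduplicated field.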
Stage 3: Duplicate Detection
Duplicates are identified using a two-step matching strategy:
Step 3.1: DOI-Based Matching
- Documents with identical DOIs are flagged as duplicates
- This is the most reliable method, as DOIs are unique identifiers
- Empty or missing DOIs ('' or NA) are ignored to avoid false positives
- Only the first occurrence is retained; subsequent matches are removed
Step 3.2: Title-Year Matching
- For records without DOIs, duplicates are detected using normalized titles and publication years
- Title Normalization:
  - Remove all punctuation and special characters
  - Convert to lowercase
  - Remove extra whitespace
  - Example: 'Science Mapping: A Review' → 'science mapping a review'
- Matching Criterion: Two documents are duplicates if they have:
  - Identical normalized titles AND
  - Identical publication years (PY)
- This method captures ~95% of duplicates but may miss records with minor title variations
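The two-step duplicate detection described above can be sketched in a few lines. This is an illustrative Python reimplementation of the stated logic, not the bibliometrix source; the field tags DI (DOI), TI (title), and PY (year) follow the conventions used elsewhere on this page:

```python
import re

def normalize_title(title):
    """Lowercase, strip punctuation, collapse whitespace (Step 3.2)."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return re.sub(r"\s+", " ", title).strip()

def deduplicate(records):
    """Keep the first occurrence of each document.

    records: list of dicts with keys DI (DOI), TI (title), PY (year).
    DOI matches are checked first (Step 3.1); records without a DOI
    fall back to normalized title + publication year (Step 3.2).
    """
    seen_dois, seen_title_year, kept = set(), set(), []
    for rec in records:
        doi = (rec.get("DI") or "").strip().lower()
        key = (normalize_title(rec["TI"]), rec["PY"])
        if doi:
            if doi in seen_dois:
                continue  # duplicate by DOI
            seen_dois.add(doi)
        elif key in seen_title_year:
            continue  # duplicate by normalized title + year
        seen_title_year.add(key)
        kept.append(rec)
    return kept
```

Running the sketch on the 'Science Mapping: A Review' example above removes both a DOI duplicate and a DOI-less title-year duplicate in a single pass.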
Stage 4: Author Name Standardization
When merging collections from multiple databases, author name formats are standardized:
- Format: LASTNAME INITIALS (e.g., 'Aria M; Cuccurullo C')
- Commas in author names are removed to ensure consistency
- Middle initials are condensed to single letters
- This standardization improves author-based analyses (e.g., collaboration networks, productivity rankings)
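A minimal sketch of the name standardization step, assuming input in 'Lastname, Given Names' form; the actual parsing rules in mergeDbSources() may differ:

```python
def standardize_author(name):
    """Normalize an author string to 'LASTNAME INITIALS' (Stage 4 sketch).

    Commas are dropped and given names are condensed to initials,
    e.g. 'Aria, Massimo' -> 'ARIA M'. Illustrative only: it assumes the
    surname comes first, which is the usual database export format.
    """
    parts = name.replace(",", " ").split()
    last, given = parts[0], parts[1:]
    initials = "".join(p[0] for p in given)  # condense to single letters
    return f"{last} {initials}".strip().upper()
```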
Stage 5: Metadata Integration
- The DB_Original field stores each document's source database
- The DB field is set to 'ISI' (Web of Science format) for compatibility with downstream analyses
- Cited references (CR) are preserved but stored in CR_raw to allow re-processing later
- A unique SR (Short Reference) identifier is generated for each document
📊 Merge Statistics and Validation
After merging, the system provides detailed statistics:
- Total Documents Before Merge: Sum of all input collections
- Duplicates Removed: Number of records eliminated (broken down by DOI matches and title-year matches)
- Total Documents After Merge: Final collection size
- Coverage by Database: Proportion of documents from each source (visible in the DB_Original field)
Example Output:
Merging 3 collections:
- WoS: 1,500 documents
- Scopus: 1,800 documents
- OpenAlex: 2,000 documents
Total: 5,300 documents
Removing duplicates...
- 450 duplicates removed by DOI
- 320 duplicates removed by title-year match
Final collection: 4,530 documents
📌 Best Practices
- Always enable duplicate removal unless you have a specific reason to retain duplicates
- Prioritize Web of Science or Scopus as the primary source—these databases generally have the most complete metadata
- Use OpenAlex to supplement coverage for open-access publications or gray literature
- Validate merge results by checking the distribution of DB_Original values—extreme imbalances may indicate incomplete data from one source
- Save merged collections immediately to avoid re-processing
⚠️ Important Considerations
- Citation Data: Merged collections reset the CR (Cited References) field—you'll need to run Reference Matching again after merging
- Field Coverage: Some databases provide richer metadata than others—merging doesn't 'fill in' missing fields unless duplicates are detected
- Large Collections: Merging collections >10,000 documents may take several minutes—be patient and avoid interrupting the process
- Database-Specific Analyses: Some analyses are database-specific—merged collections may lose this granularity
🔍 Example Use Cases
- Systematic Literature Review: Combine Web of Science, Scopus, and PubMed to ensure no relevant publications are missed
- Open Science Research: Merge OpenAlex with traditional databases to include preprints and institutional repositories
- Validation Study: Compare overlap between databases to assess index coverage and bias
- Longitudinal Analysis: Merge historical Web of Science data with recent OpenAlex records to extend temporal coverage
📚 References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Visser, M., van Eck, N. J., & Waltman, L. (2021). Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. Quantitative Science Studies, 2(1), 20–41. https://doi.org/10.1162/qss_a_00112
Martín-Martín, A., Thelwall, M., Orduna-Malea, E., & Delgado López-Cózar, E. (2021). Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations' COCI: a multidisciplinary comparison of coverage via citations. Scientometrics, 126(1), 871–906. https://doi.org/10.1007/s11192-020-03690-4
Reference Matching
This tool helps identify and merge duplicate citations in your bibliographic dataset. It uses string similarity algorithms to find variants of the same reference, allowing you to clean and standardize your data for more accurate analysis.
Matching Statistics
Manual Merge
Top Cited References (After Normalization)
Citation Variants Examples
The table below shows all variants of the selected citation that were matched together.
Matching Options
• 0.90-0.95: Conservative (fewer false positives)
• 0.85-0.90: Balanced (recommended)
• 0.75-0.80: Aggressive (more matching)
Matching in progress...
Please wait while citations are being normalized.
Apply to Data
Reset will restore the original CR field from your initial dataset.
Export Results
Download Normalized Data
The exported data will contain the bibliometric data with normalized citations in the CR field.
Advanced Options
🔗 Reference Matching: Algorithm and Usage
The Reference Matching module implements an advanced algorithm to identify and link cited references within a bibliometric collection to the actual documents present in the dataset. This process enables accurate citation network analysis, co-citation studies, and identification of highly-cited works within the collection.
🔬 Algorithm Overview
The reference matching algorithm follows a multi-step procedure designed to maximize accuracy while handling noisy and incomplete bibliographic data:
- Reference Extraction: Cited references are parsed from the reference list (CR field) of each document in the collection. Each reference string is decomposed into structured components: first author surname, publication year, journal/source, volume, page, and DOI (when available).
- Data Normalization: Both references and documents undergo extensive normalization to reduce variability:
- Author names are standardized (e.g., removing accents, abbreviations, and middle initials)
- Journal titles are normalized using abbreviation lookup tables and string similarity methods
- Years, volumes, and page numbers are cleaned and formatted uniformly
- Blocking Strategy: To improve computational efficiency, references are grouped into blocks based on first author surname and publication year. Only references and documents within the same block are compared, reducing the search space significantly.
- Similarity Computation: For each reference-document pair within a block, a matching score is calculated using a weighted combination of similarity measures:
- DOI matching (if available): exact match = 100% confidence
- First author similarity: string distance (Jaro-Winkler or Levenshtein)
- Year match: exact or within ±1 year tolerance
- Journal/source similarity: string distance between normalized titles
- Volume and page matching: exact or fuzzy comparison
- Threshold-Based Assignment: A reference is matched to a document if the combined similarity score exceeds a predefined threshold (typically 0.85–0.95). The threshold can be adjusted by the user to balance precision and recall.
- Ambiguity Resolution: In cases where a reference matches multiple documents, the algorithm selects the candidate with the highest similarity score. If scores are nearly identical, the match is flagged for manual review.
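The weighted similarity computation and threshold-based assignment above can be illustrated as follows. Python's difflib.SequenceMatcher stands in for Jaro-Winkler or Levenshtein here, and the weights are illustrative placeholders, not the values used by the actual implementation; field tags (AU, PY, SO, VL, DI) follow the conventions on this page:

```python
from difflib import SequenceMatcher

def sim(a, b):
    """String similarity in [0, 1]; stands in for Jaro-Winkler here."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(ref, doc, weights=(0.4, 0.2, 0.3, 0.1)):
    """Weighted combination of the similarity components described above.

    ref/doc: dicts with keys AU (first author), PY (year), SO (source),
    VL (volume), DI (DOI). A shared DOI short-circuits to full
    confidence; the weights are made-up for illustration.
    """
    if ref.get("DI") and ref["DI"] == doc.get("DI"):
        return 1.0  # exact DOI match = 100% confidence
    w_au, w_py, w_so, w_vl = weights
    author = sim(ref["AU"], doc["AU"])
    year = 1.0 if abs(ref["PY"] - doc["PY"]) <= 1 else 0.0  # ±1 year tolerance
    source = sim(ref["SO"], doc["SO"])
    volume = 1.0 if ref.get("VL") == doc.get("VL") else 0.0
    return w_au * author + w_py * year + w_so * source + w_vl * volume

def is_match(ref, doc, threshold=0.90):
    """Threshold-based assignment (Step 5); 0.90 mirrors the default."""
    return match_score(ref, doc) >= threshold
```

Lowering the threshold toward 0.75 increases recall at the cost of false positives, which is exactly the trade-off exposed by the Matching Options panel.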
💡 Usage in Biblioshiny
To perform reference matching in Biblioshiny, follow these steps:
- Load your bibliographic collection: Ensure that your dataset includes the CR (Cited References) field, which is available in full exports from Web of Science, Scopus, and OpenAlex.
- Navigate to the Reference Matching module: Access the module from the Analysis menu or the Citation Network section.
- Configure matching parameters:
- Similarity threshold: Adjust the matching threshold to control precision (higher values = stricter matching, fewer false positives).
- Normalization options: Enable or disable specific normalization rules (e.g., journal abbreviation matching, fuzzy year tolerance).
- Run the algorithm: Click Start Matching to initiate the process. Depending on the collection size, this may take several minutes.
- Review results: The output includes:
- A summary table of matched and unmatched references
- A list of ambiguous matches for manual inspection
- Network visualization options (e.g., co-citation network, historiograph)
- Export matched data: The matched citation network can be exported for further analysis in external tools (e.g., Gephi, Pajek) or used directly in Biblioshiny for advanced network analysis.
⚙️ Key Parameters and Options
- Matching Threshold: Minimum similarity score (0–1) required for a match. Default: 0.90. Lower values increase recall but may introduce false positives.
- Fuzzy Year Matching: Allows matches within ±1 year (useful for handling publication date discrepancies). Default: enabled.
- DOI Priority: When a DOI is available, it overrides other matching criteria. Default: enabled.
- Manual Review Mode: Flags ambiguous matches (score between 0.85–0.90) for user verification. Default: disabled.
📊 Applications
Reference matching is essential for several bibliometric analyses:
- Co-citation analysis: Identify documents frequently cited together, revealing intellectual structure.
- Historiograph: Trace the historical development of research topics through citation linkages.
- Most Cited Local Documents: Rank documents by the number of times they are cited within the collection.
- Citation networks: Construct directed citation graphs for network-based metrics (PageRank, betweenness centrality).
⚠️ Important Notes
- Reference matching quality depends heavily on the completeness and accuracy of the CR field in the original data export.
- Incomplete or poorly formatted references (e.g., missing author names, incorrect years) may result in lower matching rates.
- For very large collections (>10,000 documents), consider using subsets or increasing the matching threshold to improve performance.
- Always verify ambiguous matches manually, especially for high-stakes analyses.
📚 References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Garfield, E. (1979). Citation indexing: Its theory and application in science, technology, and humanities. New York: Wiley.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269. https://doi.org/10.1002/asi.4630240406
1. General
2. (J) Journal
3. (AU) Author's Country
4. (DOC) Documents
🔍 Filters: Refining Your Bibliometric Collection
The Filters module provides a comprehensive set of tools to refine and subset your bibliographic collection based on multiple metadata criteria. By applying filters, you can focus your analysis on specific document types, time periods, geographic regions, journals, or citation thresholds—enabling more targeted and meaningful bibliometric insights.
Filters are organized into four thematic panels, each addressing different aspects of bibliographic metadata. At the top of the page, a real-time summary displays how many documents, sources, and authors remain after applying your filter selections.
📊 Real-Time Filter Summary
Located at the top of the Filters page, this summary updates dynamically as you adjust filter settings:
- Documents: Shows the number of documents currently selected (e.g., '898 of 898' means all documents are included; '450 of 898' means 450 documents match your filter criteria).
- Sources: The number of distinct journals, books, or conferences represented in the filtered subset.
- Authors: The total number of unique authors contributing to the filtered documents.
These indicators help you assess the impact of your filters before applying them, ensuring your subset maintains sufficient size for robust analysis.
1️⃣ General Filters
The General panel provides fundamental filters applicable to most bibliometric collections:
Document Type
- Function: Filters documents by publication type (e.g., Article, Book Chapter, Proceedings Paper, Review, Editorial, Letter, Note).
- How to Use:
- By default, all document types are selected (shown in the filter box).
- To exclude a document type, click on its name in the filter box—it will be removed.
- To include a previously excluded type, click on it in the list below the filter box.
- Use Cases:
- Focus on peer-reviewed articles by excluding editorials, letters, and notes.
- Analyze conference proceedings separately from journal articles.
- Include only review articles for systematic literature reviews.
Language
- Function: Filters documents by publication language (e.g., English, Spanish, French, German, Chinese).
- Interaction: Similar to Document Type—click to select/deselect languages.
- Note: Most bibliometric databases predominantly index English-language publications. Non-English documents may represent a small fraction (<5%) of typical collections.
Publication Year
- Function: Restricts the collection to documents published within a specific time range.
- How to Use:
- Use the slider to adjust the start and end years.
- The selected range is displayed below the slider (e.g., '1985 - 2020').
- The histogram shows the distribution of publications across years, helping you identify periods of high activity.
- Use Cases:
- Temporal segmentation: Analyze different decades separately (e.g., 1990-2000 vs. 2010-2020).
- Exclude recent publications: Remove documents <2 years old to avoid citation lag bias.
- Focus on historical literature: Study foundational works from earlier periods.
2️⃣ Journal (J) Filters
The Journal panel enables filtering based on publication venues, journal rankings, or Bradford's Law zones:
Upload a List of Journals
- Function: Restricts the collection to documents published in a user-defined list of journals.
- How to Use:
  - Prepare a file (.csv, .txt, or .xlsx) with journal titles listed in the first column.
  - Click Browse... and select your file.
  - Only documents from journals matching the uploaded list (case-insensitive, partial matching) will be retained.
- Use Cases:
- Focus on core journals in your field (e.g., top 10 management journals).
- Analyze publications from open-access journals only.
- Exclude predatory or low-quality journals identified via external blacklists.
- Example File Format:
Journal of Informetrics
Scientometrics
Journal of the Association for Information Science and Technology
Research Policy
Upload a Journal Ranking List
- Function: Filters journals based on quality rankings (e.g., Q1, Q2, Q3, Q4 quartiles; A*, A, B, C grades).
- How to Use:
  - Prepare a file (.csv or .xlsx) with two columns and headers:
    - Column 1: Journal titles (must match exactly or closely)
    - Column 2: Ranking categories (e.g., Q1, Q2, A*, B)
  - Upload the file via Browse...
  - Select which ranking categories to include in your filtered collection.
- Use Cases:
- Focus on top-tier journals (e.g., Q1 only) for high-impact analysis.
- Compare publication patterns across journal tiers (e.g., Q1 vs. Q2-Q4).
- Filter by national rankings (e.g., Italian VQR, Australian ABDC, UK ABS).
- Example File Format:
Journal,Quartile
Journal of Informetrics,Q1
Scientometrics,Q1
Library Quarterly,Q2
Online Information Review,Q3
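Applied to the example file above, the ranking filter amounts to a per-document dictionary lookup. A hedged sketch, where the SO field name for the source title and the case-insensitive exact matching rule are assumptions:

```python
import csv
import io

def filter_by_ranking(records, ranking_csv, keep=("Q1",)):
    """Keep documents whose journal appears in the ranking file with an
    accepted category. Mirrors the two-column 'Journal,Quartile' file
    format shown above; matching here is case-insensitive and exact.
    """
    ranks = {}
    for row in csv.DictReader(io.StringIO(ranking_csv)):
        ranks[row["Journal"].strip().lower()] = row["Quartile"].strip()
    # Journals absent from the ranking file are dropped along with
    # journals in non-selected categories.
    return [r for r in records if ranks.get(r["SO"].strip().lower()) in keep]
```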
Source by Bradford Law Zones
- Function: Filters journals based on Bradford's Law, which divides sources into three productivity zones: Core, Zone 2, and Zone 3.
- Theory: Bradford's Law states that:
- Core journals (Zone 1): A small number of highly productive sources publishing ~1/3 of all documents.
- Zone 2: A moderate number of sources contributing another ~1/3.
- Zone 3: A large number of peripheral sources producing the final ~1/3.
- How to Use:
- Select 'All Sources' to include everything (default).
- Select 'Core' to focus on the most productive journals.
- Select 'Zone 2' or 'Zone 3' to analyze mid-tier or peripheral journals.
- Use Cases:
- Identify the core journals dominating a research field.
- Compare citation impact between core and peripheral sources.
- Exclude low-productivity journals (Zone 3) to streamline analysis.
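The three-zone partition follows directly from ranked source frequencies; a minimal sketch of the cumulative one-third cutoffs described above (not the bibliometrix implementation):

```python
from collections import Counter

def bradford_zones(journals):
    """Partition sources into three zones, each covering ~1/3 of documents.

    journals: list of journal names, one entry per document.
    Sources are ranked by productivity; zone boundaries fall where the
    cumulative document count crosses 1/3 and 2/3 of the total.
    """
    counts = Counter(journals).most_common()  # ranked by productivity
    total, cumulative = len(journals), 0
    zones = {"Core": [], "Zone 2": [], "Zone 3": []}
    for source, n in counts:
        cumulative += n
        if cumulative <= total / 3:
            zones["Core"].append(source)
        elif cumulative <= 2 * total / 3:
            zones["Zone 2"].append(source)
        else:
            zones["Zone 3"].append(source)
    return zones
```

Note how a single prolific journal can fill the Core zone on its own while many peripheral journals share Zone 3, which is the skew Bradford's Law predicts.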
3️⃣ Author's Country (AU) Filters
The Author's Country panel enables geographic filtering based on author affiliations:
Region
- Function: Filters documents by broad geographic regions (e.g., Africa, Asia, Europe, North America, South America, Oceania, Seven Seas, Unknown).
- How to Use: Click on region buttons to toggle selection. Selected regions are highlighted in blue.
- Note: 'Seven Seas' represents international waters or unclassified regions; 'Unknown' indicates missing affiliation data.
Country
- Function: Filters documents by specific author countries (e.g., USA, China, UK, Germany, Italy).
- How to Use:
- Use the search box to quickly find countries.
- Countries are displayed in two columns: left (available), right (selected).
- Click a country in the left column to add it; click in the right column to remove.
- Use Cases:
- Analyze national research outputs (e.g., Italian contributions to bibliometrics).
- Study international collaboration by including multiple countries.
- Compare regional trends (e.g., Europe vs. Asia vs. North America).
- Identify emerging research nations in a field.
- Important: Multi-country documents (with authors from different countries) are included if any selected country is represented among the authors.
4️⃣ Documents (DOC) Filters
The Documents panel provides citation-based filters with interactive histograms:
Total Citations
- Function: Filters documents by their cumulative citation count (from database records).
- How to Use:
- Use the slider below the histogram to set minimum and maximum citation thresholds.
- The histogram shows the distribution of citation counts across documents, helping you identify highly-cited outliers.
- Example: Set minimum = 50 to include only documents with ≥50 citations.
- Use Cases:
- Focus on high-impact documents (e.g., citations >100) for influence analysis.
- Exclude uncited documents (citations = 0) for citation network studies.
- Identify the citation elite (top 1% most-cited papers).
Total Citations per Year
- Function: Filters documents by their average annual citation rate, calculated as: Total Citations / (Current Year - Publication Year).
- Why Use This? Raw citation counts are biased toward older publications. Citations per year normalizes for document age, enabling fairer comparison between recent and historical works.
- How to Use: Adjust the slider to set citation-per-year thresholds (e.g., ≥5 citations/year).
- Use Cases:
- Identify rapidly accumulating citations (indicators of emerging influence).
- Compare citation velocity across time periods.
- Find recent high-impact papers that haven't yet accumulated large total citation counts but show strong annual growth.
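The normalization above can be sketched as follows. The records are hypothetical, and the age floor of 1 for current-year papers is an assumption to avoid division by zero, not necessarily biblioshiny's exact convention.

```python
from datetime import date

def citations_per_year(total_citations, pub_year, current_year=None):
    """Average annual citation rate: TC / (current year - publication year).

    Documents published in the current year are given an age of 1
    (an assumed convention to avoid division by zero).
    """
    if current_year is None:
        current_year = date.today().year
    age = max(current_year - pub_year, 1)
    return total_citations / age

# Keep only documents accruing at least 5 citations per year (hypothetical records)
docs = [
    {"title": "A", "TC": 120, "PY": 2010},
    {"title": "B", "TC": 30,  "PY": 2018},
    {"title": "C", "TC": 4,   "PY": 2019},
]
kept = [d for d in docs if citations_per_year(d["TC"], d["PY"], current_year=2024) >= 5]
```

Note how document C, despite being recent, is filtered out, while the older document A passes comfortably: the metric rewards sustained annual impact, not mere age.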
🎛️ Filter Workflow
Follow these steps to apply filters effectively:
- Review the initial collection: Check the summary counts (Documents, Sources, Authors) before applying filters.
- Select filter criteria: Adjust settings across the four panels based on your research objectives.
- Monitor real-time updates: The summary at the top updates dynamically as you change selections, showing how many documents remain.
- Click 'Apply': Once satisfied with your selections, click the blue Apply button to activate the filters.
- Verify results: Check the updated summary to ensure your filters produced the expected subset size.
- Proceed to analysis: Navigate to other modules (Overview, Sources, Authors, etc.) to analyze the filtered collection.
- Reset if needed: Click the Reset button to clear all filters and restore the original dataset.
💡 Best Practices
- Avoid over-filtering: Very small subsets (<100 documents) may not provide robust results for network or clustering analyses. Aim for at least 200-300 documents when possible.
- Document your filters: Record which filters you applied for reproducibility and transparency in research reporting (e.g., 'Filtered to Q1 journals, 2010-2020, English-language articles only').
- Iterative refinement: Start with broad filters and gradually narrow your criteria while monitoring the summary counts.
- Combine filters strategically: Use multiple filter types together (e.g., specific countries + high citations + recent years) for highly targeted analyses.
- Save filtered collections: After applying filters, export your refined collection using the Data button (top right) to preserve your work.
- Compare filtered vs. unfiltered: Run key analyses on both the full and filtered collections to assess how filters impact results.
⚠️ Important Considerations
- Citation Data Availability: Citation counts depend on database indexing. Web of Science and Scopus provide citation data; PubMed and some other databases do not. Missing citation data will result in empty histograms in the Documents panel.
- Affiliation Data Quality: Author country filters rely on affiliation metadata, which may be incomplete or inconsistent, especially in older publications or non-WoS/Scopus databases.
- Subject Category Coverage: Subject categories are database-specific. Scopus categories differ from Web of Science categories; merged collections may have inconsistent classification.
- Filter Order Independence: Filters are applied simultaneously, not sequentially. The order in which you select filters does not affect the final result.
- Bradford Zone Recalculation: Bradford's Law zones are calculated based on the current collection. If you merge collections or upload new data, zones may shift.
🔍 Use Case Examples
Example 1: Analyzing Top-Tier Recent Research
- Goal: Focus on high-impact, recent publications in core journals.
- Filters Applied:
- Document Type: Article, Review
- Publication Year: 2015-2020
- Source by Bradford Law Zones: Core
- Total Citations per Year: ≥10
- Outcome: A curated subset of influential papers from leading journals, suitable for identifying emerging research fronts.
Example 2: National Research Assessment
- Goal: Evaluate research output from Italian universities in Computer Science.
- Filters Applied:
- Author's Country: Italy
- Subject Category: Computer Science, Information Systems
- Document Type: Article
- Outcome: A collection focused on Italian contributions to CS, enabling analysis of national productivity, collaboration patterns, and impact.
Example 3: Historical Foundational Literature
- Goal: Study the intellectual foundations of a field by examining seminal works.
- Filters Applied:
- Publication Year: 1970-1990
- Total Citations: ≥100
- Document Type: Article
- Outcome: A set of highly-cited historical documents representing foundational contributions.
📚 References
Bradford, S. C. (1934). Sources of information on specific subjects. Engineering, 137, 85–86.
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Garfield, E. (2009). From the science of science to Scientometrics: visualizing the history of science with HistCite software. Journal of Informetrics, 3(3), 173–179. https://doi.org/10.1016/j.joi.2009.03.009
Main Information
📊 Main Information: Overview of Your Bibliometric Collection
The Main Information page provides a comprehensive, at-a-glance summary of the key bibliometric indicators for your collection. This dashboard-style interface displays 12 core metrics organized into visual cards, allowing you to quickly assess the scope, composition, and characteristics of your dataset.
This section is the ideal starting point for understanding your collection before diving into more detailed analyses. It answers fundamental questions such as: How large is my dataset? What is the temporal coverage? How collaborative is the research? How impactful are the documents?
📈 Core Metrics Explained
The Main Information dashboard displays the following indicators:
1. Timespan
- Definition: The temporal range covered by the collection, from the earliest to the most recent publication year.
- Example: 1985-2020 indicates documents published between 1985 and 2020.
- Interpretation: A wider timespan enables longitudinal trend analysis and historical perspectives. Collections spanning decades are suitable for studying research evolution and paradigm shifts.
2. Sources
- Definition: The total number of distinct publication venues (journals, conferences, books) represented in the collection.
- Interpretation: A higher number of sources suggests a multidisciplinary or dispersed research field, while a lower number indicates concentration in a few core journals. This metric is useful for identifying dominant publication venues via Bradford's Law analysis.
3. Documents
- Definition: The total number of bibliographic records (articles, reviews, proceedings, etc.) in the collection.
- Interpretation: This is the fundamental sample size for all subsequent analyses. Larger collections (>1,000 documents) provide more robust insights, especially for network and clustering analyses.
4. Annual Growth Rate
- Definition: The average percentage increase in the number of publications per year over the collection's timespan.
- Formula: Compound Annual Growth Rate (CAGR), calculated as [(N_final / N_initial)^(1/years) - 1] × 100
- Interpretation: A positive growth rate indicates an expanding research field, while negative or near-zero values suggest maturity or decline. High growth rates (>10%) often signal emerging topics attracting increasing scholarly attention.
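The CAGR formula can be checked directly with a short sketch (the publication counts below are illustrative values, not taken from any real collection):

```python
def annual_growth_rate(n_initial, n_final, years):
    """Compound annual growth rate of publications, in percent:
    [(N_final / N_initial)^(1/years) - 1] * 100."""
    return ((n_final / n_initial) ** (1 / years) - 1) * 100

# e.g. growing from 12 papers in the first year to 95 in the last, over 20 years
rate = annual_growth_rate(12, 95, 20)   # roughly 11% per year
```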
5. Authors
- Definition: The total number of unique authors who contributed to the documents in the collection.
- Interpretation: This metric reflects the size of the research community. A high author-to-document ratio suggests collaborative research, while a low ratio may indicate a field dominated by a few prolific researchers.
6. Authors of Single-Authored Docs
- Definition: The number of authors who published at least one single-authored document in the collection.
- Interpretation: Single-authored papers are more common in humanities and theoretical disciplines. A low proportion suggests high collaboration intensity, typical of experimental sciences and interdisciplinary fields.
7. International Co-Authorship
- Definition: The percentage of documents authored by researchers from multiple countries.
- Interpretation: High international collaboration (>30%) indicates global research networks and is often associated with higher citation impact. This metric is a proxy for research globalization and cross-border knowledge exchange.
8. Co-Authors per Document
- Definition: The average number of authors per document in the collection.
- Interpretation: Values typically range from 2 (social sciences, humanities) to 5+ (biomedical sciences, physics). Increasing values over time reflect the trend toward team science and large-scale collaborative projects.
9. Author's Keywords (DE)
- Definition: The total number of unique keywords provided by authors (DE = Descriptors) across all documents.
- Interpretation: A rich keyword set (>1,000 unique terms) enables robust thematic analysis and topic modeling. The diversity of keywords reflects the conceptual breadth of the research field.
10. References
- Definition: The total number of cited references listed in the bibliographies of all documents in the collection.
- Interpretation: This metric is essential for citation-based analyses (co-citation, bibliographic coupling, reference publication year spectroscopy). Larger reference pools enable more comprehensive intellectual structure mapping.
11. Document Average Age
- Definition: The average number of years elapsed since publication, calculated relative to the current year.
- Formula: Current Year - Mean(Publication Years)
- Interpretation: Lower values (<5 years) indicate a focus on recent research, while higher values suggest inclusion of foundational or historical literature. This metric helps assess whether the collection is contemporary or retrospective.
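As a minimal sketch of this formula (with made-up publication years):

```python
def average_document_age(pub_years, current_year):
    """Mean document age: current year minus the mean publication year."""
    return current_year - sum(pub_years) / len(pub_years)

# Four hypothetical documents, evaluated in 2024
age = average_document_age([2015, 2017, 2019, 2021], 2024)  # mean PY = 2018 -> age 6.0
```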
12. Average Citations per Document
- Definition: The mean number of citations received by documents in the collection (based on database citation counts).
- Interpretation: Higher values indicate high-impact research. Average citation rates vary widely by discipline (e.g., biomedical sciences >20, social sciences ~10). This metric is influenced by document age, field norms, and database coverage.
🧠 Biblio AI Integration
If Biblio AI is enabled, you can click the Biblio AI tab to receive an automated narrative summary of these indicators. The AI-generated text provides contextualized interpretations, highlights notable patterns, and offers insights suitable for inclusion in research reports or presentations.
Example AI-generated insights:
- 'The collection exhibits a strong annual growth rate of 14.05%, suggesting an emerging and rapidly expanding research domain.'
- 'With 36.41% international co-authorship, the field demonstrates moderate global collaboration, indicating opportunities for further cross-border partnerships.'
- 'The average of 37.12 citations per document reflects high scholarly impact, placing this collection above typical citation rates for the social sciences.'
📋 Viewing Options
The Main Information page offers three viewing modes via tabs at the top:
- Plot: Visual card-based dashboard (default view) with color-coded metrics
- Table: Tabular representation of all indicators for easy export to reports
- Biblio AI: AI-generated narrative summary and interpretation (requires Gemini API key)
💡 How to Use Main Information
This section is designed for multiple purposes:
- Initial Data Assessment: Quickly validate that your collection has been imported correctly and contains the expected number of documents and metadata fields.
- Research Reporting: Extract summary statistics for the 'Methods' or 'Data' section of a systematic review or bibliometric study.
- Comparative Analysis: Compare indicators across different datasets (e.g., two time periods, competing research streams) to identify differences in growth, collaboration, or impact.
- Presentation Material: Export the dashboard or AI-generated text for use in slides, posters, or grant proposals.
📌 Best Practices
- Always review Main Information first before proceeding to advanced analyses—it helps identify potential data quality issues (e.g., missing years, incomplete author data).
- Compare with field benchmarks: Contextualize your indicators by comparing them with known norms for your discipline (e.g., citation rates, collaboration patterns).
- Document your collection: Use the 'Brief Description' text box (visible in the Import/Load section) to record search queries, inclusion criteria, and data sources for reproducibility.
- Export summary statistics: Save the table view as a reference for your research documentation or supplementary materials.
⚠️ Important Considerations
- Database Bias: Indicators reflect the coverage and indexing policies of the source database(s). Web of Science and Scopus have different journal lists, which affects metrics like citation counts and international co-authorship.
- Citation Lag: Recent documents (<2 years old) typically have lower citation counts due to insufficient time for accumulation. Average citations per document may be biased downward if your collection includes many recent papers.
- Incomplete Metadata: Some databases (e.g., PubMed, Dimensions) provide limited metadata, which may result in missing or incomplete values for certain indicators (e.g., author affiliations for international co-authorship calculation).
- Growth Rate Sensitivity: Annual growth rate calculations are sensitive to the start and end years of the collection. Unusual spikes or drops in specific years can distort the overall trend.
🔍 Next Steps
After reviewing the Main Information dashboard, proceed to more detailed analyses:
- Filters: Refine your collection by applying metadata filters (e.g., document type, time range, subject category)
- Sources: Identify the most productive journals and analyze publication patterns
- Authors: Examine author productivity, collaboration networks, and impact metrics
- Conceptual Structure: Explore thematic evolution and topic clustering via keyword co-occurrence and thematic maps
- Intellectual Structure: Investigate citation networks through co-citation analysis and historiography
📚 References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18(3), 429–472. https://doi.org/10.1177/1094428114562629
Annual Scientific Production
Average Citations Per Year
Life Cycle of Scientific Production
📈 Life Cycle of Scientific Production: Modeling Research Topic Evolution
The Life Cycle of Scientific Production module implements a logistic growth model to analyze the temporal dynamics of research topics. This approach, grounded in the theory of scientific paradigms and innovation diffusion, allows researchers to identify the current developmental stage of a field, predict future trends, and estimate when a topic will reach maturity or saturation.
By fitting a logistic curve to the annual publication counts in your collection, this analysis reveals whether a research area is in its emergence phase, rapid growth phase, maturity phase, or decline phase.
📐 The Logistic Growth Model
The life cycle analysis is based on the logistic growth function, which models how the cumulative number of publications evolves over time:
Formula:
P(t) = K / (1 + exp(-b(t - t₀)))
Where:
- P(t): Cumulative number of publications at time t
- K: Saturation level (maximum total publications the topic will produce)
- b: Growth rate parameter (determines the steepness of the curve)
- t₀: Inflection point (time when growth rate is highest)
The annual publication rate is derived as the first derivative of P(t), producing a bell-shaped curve that peaks at the inflection point and gradually declines as the topic approaches saturation.
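Under these definitions, the model can be fitted to cumulative counts with a standard nonlinear least-squares routine. The sketch below uses Python's SciPy on synthetic, noiseless data purely for illustration; biblioshiny performs the fit internally in R, and the parameter values here are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, b, t0):
    """Cumulative publications: P(t) = K / (1 + exp(-b * (t - t0)))."""
    return K / (1 + np.exp(-b * (t - t0)))

# Synthetic cumulative counts following a logistic trend (illustration only)
years = np.arange(2000, 2021)
cumulative = logistic(years, 5000.0, 0.4, 2015.0)

# Nonlinear least-squares fit; starting values p0 are rough guesses
(K, b, t0), _ = curve_fit(logistic, years, cumulative,
                          p0=[2 * cumulative[-1], 0.3, years.mean()])

# Annual output as first differences of the fitted cumulative curve
annual = np.diff(logistic(np.arange(1999, 2021), K, b, t0))
```

On real data, always inspect the fit quality (R², residuals) before trusting the recovered K, b, and t₀; noisy or non-logistic series can converge to misleading parameters.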
🔬 Model Overview: Key Parameters
The Model Overview section displays four fundamental indicators derived from the fitted logistic model:
1. Saturation (K)
- Definition: The estimated maximum total number of publications that will ever be produced on this research topic.
- Interpretation:
- High K values (>5,000) indicate a broad, impactful research domain with sustained long-term interest.
- Low K values (<1,000) suggest a niche topic with limited scope or a specialized subtopic within a larger field.
- The current cumulative total as a percentage of K reveals how close the topic is to exhaustion.
- Example: K = 8,980 publications suggests the topic will produce approximately 8,980 total documents before reaching saturation.
2. Peak Year (Tm)
- Definition: The year when annual publication output is predicted to reach its maximum.
- Interpretation:
- If the peak year is in the future, the topic is still in a growth phase and attracting increasing attention.
- If the peak year is in the past, the topic has entered a maturity or decline phase, with decreasing annual output.
- If the peak year is near the present, the topic is at the zenith of its popularity.
- Example: Peak Year = 2029 indicates the topic will reach maximum annual productivity in 2029, suggesting it is currently in an accelerating growth phase.
3. Peak Annual
- Definition: The maximum number of publications per year predicted to occur at the Peak Year.
- Interpretation: This metric reflects the intensity of research activity at the topic's peak. Higher values indicate greater scholarly attention and resource allocation.
- Example: Peak Annual = 592 pubs/year means the topic will generate approximately 592 publications annually at its zenith.
4. Growth Duration (Δt)
- Definition: The estimated time span (in years) from the topic's emergence (10% of K) to near-saturation (90% of K).
- Interpretation:
- Short duration (<10 years): Rapid maturation, typical of hot topics, technological innovations, or crisis-driven research (e.g., COVID-19 studies).
- Medium duration (10-20 years): Typical of mainstream research domains with sustained but gradual growth.
- Long duration (>20 years): Slow-developing fields, foundational topics, or interdisciplinary areas requiring extensive infrastructure.
- Example: Growth Duration = 16.7 years suggests the topic will take approximately 17 years to mature from its early stage to near-saturation.
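For the logistic model above, the 10%-to-90% duration has a closed form: setting P(t) = f·K and solving gives t = t₀ + ln(f/(1-f))/b, so the span between the 10% and 90% crossings is 2·ln(9)/b = ln(81)/b, independent of K and t₀. A minimal check (the growth rate b = 0.263 is a hypothetical value chosen to reproduce the 16.7-year example):

```python
import math

def growth_duration(b):
    """Years from 10% to 90% of saturation for a logistic curve: ln(81) / b."""
    return math.log(81) / b

dt = growth_duration(0.263)   # about 16.7 years
```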
✅ Model Fit Quality
The Model Fit Quality section assesses how well the logistic curve fits the observed publication data using four statistical metrics:
1. R² (Coefficient of Determination)
- Range: 0 to 1 (higher is better)
- Interpretation: Proportion of variance in publication counts explained by the model.
- R² > 0.90: Excellent fit—the logistic model accurately captures the publication trend.
- 0.70 < R² < 0.90: Good fit—the model is reasonable but may not capture all nuances (e.g., fluctuations due to external events).
- R² < 0.70: Poor fit—the logistic model may not be appropriate for this dataset (non-logistic growth pattern, data quality issues).
- Example: R² = 0.953 indicates an excellent fit, with 95.3% of publication variance explained by the model.
2. RMSE (Root Mean Squared Error)
- Definition: Average deviation between observed and predicted annual publications.
- Interpretation: Lower values indicate better fit. RMSE should be interpreted relative to the scale of annual publications (e.g., RMSE = 10 is negligible for topics with 500+ annual pubs, but significant for topics with <50 pubs/year).
3. AIC (Akaike Information Criterion)
- Purpose: Balances model fit against complexity (penalizes overfitting).
- Interpretation: Lower AIC values indicate a better model. AIC is most useful for comparing alternative models rather than assessing absolute fit quality.
4. BIC (Bayesian Information Criterion)
- Purpose: Similar to AIC but applies a stronger penalty for model complexity.
- Interpretation: Lower BIC values indicate better models. BIC is more conservative than AIC and favors simpler models.
Overall Assessment: Biblioshiny automatically classifies model fit as Excellent, Good, or Poor based primarily on R² values. An 'Excellent' fit (R² > 0.90) validates the use of logistic growth assumptions for forecasting.
📍 Current Status
This section provides a snapshot of the topic's present state relative to its life cycle trajectory:
- Last Observed Year: The most recent year with publication data in your collection.
- Annual Publications: The number of publications in the last observed year.
- Cumulative Total: The total number of publications from the collection's start to the last observed year.
- Progress to Saturation: The percentage of K (saturation level) already reached.
- 0-30%: Emergence or early growth phase.
- 30-70%: Rapid growth phase (the topic is 'hot').
- 70-90%: Late growth phase, approaching maturity.
- >90%: Maturity or decline phase, nearing exhaustion.
Example Interpretation: If Progress to Saturation = 10.0%, the topic is in the rapid growth phase, with 90% of its publication potential still ahead. This signals a promising emerging field attracting increasing scholarly attention.
🏁 Milestone Years
The Milestone Years section predicts when the topic will reach specific saturation thresholds:
- 10% of K: Emergence milestone—marks the topic's transition from niche to recognized research area.
- 50% of K (Midpoint): The inflection point where growth rate is highest. This coincides with the Peak Year (Tm).
- 90% of K: Maturity milestone—indicates the topic is approaching saturation, with declining annual growth.
- 99% of K: Near-complete saturation—the topic has exhausted most of its research potential.
Example:
10% of K: 2021.0
50% of K: 2029.3 (+9 years)
90% of K: 2037.6 (+18 years)
99% of K: 2046.7 (+27 years)
This indicates the topic emerged around 2021, will peak in 2029, and approach saturation by 2038, with a full life cycle spanning approximately 25 years.
The system also classifies the topic's current phase (e.g., 'rapid growth phase' if between 10-50% of K) to aid interpretation.
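The milestone years follow from inverting the logistic function: P(t) = f·K gives t = t₀ + ln(f/(1-f))/b. A short sketch reproducing figures close to the example above (the parameters t₀ = 2029.3 and b = 0.263 are hypothetical values, not output from biblioshiny):

```python
import math

def milestone_year(t0, b, fraction):
    """Year at which cumulative output reaches `fraction` of saturation K.

    Inverting P(t) = K / (1 + exp(-b * (t - t0))) gives
    t = t0 + ln(fraction / (1 - fraction)) / b.
    """
    return t0 + math.log(fraction / (1 - fraction)) / b

# Hypothetical fit: inflection point 2029.3, growth rate b = 0.263
milestones = {f: round(milestone_year(2029.3, 0.263, f), 1)
              for f in (0.10, 0.50, 0.90, 0.99)}
```

By construction the 50% milestone coincides with t₀ (the Peak Year), and the 10% and 90% milestones sit symmetrically around it.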
🚀 Forecast
The Forecast section projects future publication output based on the fitted logistic model:
- Forecast Period: The time range for predictions (typically 5-50 years into the future).
- Projection for 2025: Estimated cumulative total publications by 2025 (includes annual projection in parentheses).
- Projection for 2030: Estimated cumulative total publications by 2030 (includes annual projection in parentheses).
Example:
Projection for 2025: 2183 cumulative (436 annual)
Projection for 2030: 4898 cumulative (587 annual)
This suggests the topic will grow from ~900 publications (current) to over 4,800 by 2030, with annual output peaking around 587 publications per year.
Important: Forecasts assume the logistic model remains valid (no disruptive events, paradigm shifts, or external shocks). Long-term forecasts (>10 years) should be interpreted with caution.
📊 Visualizations
The Plot tab provides two complementary graphs:
1. Life Cycle - Annual Publications
- Blue solid line: Logistic fit to observed data
- Blue dashed line: Forecasted annual publications
- Blue dots: Observed annual publications from your collection
- Red dashed vertical line: Peak Year (Tm)
Interpretation: This bell-shaped curve shows how publication activity rises, peaks, and eventually declines. The shape reveals the topic's maturity:
- Steep ascent, pre-peak: Emerging or rapidly growing topic.
- Near or at peak: Mature topic at maximum attention.
- Descending curve, post-peak: Declining topic losing relevance.
2. Cumulative Growth Curve
- Green solid line: Logistic fit to observed cumulative data
- Green dashed line: Forecasted cumulative publications
- Green dots: Observed cumulative publications
- Horizontal dashed lines: Saturation thresholds (50%, 90%)
Interpretation: This S-shaped curve illustrates the topic's total knowledge accumulation over time. The curve's position and steepness reveal:
- Lower left (shallow slope): Emergence phase with slow initial growth.
- Middle (steep slope): Rapid growth phase with exponential accumulation.
- Upper right (flattening): Maturity phase approaching saturation asymptote (K).
🧠 Biblio AI Integration
The Biblio AI tab allows you to generate AI-powered narrative interpretations of the life cycle analysis. Key features include:
- Customizable Prompts: Edit the default prompt to add context-specific details (e.g., research domain, database source, filter criteria).
- Graph-Based Analysis: Biblio AI analyzes the visualizations to identify trends, anomalies, and key transition points.
- Automatic Interpretation: Generates text suitable for research reports, explaining model parameters, growth phases, and forecasts in natural language.
Example Prompt Enhancement:
The analysis was performed on a collection downloaded from WOS focusing on machine learning applications in healthcare from 1990-2020.
This contextual information helps Biblio AI produce more accurate and domain-relevant interpretations.
💡 Use Cases
- Identifying Emerging Topics: Detect rapidly growing fields in their early stages (10-30% of K) for strategic research investment.
- Timing Research Entry: Avoid entering saturated fields (>90% of K) where novelty is harder to achieve.
- Forecasting Resource Needs: Predict future publication volumes to plan journal submissions, conferences, or funding opportunities.
- Comparative Life Cycle Analysis: Run the analysis on multiple subtopics to identify which are growing vs. declining.
- Paradigm Shift Detection: Poor model fit (R² < 0.70) may signal non-logistic patterns caused by disruptive innovations or paradigm shifts.
📌 Best Practices
- Ensure sufficient data: Logistic models require at least 10-15 years of publication data for reliable fitting. Collections with <10 years may produce unstable forecasts.
- Check model fit: Always review R² and visual fit before interpreting forecasts. Poor fits (R² < 0.70) indicate the logistic model may not be appropriate.
- Consider external events: The model assumes smooth, uninterrupted growth. Real-world shocks (e.g., pandemics, funding cuts, technological breakthroughs) can invalidate long-term forecasts.
- Use relative comparisons: Life cycle parameters (K, Peak Year) are most informative when comparing multiple topics or time periods within the same field.
- Validate forecasts periodically: Re-run the analysis with updated data every 2-3 years to recalibrate predictions.
⚠️ Important Considerations
- Database Coverage: The model reflects only publications indexed in your source database(s). Incomplete coverage (e.g., missing journals, preprints) can distort saturation estimates.
- Definition Drift: Topic boundaries may shift over time (e.g., 'artificial intelligence' in 1990 vs. 2020), affecting the validity of K estimates.
- Multiple Life Cycles: Some broad topics exhibit multiple overlapping life cycles as subtopics emerge and decline independently. In such cases, aggregate logistic fits may be misleading.
- Self-Fulfilling Prophecies: Publishing forecasts may influence researcher behavior (e.g., avoiding 'saturated' topics), potentially altering actual trajectories.
- Model Limitations: The logistic model assumes a single saturation point and smooth growth. Topics experiencing resurgence (e.g., due to new technologies) may not fit this pattern.
🔍 Interpreting Fit Quality Issues
If your model shows poor fit (R² < 0.70), consider these potential causes:
- Insufficient Data: Too few years or highly irregular publication patterns.
- Non-Logistic Growth: The topic may exhibit exponential, linear, or cyclic growth rather than logistic.
- Recent Disruptions: External shocks (e.g., COVID-19 boosting health research) create anomalies that deviate from smooth curves.
- Topic Too Broad: Aggregating multiple subtopics with different life cycles can obscure individual patterns.
- Data Quality Issues: Missing years, database indexing changes, or inconsistent metadata.
Solution: Try narrowing your collection (e.g., focusing on a specific subtopic or time range) or exploring alternative growth models.
📚 References
Aria, M., Misuraca, M., & Spano, M. (2020). Mapping the evolution of social research and data science on 30 years of Social Indicators Research. Social Indicators Research, 149, 803–831. https://doi.org/10.1007/s11205-020-02281-3
Bettencourt, L. M., Kaiser, D. I., & Kaur, J. (2009). Scientific discovery and topological transitions in collaboration networks. Journal of Informetrics, 3(3), 210–221. https://doi.org/10.1016/j.joi.2009.03.001
Rogers, E. M. (2003). Diffusion of Innovations (5th ed.). New York: Free Press.
Small, H., & Upham, S. P. (2009). Citation structure of an emerging research area on the verge of application. Scientometrics, 79(2), 365–375. https://doi.org/10.1007/s11192-009-0424-0
Wang, Q. (2018). A bibliometric model for identifying emerging research topics. Journal of the Association for Information Science and Technology, 69(2), 290–304. https://doi.org/10.1002/asi.23930
Three-Field Plot
🔀 Three-Field Plot
The Three-Field Plot is an advanced visualization tool that reveals the relationships among three distinct bibliographic dimensions through an interactive Sankey diagram. This plot enables researchers to explore the complex connections between different metadata fields, making it particularly useful for understanding how research topics, authors, sources, and references are interconnected within a scientific domain.
🎯 Purpose and Application
The Three-Field Plot serves multiple analytical purposes:
- Relationship Mapping: Visualizes how elements from three different bibliographic fields are associated with each other
- Knowledge Flow: Tracks the flow of ideas and citations across different dimensions (e.g., from cited references through authors to keywords)
- Thematic Connections: Identifies which keywords or topics are most strongly associated with specific authors or sources
- Author-Topic Associations: Shows which authors are working on which topics and citing which foundational works
📊 How It Works
The visualization consists of three vertical columns representing different bibliographic fields:
- Left Field: Typically represents sources (cited references, journals) or temporal information
- Middle Field: Usually displays authors or intermediary elements that connect the other two fields
- Right Field: Often shows keywords, topics, or other thematic elements
The width of each flow (colored band) is proportional to the frequency of co-occurrence between elements. Thicker flows indicate stronger associations, while thinner ones represent weaker connections.
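The counting behind the flow widths can be sketched as follows. This is an illustrative Python sketch of the co-occurrence idea, not Biblioshiny's actual R implementation, and the document records are invented:

```python
from collections import Counter

# Each document links one element per field; the width of a Sankey band
# between two elements is their co-occurrence count across documents.
documents = [
    {"source": "J. Informetrics", "author": "Aria M", "keyword": "bibliometrics"},
    {"source": "J. Informetrics", "author": "Aria M", "keyword": "science mapping"},
    {"source": "Scientometrics",  "author": "Small H", "keyword": "co-citation"},
    {"source": "J. Informetrics", "author": "Small H", "keyword": "bibliometrics"},
]

# Left-to-middle and middle-to-right flows.
left_middle = Counter((d["source"], d["author"]) for d in documents)
middle_right = Counter((d["author"], d["keyword"]) for d in documents)

# The (source, author) pair seen twice gets a band twice as thick.
print(left_middle[("J. Informetrics", "Aria M")])  # 2
```

Two co-occurrence tables, one per adjacent pair of columns, are all a three-field Sankey needs.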
⚙️ Configuration Options
The Options panel allows you to customize the plot:
- Left Field: Select from available metadata fields (e.g., Cited References, Sources, Authors' Countries)
- Middle Field: Choose the central connecting field (e.g., Authors, Sources, Keywords)
- Right Field: Define the destination field (e.g., Author's Keywords, Keywords Plus, Subject Categories)
- Number of Items: Control how many top elements to display for each field (typically 10-30 items per field)
💡 Common Field Combinations
Some particularly insightful field combinations include:
- References → Authors → Keywords: Shows which foundational works are cited by which authors working on which topics
- Sources → Authors → Countries: Maps the geographical distribution of authors publishing in specific journals
- Keywords → Authors → Cited References: Reveals the intellectual foundations of different research themes
- Authors' Countries → Authors → Keywords: Identifies national research specializations and thematic focuses
- Publication Year → Authors → Keywords: Tracks temporal evolution of author productivity and topic emergence
🔍 Interpretation Guidelines
- Flow Thickness: A thick flow between two elements indicates a strong association (high co-occurrence frequency)
- Multiple Connections: Elements with many outgoing or incoming flows are central nodes in the network
- Isolated Flows: Thin, isolated connections may represent niche specializations or emerging topics
- Color Coding: Colors help distinguish different elements in the left field, making it easier to trace specific flows
- Cross-field Patterns: Look for patterns where multiple elements from one field connect to the same element in another field, indicating convergence or interdisciplinarity
📌 Best Practices
- Start Simple: Begin with a small number of items (10-15 per field) to avoid visual clutter, then increase if needed
- Logical Sequences: Arrange fields in a logical flow (e.g., past → present, source → output, context → content)
- Interactive Exploration: Hover over flows and nodes to see exact frequencies and connections
- Export Results: Use the plot in presentations to illustrate complex relationships in an accessible way
- Combine with Networks: Use Three-Field Plots alongside network analyses for complementary perspectives on your data
- Context Matters: Always interpret the plot in the context of your research question and domain knowledge
⚠️ Limitations
- Aggregation Effects: The plot shows aggregate patterns and may obscure individual document-level details
- Top-N Selection: Only the most frequent items are displayed; rare but potentially important connections may be hidden
- Direction Ambiguity: While flows suggest relationships, they don't always imply causal or temporal direction
- Visual Complexity: With too many items, the plot can become difficult to interpret; reduce the number of items if necessary
🤖 Biblio AI Integration
When Biblio AI is enabled, you can generate automatic interpretations of the Three-Field Plot. The AI will:
- Identify the most important flows and connections
- Highlight dominant patterns and relationships
- Provide narrative explanations suitable for research reports and presentations
- Suggest potential interpretations based on the observed patterns
📚 Key References
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Chen, C. (2017). Science Mapping: A Systematic Review of the Literature. Journal of Data and Information Science, 2(2), 1–40. https://doi.org/10.1515/jdis-2017-0006
Most Relevant Sources
Options:
Main Configuration
Most Local Cited Sources
Options:
Main Configuration
Core Sources by Bradford's Law
Sources' Local Impact
Options:
Main Configuration
Sources' Production over Time
Options:
Main Configuration
Most Relevant Authors
Options:
Main Configuration
Author Profile
👤 Author Profile Overview
The Author Profile page provides a dual-perspective bibliometric overview of each author included in the collection:
🔹 Global Profile
The Global Profile presents the author's complete scientific output, based on metadata retrieved from OpenAlex via the openalexR R package. This profile includes all publications authored by the researcher, regardless of whether they are part of the current collection.
Main features of the Global Profile include:
- Total Publications and Citations
- H-Index and i10-Index
- 2-Year Mean Citation Rate
- Publication Trends over the last 10 years
- Main Research Topics extracted from OpenAlex concepts
Data Source: OpenAlex API (via openalexR)
Unique Identifier: OpenAlex Author ID (e.g., A5014455237)
🔸 Local Profile
The Local Profile focuses exclusively on the subset of the author's publications that are included in the user-defined collection currently under analysis in the project.
Main features of the Local Profile include:
- Number of Publications, Total Citations, and Local H-Index
- Average Citations per Work
- Recent Activity: Number of publications in the last 5 years
- Publication Trends (based only on local data)
- Main Keywords derived from the local collection
- List of Publications with full metadata (title, year, journal, DOI, citations)
This local profile helps contextualize the author's role and impact within the specific research topic or dataset under investigation.
🔄 Interpretation and Use
The Global Profile offers a broad, external view of the author's overall scholarly influence, while the Local Profile highlights their specific relevance within the current study.
This dual visualization is particularly useful for:
- Identifying influential researchers in the topic area
- Comparing local vs. global impact
- Evaluating thematic alignment of authors with the collection's focus
📚 References
Priem, J. et al. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. Retrieved from https://openalex.org
Aria, M., Le, T., Cuccurullo, C., Belfiore, A., & Choe, J. (2024). openalexR: An R-Tool for Collecting Bibliometric Data from OpenAlex. R Journal, 15(4), 167–180. https://doi.org/10.32614/RJ-2023-089
Aria, M. et al. (2023). openalexR: An R package for programmatic access to OpenAlex metadata. CRAN. Retrieved from https://cran.r-project.org/package=openalexR
Hirsch, J.E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102(46), 16569–16572. https://doi.org/10.1073/pnas.0507655102
Most Local Cited Authors
Options:
Main Configuration
Authors' Production over Time
Options:
Main Configuration
Author Productivity through Lotka's Law
Authors' Local Impact
Options:
Main Configuration
Most Relevant Affiliations
Options:
Main Configuration
Affiliations' Production over Time
Options:
Main Configuration
Corresponding Author's Countries
Options:
Main Configuration
Countries' Scientific Production
Countries' Production over Time
Options:
Main Configuration
Most Cited Countries
Options:
Main Configuration
Most Global Cited Documents
Options:
Main Configuration
Most Local Cited Documents
Options:
Main Configuration
Most Local Cited References
Options:
Reference Spectroscopy
Options:
Main Configuration
Time Slice
Most Frequent Words
Options:
Main Configuration
Text Editing
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Upload a TXT or CSV file containing terms and their respective synonyms.
Each row must contain a term and its synonyms, separated by a standard separator (comma, semicolon, or tab).
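For example, a synonyms file might look like this (the terms are hypothetical; each row's entries are merged into the row's first term):

```text
artificial intelligence; ai; machine intelligence
co-citation; cocitation; co citation
```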
WordCloud
Options:
Main Configuration
Text Editing
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term (the first term in the row).
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Parameters
TreeMap
Options:
Main Configuration
Text Editing
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Upload a TXT or CSV file containing terms and their respective synonyms.
Each row must contain a term and its synonyms, separated by a standard separator (comma, semicolon, or tab).
Words' Frequency over Time
Options:
Main Configuration
Text Editing
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term (the first term in the row).
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Parameters
Trend Topics
Options:
Main Configuration
Text Editing
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term (the first term in the row).
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Parameters
Clustering by Coupling
Options:
Parameters
Co-occurrence Network
Options:
Main Configuration
Text Editing
Stop Words
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Synonyms
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term.
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Method Parameters
Network Size
Filtering Options
Graphical Parameters
Visual Appearance
Label Settings
Node & Edge Settings
Export Network
Thematic Map
Options:
Main Configuration
Text Editing
Stop Words
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Synonyms
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term.
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Parameters
Data Parameters
Display Parameters
Network Parameters
Thematic Evolution
Options:
Main Configuration
Text Editing
Stop Words
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Synonyms
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term.
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Parameters
Data Parameters
Weight Parameters
Display Parameters
Time Slices
Factorial Analysis
Options:
Main Configuration
Text Editing
Stop Words
Upload a TXT or CSV file containing a list of terms you want to remove from the analysis.
Terms must be separated by a standard separator (comma, semicolon, or tab).
Synonyms
Upload a TXT or CSV file in which each row contains a list of synonyms that will be merged into a single term.
Terms must be separated by a standard separator (comma, semicolon, or tab); rows must be separated by line breaks.
Method Parameters
Graphical Parameters
Co-citation Network
Options:
Main Configuration
Method Parameters
Network Size
Filtering Options
Graphical Parameters
Visual Appearance
Label Settings
Node & Edge Settings
Export Network
Historiograph
Options:
Main Configuration
Graphical Parameters
Label Configuration
Filtering Options
Visual Settings
Collaboration Network
Options:
Main Configuration
Method Parameters
Network Size
Filtering Options
Graphical Parameters
Visual Appearance
Label Settings
Node & Edge Settings
Export Network
Countries' Collaboration World Map
Options:
Method Parameters
Filtering Options
Graphical Parameters
Edge Settings
Scientific Article Content Analysis
Upload a PDF file and analyze citation patterns, context, and co-occurrence networks.
Readability Indices
Readability indices will appear here after analysis.
Text Statistics
Text statistics will appear here after analysis.
N-grams Analysis
Top Unigrams
Top Bigrams
Top Trigrams
Citation Types Distribution
Citations by Section
Word Distribution Analysis
Word Distribution Over Document
No visualization available
Select words from the list above and click 'Update Visualization' to see their distribution across the document.
Distribution Statistics
Statistics will appear here after visualization is generated.
In-Context Citation Analysis
Citation Contexts Visualization
Citation Co-occurrence Network
Network Information
Strongest Connections
Bibliography
Total References
From PDF
From Crossref
From OpenAlex
No references available
References will appear here after the analysis is complete.
References can be extracted from the PDF or fetched from Crossref using the document's DOI.
AI-Powered Document Summarization
AI-Generated Summary
No summary generated yet
Select your summary type above and click 'Generate Summary' to start.
Make sure you have uploaded a PDF document first.
Upload a PDF file and start the analysis
Select a scientific article in PDF format and configure the analysis parameters to begin.
Choose Import Method
1. Import PDF File ▼
Extracting Text from PDF...
Please wait while we extract the document content.
1. Load Saved Text File ▼
Load a .txt file that was previously saved from this tool. The file should have DOI and citation format info in the first lines for automatic configuration.
2. Analysis Parameters ▼
Advanced Options
3. Run Analysis
Analyzing content...
📄 Scientific Article Content Analysis
Content Analysis is a specialized feature in Biblioshiny that enables researchers to perform deep, AI-enhanced analysis of individual scientific articles in PDF format. This tool goes beyond traditional bibliometric analysis by examining the full text of documents, extracting citations with their surrounding context, and revealing patterns in how research is cited and discussed within scholarly narratives.
This menu brings into Biblioshiny the functions of the contentanalysis R package by Aria and Cuccurullo (https://cran.r-project.org/package=contentanalysis). The module is built on the bibliometrix ecosystem and adds advanced text-mining capabilities to support:
- Extraction and analysis of in-text citations with context windows
- Citation co-occurrence network analysis
- Readability and linguistic quality assessment
- Word distribution and trend analysis across document sections
- AI-powered document summarization through Biblio AI
- Comprehensive reference list extraction and matching
🎯 Purpose and Applications
Content Analysis is particularly valuable for:
- Understanding Citation Context: Examining how and where references are cited within a paper, distinguishing between peripheral mentions and substantive discussions.
- Identifying Citation Clusters: Detecting which references are frequently cited together, revealing the conceptual structure and intellectual foundations of the research.
- Quality Assessment: Evaluating document readability, lexical diversity, and linguistic complexity using established metrics like Flesch-Kincaid, ARI, and Gunning Fog Index.
- Thematic Flow Analysis: Tracking how key terms and concepts are distributed across different sections of the paper (Introduction, Methods, Results, Discussion).
- Literature Review Enhancement: Using AI-powered summarization to quickly extract key insights, research questions, methodologies, and findings from lengthy documents.
- Citation Practice Research: Analyzing citation patterns and practices for methodological or meta-research studies.
📥 Step 1: Import PDF File
The analysis begins by uploading a scientific article in PDF format. The system supports both single-column and multi-column layouts (specify the number of columns for accurate text extraction).
Citation Format Detection: The tool uses AI-enhanced extraction to identify citations in multiple formats:
- Author-year format: (Smith, 2020) or Smith et al. (2015)
- Numeric brackets: [1] or [15-17]
- Numeric superscripts: ¹ or ²³
- Mixed formats: The system can handle documents with inconsistent citation styles (though results may be less reliable)
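The two most common formats above can be matched with simple patterns. This is an illustrative Python sketch of the pattern-matching idea, not the app's actual extraction code (which also uses AI-enhanced detection):

```python
import re

# Minimal regexes for author-year and numeric-bracket in-text citations.
author_year = re.compile(r"\(([A-Z][A-Za-z'-]+(?:\s+et\s+al\.)?),?\s+(\d{4})\)")
numeric_bracket = re.compile(r"\[(\d+(?:[-,]\s*\d+)*)\]")

text = "Prior work (Smith, 2020) refined earlier models [1], [15-17]."
print(author_year.findall(text))      # [('Smith', '2020')]
print(numeric_bracket.findall(text))  # ['1', '15-17']
```

Real extraction is harder than this: narrative citations ("Smith et al. (2015) showed..."), multi-author lists, and superscripts all need extra handling, which is where the AI-enhanced mode helps.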
AI-Enhanced Extraction: When enabled, this feature uses advanced AI models to improve citation detection accuracy, particularly useful for:
- PDFs with complex layouts or formatting issues
- Documents with non-standard citation formats
- Multi-column articles where citation boundaries are ambiguous
- Papers with extensive in-text author listings (e.g., 'Smith, Jones, Williams, and Brown, 2020')
Note: AI-enhanced extraction requires a configured Google Gemini API key in Settings. See the Biblio AI help section for setup instructions.
⚙️ Step 2: Analysis Parameters
Users can customize the extraction and analysis through several key parameters:
Context Window Size (words)
Defines the number of words to extract before and after each citation. Default is 20 words.
- Smaller windows (5-10 words): Capture only immediate context, useful for identifying direct citation purposes (e.g., methodology references).
- Medium windows (15-30 words): Balance between context richness and data volume. Suitable for most analyses.
- Larger windows (40-50 words): Capture broader argumentative context, useful for discourse analysis or understanding how citations are integrated into narrative flow.
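The context-window extraction amounts to taking N words on either side of each matched citation. A minimal Python sketch of that idea (not the package's R implementation):

```python
import re

def context_window(text, citation, n_words=20):
    """Return (before, after): n_words of context on each side of the
    first occurrence of `citation` in `text`."""
    pos = text.find(citation)
    if pos == -1:
        return None
    before = re.findall(r"\S+", text[:pos])[-n_words:]
    after = re.findall(r"\S+", text[pos + len(citation):])[:n_words]
    return " ".join(before), " ".join(after)

text = ("Network models of science have matured considerably "
        "(Smith, 2020) and now underpin most mapping tools.")
before, after = context_window(text, "(Smith, 2020)", n_words=5)
print(before)  # 'of science have matured considerably'
print(after)   # 'and now underpin most mapping'
```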
Max Distance for Network (characters)
Defines the maximum character distance between two citations to be considered 'co-occurring' in the network analysis. Default is 800 characters (roughly 120-150 words).
- Shorter distances (300-500 chars): Identify only tightly co-cited references, revealing core conceptual links.
- Medium distances (600-1000 chars): Capture paragraph-level co-citations, showing related but distinct concepts.
- Longer distances (1200-2000 chars): Include section-level co-citations, useful for broad thematic analysis but may create noisy networks.
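The co-occurrence rule itself is simple: two citations are linked when their character positions are within the threshold. A Python sketch with invented offsets:

```python
from itertools import combinations

# Citations as (label, character offset in the text).
citations = [("Smith 2020", 120), ("Jones 2019", 400), ("Lee 2021", 2500)]
max_dist = 800  # the default Max Distance parameter

# Keep every pair whose offsets differ by at most max_dist characters.
edges = [(a, b) for (a, pa), (b, pb) in combinations(citations, 2)
         if abs(pa - pb) <= max_dist]
print(edges)  # [('Smith 2020', 'Jones 2019')]
```

Lowering `max_dist` prunes the weaker, longer-range links first, which is why shorter distances yield tighter core networks.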
Advanced Options
- Parse complex multiple citations: Attempts to separate compound citations like '(Smith 2020; Jones 2019; Williams et al. 2018)' into individual references. This improves network accuracy but may increase processing time.
- Remove stopwords from analysis: Excludes common words (e.g., 'the', 'and', 'of') from word frequency and trend analyses, focusing on substantive terms.
- Custom stopwords (comma-separated): Add domain-specific stopwords (e.g., 'study', 'research', 'analysis') to refine term extraction for your field.
📊 Analysis Results and Tabs
1️⃣ Descriptive Statistics
Provides an overview of the document's structural and linguistic characteristics:
Document Metrics:
- Total Words: Overall word count (excluding references section if detected).
- Citations Found: Number of unique in-text citations identified.
- Narrative Citations: Citations that include author names in the sentence (e.g., 'As Smith (2020) demonstrated...').
- Citation Density: Citations per 1000 words, indicating reference saturation. Typical ranges:
- 0-5: Light citation (review articles often have 10-20+)
- 5-10: Moderate (common in empirical studies)
- 10+: Heavy (common in systematic reviews or theoretical papers)
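Citation density is a straightforward normalization, sketched here for clarity (the counts are invented):

```python
def citation_density(n_citations, total_words):
    """Citations per 1000 words, as reported in the Document Metrics panel."""
    return 1000 * n_citations / total_words

# A hypothetical 6000-word paper with 48 in-text citations.
print(citation_density(48, 6000))  # 8.0 -> moderate density
```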
Readability Indices:
- Flesch-Kincaid Grade: Estimates U.S. grade level required to understand the text. Values of 12-14 indicate college-level reading, while 16+ suggests graduate-level complexity.
- Reading Ease (Flesch): 0-100 scale where higher scores indicate easier readability. Scientific papers typically score 20-40 (difficult to very difficult).
- ARI Index (Automated Readability Index): Another grade-level estimate based on character count rather than syllables. Generally correlates with Flesch-Kincaid but may differ for technical texts.
- Gunning Fog Index: Estimates years of formal education needed. Values above 17 indicate very complex, technical prose common in specialized research.
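Three of these indices follow well-known published formulas, sketched below in Python. The app's own implementation may differ in tokenization and syllable-counting details, and the example counts are invented:

```python
def flesch_kincaid_grade(words, sentences, syllables):
    # U.S. grade level estimate.
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59

def flesch_reading_ease(words, sentences, syllables):
    # 0-100 scale; higher = easier.
    return 206.835 - 1.015 * words / sentences - 84.6 * syllables / words

def gunning_fog(words, sentences, complex_words):
    # Years of formal education needed; complex_words = words with 3+ syllables.
    return 0.4 * (words / sentences + 100 * complex_words / words)

# A hypothetical 6000-word paper: 300 sentences, 10800 syllables,
# 1900 complex words.
print(round(flesch_kincaid_grade(6000, 300, 10800), 2))
print(round(flesch_reading_ease(6000, 300, 10800), 2))
print(round(gunning_fog(6000, 300, 1900), 2))
```

Note how all three are driven by the same two ratios, words per sentence and "hard word" rate, which is why they usually agree in ranking documents.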
Text Statistics:
- Characters, Words, Sentences: Basic text volume metrics.
- Syllables: Used in readability calculations.
- Complex Words: Words with 3+ syllables, expressed as count and percentage. Higher percentages (>30%) indicate technical vocabulary.
- Avg words/sentence: Sentence length. Scientific writing typically ranges from 15-25 words/sentence. Very long sentences (>30) may reduce readability.
- Lexical Diversity: Ratio of unique words to total words. Higher diversity (>0.5) suggests varied vocabulary; lower values (<0.4) may indicate repetitive or formulaic writing.
N-grams Analysis: Displays the most frequent unigrams (single words), bigrams (two-word phrases), and trigrams (three-word phrases) in the document. This reveals key concepts and repeated terminology.
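N-gram counting is a sliding window over the token sequence; a compact Python sketch (illustrative, not the app's R code):

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "science mapping analysis supports science mapping research".split()
print(ngrams(tokens, 2).most_common(1))  # [(('science', 'mapping'), 2)]
```

Stopword removal (previous section) happens before this step, so the top n-grams reflect substantive terminology rather than function words.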
2️⃣ Word Trends
Visualizes how selected terms are distributed across the document's sections. This analysis helps understand the thematic flow and identify where specific concepts are emphasized.
Features:
- Track up to 10 terms: Select from the most frequent words or enter custom terms (e.g., domain-specific keywords).
- Segmentation options:
- Auto (use sections if available): Automatically detects standard sections (Abstract, Introduction, Methods, Results, Discussion, Conclusion) if structured.
- Document sections: Manually defined sections based on detected headers.
- Equal-length segments: Divides the document into uniform chunks (e.g., quartiles) regardless of logical structure.
- Visualization types:
- Line chart: Shows temporal trends for each term across segments.
- Area chart: Emphasizes volume changes with filled areas under lines.
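The "equal-length segments" option reduces to splitting the token stream into uniform chunks and counting a term in each. A Python sketch of that segmentation (invented tokens):

```python
def term_trend(tokens, term, n_segments=4):
    """Occurrences of `term` in each of n_segments equal-length chunks."""
    size = max(1, len(tokens) // n_segments)
    segments = [tokens[i * size:(i + 1) * size] for i in range(n_segments - 1)]
    segments.append(tokens[(n_segments - 1) * size:])  # remainder goes last
    return [seg.count(term) for seg in segments]

tokens = ["model"] * 3 + ["data"] * 5 + ["model"] * 2 + ["data"] * 6
print(term_trend(tokens, "model"))  # [3, 0, 2, 0]
```

Each list becomes one line (or area) in the chart, with segments on the x-axis.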
Interpretation Examples:
- A term peaking in the Introduction but absent in Results may indicate a concept discussed in literature review but not directly addressed in the study.
- Uniform distribution suggests a central theme integrated throughout the paper.
- Concentration in Methods indicates technical or procedural terminology.
- Terms appearing only in Discussion may represent emerging implications or future directions.
Distribution Statistics: Provides detailed frequency counts and statistical measures (mean, standard deviation, skewness) for each tracked term across segments.
3️⃣ In-Context Citations
Displays each extracted citation with its surrounding context window, enabling qualitative citation analysis. This is one of the most powerful features for understanding why and how sources are cited.
Features:
- Searchable list: Filter citations by searching for specific authors, keywords, or phrases.
- Minimum context words: Set a threshold to exclude citations with insufficient context (useful for filtering out reference-only lists or captions).
- Grouping options:
- Auto (use sections if available): Groups citations by paper section (e.g., all Introduction citations together).
- Document sections: Uses manually detected section headers.
- Equal-length segments: Groups citations by position in the document (e.g., first quartile, second quartile).
- Citation type identification: The interface distinguishes between:
- Parenthetical citations: References in parentheses, often used for supporting evidence.
- Narrative citations: Author names integrated into sentence structure, typically indicating more substantive engagement.
- Reference matching: When possible, the system attempts to match in-text citations to the corresponding full reference in the bibliography. A green indicator shows successful matches; hover to view full reference details.
Analytical Uses:
- Citation Function Analysis: Categorize citations by their role (e.g., establishing theoretical framework, justifying methodology, supporting findings, contrasting results).
- Author Authority: Identify which authors are cited most frequently and in what contexts (e.g., some authors may be cited exclusively for methods, others for theory).
- Hedging and Certainty: Examine the language surrounding citations to assess how authors express confidence or uncertainty (e.g., 'Smith (2020) demonstrated...' vs. 'some studies suggest... (Smith 2020)').
- Self-Citation Patterns: Identify author self-citations and analyze whether they serve substantive or promotional purposes.
Export Contexts: Use the 'Export Contexts' button to download all citation contexts as a structured dataset (CSV format) for external qualitative coding or further analysis in tools like NVivo, ATLAS.ti, or custom R scripts.
4️⃣ Network Analysis
Generates a citation co-occurrence network that visualizes which references are cited near each other in the text. This reveals the intellectual structure and conceptual clusters within the paper.
Network Construction:
- Nodes: Each node represents a cited reference (identified by first author and year).
- Edges: A link is created between two references if they appear within the specified Max Distance parameter (default: 800 characters).
- Node Size: Proportional to the number of times each reference is cited in the document (total citation frequency).
- Node Color: Represents the document section where the reference appears most frequently, helping identify whether certain clusters are methodological, theoretical, or results-focused.
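Putting the pieces together, node sizes and edge weights come from the same list of citation mentions. An illustrative Python sketch (invented offsets; not the app's R implementation):

```python
from collections import Counter
from itertools import combinations

# Each mention of a reference: (label, character offset in the text).
mentions = [("Smith 2020", 120), ("Jones 2019", 400), ("Smith 2020", 900),
            ("Jones 2019", 1100), ("Lee 2021", 5000)]
max_dist = 800

# Node size: how often each reference is cited.
node_size = Counter(label for label, _ in mentions)

# Edge weight: co-occurrences of two distinct references within max_dist.
edge_weight = Counter()
for (a, pa), (b, pb) in combinations(mentions, 2):
    if a != b and abs(pa - pb) <= max_dist:
        edge_weight[tuple(sorted((a, b)))] += 1

print(node_size["Smith 2020"])                    # 2
print(edge_weight[("Jones 2019", "Smith 2020")])  # 3
```

An edge list in this form exports directly to GraphML or plain edge-list files for Gephi, Cytoscape, or igraph.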
Interpretation:
- Densely connected clusters: Groups of tightly co-cited references indicate core conceptual or methodological frameworks that the paper builds upon.
- Bridge references: Nodes connecting otherwise separate clusters represent interdisciplinary links or integrative studies that synthesize multiple research traditions.
- Peripheral isolates: References cited alone without nearby co-citations may serve specific, standalone purposes (e.g., citing a statistical test or a single example).
- Section-based coloring: If most nodes are blue (Introduction), the paper heavily relies on literature review. If red (Methods) dominates, it's methodology-focused.
Network Metrics: While not explicitly displayed, users can infer centrality and clustering qualitatively:
- Central nodes (many connections): Foundational references that anchor the paper's argument.
- Betweenness (bridge position): References connecting distinct themes, suggesting synthesis or interdisciplinary work.
Export Network: Download the network as a graph file (GraphML or edge list format) for advanced analysis in specialized network software like Gephi, Cytoscape, or igraph in R.
Legend: The right panel displays a legend showing the color coding for different document sections. This helps quickly identify which parts of the paper contribute most to the citation network.
5️⃣ References
Displays the complete bibliography extracted from the document. References are automatically parsed and enriched with metadata from Crossref and OpenAlex databases when DOIs are available.
Features:
- Search functionality: Find specific references by author name, title, year, or DOI.
- Source indicators: Icons show the data source:
- PDF icon: Reference extracted directly from the PDF.
- Crossref icon: Metadata retrieved from Crossref (indicates DOI-based matching).
- OpenAlex icon: Metadata enriched from OpenAlex (provides additional fields like citation counts, authors' affiliations).
- View Details: Click on any reference to open a detailed modal showing:
- Full author list
- Publication year and journal/venue
- DOI (with direct link to the publisher)
- Abstract (if available)
- Citation counts and impact metrics from OpenAlex
- Export References: Download the bibliography in various formats (BibTeX, RIS, CSV) for import into reference managers like Zotero, Mendeley, or EndNote.
Reference Matching Quality:
- High confidence (green): Reference successfully matched to Crossref/OpenAlex with high certainty (exact DOI or strong title/author match).
- Low confidence (yellow): Partial match based on fuzzy title/author similarity; manual verification recommended.
- No match (red): Reference could not be matched to external databases. Possible reasons include:
- Missing or incorrect DOI
- Pre-print or non-indexed publication
- Parsing errors in reference extraction
- Non-standard formatting in the original PDF
Total References: The header shows the breakdown:
- Total References: All unique references found in the document.
- From PDF: References extracted directly from the PDF (may have parsing imperfections).
- From Crossref: References successfully matched to the Crossref database.
- From OpenAlex: References matched to OpenAlex (often overlaps with Crossref but provides broader coverage for non-DOI works).
6️⃣ BiblioAI Summary
Uses Google Gemini AI models to generate intelligent, context-aware summaries of the analyzed document. This feature transforms raw PDF content into structured, actionable insights.
Summary Types:
- Short Abstract (250 words): A concise overview covering the main research question, methodology, key findings, and conclusions. Ideal for quick reference or creating study cards.
- Narrative Abstract (500-600 words): A detailed, paragraph-form summary that places the research in context, explains the study's rationale, describes methods and results, and discusses implications. Suitable for grant applications or research summaries.
- IMRaD Structure Summary: A structured summary organized by the traditional scientific paper format:
- Introduction: Background, research gap, and objectives
- Methods: Study design, data sources, analytical approach
- Results: Key findings with quantitative details where applicable
- Discussion: Interpretation, limitations, and implications
- Thematic Bibliography: Generates a thematic categorization of the paper's references, grouping cited works by conceptual topic (e.g., 'Theoretical Foundations', 'Methodological Approaches', 'Empirical Evidence', 'Critiques and Limitations'). This is invaluable for:
- Understanding how the author organizes their intellectual framework
- Identifying key reference clusters for literature review purposes
- Discovering reading lists organized by research theme
- Research Questions & Context: Extracts and articulates:
- The main research question(s) or hypotheses
- The broader research context and motivation
- Key theoretical or empirical gaps the study addresses
- The study's positioning within its field
Customization:
- Edit Prompt: The AI summary generation is driven by an editable prompt template. Users can:
- Add specific questions to the prompt (e.g., 'What statistical methods were used?')
- Specify output format preferences
- Include contextual information (e.g., 'This paper is from a special issue on climate change')
- Request focus on particular sections (e.g., 'Emphasize methodological contributions')
- Language Selection: Summaries can be generated in multiple languages (depending on Gemini model support), making international collaboration and translation easier.
Requirements:
- A valid Google Gemini API key must be configured in Settings. Free tier limits allow for moderate usage (typically 15-60 requests per minute depending on the model).
- The 'API Key Configured' indicator must show green for the feature to work.
Best Practices:
- Verify AI outputs: While Gemini models are highly capable, always review generated summaries for accuracy, especially for technical details or numerical results.
- Use appropriate summary types: Short abstracts are great for skimming large numbers of papers, while IMRaD summaries are better for in-depth study or quality appraisal.
- Combine with manual review: AI summaries should complement, not replace, human reading. Use them to prioritize which papers to read in full.
- Iterate prompts: If the initial summary misses key information, refine your prompt with more specific instructions.
🧠 Integration with Biblio AI
Content Analysis is fully integrated with the Biblio AI ecosystem. Throughout the analysis tabs, users can activate AI-assisted interpretation panels that:
- Explain patterns in the citation network (e.g., why certain clusters form)
- Interpret readability scores in the context of the target audience and field norms
- Suggest alternative analyses or parameters based on detected document characteristics
- Generate publication-ready text descriptions of figures and results
These dynamic interpretations adapt to your specific document and parameters, providing contextualized guidance rather than generic explanations.
💡 Advanced Use Cases
1. Citation Context Mining for Meta-Research
Researchers studying citation practices can use Content Analysis to:
- Quantify the proportion of narrative vs. parenthetical citations across papers
- Analyze sentiment or evaluative language in citation contexts (requires exporting contexts for sentiment analysis)
- Study how often citations are backed by direct quotes vs. paraphrases
- Examine self-citation contexts to distinguish between necessary methodological references and gratuitous self-promotion
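The first of these tasks can be approximated with simple pattern matching on exported citation contexts. The sketch below is a minimal, hypothetical illustration: the regexes and the `citation_mix` helper are assumptions for demonstration, not part of Biblioshiny's export format or the contentanalysis package, and real citation parsers are far more robust.

```python
import re

# Crude heuristics for the two citation styles (illustrative only):
# - parenthetical: "(Smith et al., 2020)" — authors and year inside parentheses
# - narrative:     "Smith et al. (2020)"  — authors in the sentence, year in parentheses
PARENTHETICAL = re.compile(r"\([A-Z][A-Za-z'-]+(?: et al\.)?,\s*\d{4}\)")
NARRATIVE = re.compile(r"[A-Z][A-Za-z'-]+(?: et al\.)?\s*\(\d{4}\)")

def citation_mix(contexts):
    """Return the share of narrative vs. parenthetical citations
    in a list of citation-context strings."""
    counts = {"narrative": 0, "parenthetical": 0}
    for text in contexts:
        if PARENTHETICAL.search(text):
            counts["parenthetical"] += 1
        elif NARRATIVE.search(text):
            counts["narrative"] += 1
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}

contexts = [
    "This extends earlier work (Smith et al., 2020) on topic modeling.",
    "Aria and Cuccurullo (2017) introduced bibliometrix for science mapping.",
]
print(citation_mix(contexts))  # {'narrative': 0.5, 'parenthetical': 0.5}
```

For serious meta-research, such counts should be validated against a manually coded sample, since citation formats vary widely across journals.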
2. Quality and Transparency Assessment
Evaluate methodological transparency by:
- Checking if key methodological references appear in the Methods section (using In-Context Citations and section grouping)
- Identifying whether data sources and statistical tests are properly cited
- Assessing whether limitations are discussed with appropriate citations to prior critiques
3. Comparative Literature Review
Analyze multiple papers on the same topic by:
- Running Content Analysis on several key papers
- Comparing their citation networks to identify consensus foundational references
- Examining differences in thematic emphasis using Word Trends
- Creating a merged bibliography of all thematic bibliographies from Biblio AI summaries
4. Pedagogical Applications
Use Content Analysis to teach:
- Citation Skills: Show students examples of effective narrative vs. parenthetical citations.
- Literature Review Structure: Demonstrate how successful papers organize their conceptual frameworks using network analysis.
- Writing Clarity: Use readability scores to illustrate the balance between technical precision and accessibility.
- Source Integration: Highlight how professional academics weave citations into argumentative flow.
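The readability point above rests on standard formulas such as Flesch Reading Ease. As a classroom illustration (not Biblioshiny's internal implementation), the score can be computed with a crude vowel-group syllable heuristic; production tools use pronunciation dictionaries instead.

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic; real tools use pronunciation dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # drop a silent final 'e'
    return max(n, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores mean easier text (90+ very easy, below 30 very difficult)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Scores above 100 can occur for very short, simple sentences.
print(round(flesch_reading_ease("The cat sat on the mat. It was happy."), 1))
```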
5. Editorial and Peer Review Support
Editors and reviewers can use Content Analysis to:
- Quickly assess whether a manuscript cites the appropriate foundational literature (using network analysis to check for missing clusters)
- Identify potential plagiarism or over-reliance on a single source (using n-gram analysis and citation frequency)
- Evaluate whether methods and results are properly supported by citations
- Generate constructive feedback on readability for papers that score poorly on standard metrics
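The n-gram check mentioned above can be sketched in a few lines. The helper below is an assumption for illustration, not how Biblioshiny implements it; unusually frequent long n-grams shared with a single source are a signal worth inspecting, not proof of plagiarism.

```python
from collections import Counter

def top_ngrams(text, n=3, k=5):
    """Return the k most frequent word n-grams in a text.
    Repeated long n-grams shared with one source can flag
    recycled phrasing or over-reliance on that source."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(grams).most_common(k)

sample = "the results confirm the hypothesis and the results confirm the trend"
print(top_ngrams(sample, n=3, k=2))  # [('the results confirm', 2), ('results confirm the', 2)]
```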
📌 Best Practices
- PDF Quality Matters: Text-based PDFs (not scanned images) produce the most accurate results. Use OCR pre-processing for image-based PDFs.
- Check Citation Parsing: Always review the 'Citations Found' count. If it seems unusually low, try enabling AI-enhanced extraction or manually specifying the citation format.
- Balance Context Windows: Larger windows provide richer qualitative data but increase processing time and data volume. Start with default settings (20 words) and adjust based on your analytical needs.
- Export for Deep Dives: For complex citation function analysis or qualitative coding, export the citation contexts to CSV and work with specialized qualitative data analysis software.
- Combine with Traditional Bibliometrics: Content Analysis is designed to complement, not replace, traditional bibliometric methods. Use it alongside tools like Data → Overview or Conceptual Structure for a complete picture.
- Mind the Model Limits: AI-powered features (enhanced extraction, Biblio AI summaries) have rate limits and token constraints. For very long documents (>50 pages), summaries may truncate; consider analyzing sections separately.
⚠️ Limitations and Considerations
- Citation Format Variability: Non-standard or inconsistent citation formats may result in incomplete extraction. Manual verification is recommended for critical analyses.
- Automatic Section Detection: The system attempts to identify standard sections (Introduction, Methods, etc.) using heuristics. Papers with unconventional structures may be segmented incorrectly. Use 'Equal-length segments' as a fallback.
- Language Support: While the tool supports PDFs in any language, AI-powered features (enhanced extraction, summaries) are optimized for English. Other languages may produce less accurate results.
- Reference Matching Accuracy: Matching in-text citations to bibliography entries relies on fuzzy matching when DOIs are unavailable. Ambiguous references (e.g., 'Smith et al., 2020' when multiple Smiths exist) may match incorrectly.
- Network Interpretation: Citation co-occurrence does not imply intellectual similarity. References may appear together for diverse reasons (e.g., contrasting studies, citing a methods paper alongside an application). Always combine network analysis with context reading.
- AI Hallucinations: Although rare, Biblio AI summaries may occasionally include plausible but incorrect details that are not present in the original text. Critical applications (e.g., systematic reviews) should verify AI outputs against the source document.
- Database Coverage: Reference enrichment from Crossref/OpenAlex is limited to indexed works with DOIs. Books, preprints, gray literature, or very recent papers may not be matched.
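The fuzzy-matching limitation above can be made concrete with a toy example. The sketch uses Python's difflib for similarity scoring; it is not the tool's actual matching algorithm, and the `best_match` helper and its threshold are arbitrary assumptions for illustration.

```python
from difflib import SequenceMatcher

def best_match(citation, bibliography, threshold=0.6):
    """Match an in-text citation key (e.g. 'Smith et al., 2020') to the
    most similar bibliography entry; return None below the threshold."""
    scored = [(SequenceMatcher(None, citation.lower(), entry.lower()).ratio(), entry)
              for entry in bibliography]
    score, entry = max(scored)
    return entry if score >= threshold else None

bib = [
    "Smith et al. (2020) Topic models",
    "Smith (2020) Citation parsing",
]
# Two 'Smith 2020' entries: string similarity alone cannot disambiguate reliably.
print(best_match("Smith et al., 2020", bib))
```

This is exactly why ambiguous references should be spot-checked by hand when DOIs are unavailable.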
🔄 Integration with Biblioshiny Workflow
Content Analysis complements other Biblioshiny modules:
- Import Phase: After collecting a bibliographic dataset (e.g., from Web of Science), use Content Analysis to deeply examine a few key highly cited papers identified in Most Local Cited Documents.
- Conceptual Structure Analysis: Once you've identified thematic clusters using co-word analysis or MCA, select representative papers from each cluster and use Content Analysis to understand how those themes are discussed and cited within individual papers.
- Intellectual Structure: After running co-citation or bibliographic coupling networks, use Content Analysis on the central nodes to see why they're central—are they cited together because they address the same method, theory, or empirical finding?
- Trend Topics: When you identify an emerging trend, analyze a seminal paper from that trend to understand its intellectual roots and citation context.
📚 Key References
Main References on Bibliometrix and Content Analysis Tools:
Aria, M., & Cuccurullo, C. (2025). contentanalysis: Scientific Content and Citation Analysis from PDF Documents. [R package]. https://doi.org/10.32614/CRAN.package.contentanalysis
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Aria, M., Cuccurullo, C., D'Aniello, L., Misuraca, M., & Spano, M. (2024). Comparative science mapping: a novel conceptual structure analysis with metadata. Scientometrics. https://doi.org/10.1007/s11192-024-05161-6
Aria, M., Cuccurullo, C., D'Aniello, L., Misuraca, M., & Spano, M. (2022). Thematic Analysis as a New Culturomic Tool: The Social Media Coverage on COVID-19 Pandemic in Italy. Sustainability, 14(6), 3643. https://doi.org/10.3390/su14063643
Aria, M., Misuraca, M., & Spano, M. (2020). Mapping the evolution of social research and data science on 30 years of Social Indicators Research. Social Indicators Research, 149, 803–831. https://doi.org/10.1007/s11205-020-02281-3
Quantitative Content Analysis - Foundational Works:
Berelson, B. (1952). Content Analysis in Communication Research. Glencoe, IL: Free Press.
Krippendorff, K. (2018). Content Analysis: An Introduction to Its Methodology (4th ed.). Thousand Oaks, CA: SAGE Publications.
Neuendorf, K. A. (2017). The Content Analysis Guidebook (2nd ed.). Thousand Oaks, CA: SAGE Publications.
Weber, R. P. (1990). Basic Content Analysis (2nd ed.). Newbury Park, CA: SAGE Publications.
🎓 Further Reading
For more information on using Content Analysis and related techniques, see:
- bibliometrix documentation: https://www.bibliometrix.org
- contentanalysis documentation: https://massimoaria.github.io/contentanalysis-website/
- Content analysis in communication research: Riffe, D., Lacy, S., & Fico, F. (2014). Analyzing Media Messages: Using Quantitative Content Analysis in Research (3rd ed.). Routledge.
Report
Select results to include in the Report
TALL - Text Analysis for All
Biblioshiny now includes a dedicated export tool that allows you to prepare and extract textual data (Titles, Abstracts, and Keywords) from your bibliographic collection in a format ready to be used in TALL.
TALL is a user-friendly R Shiny application designed to support researchers in performing textual data analysis without requiring advanced programming skills.
TALL offers a comprehensive workflow for the cleaning, pre-processing, statistical analysis, and visualization of textual data, combining state-of-the-art text analysis techniques into a single R Shiny app.
TALL includes a wide set of methodologies tailored to a variety of text analysis tasks, providing a versatile, general-purpose tool that lets researchers extract valuable insights from textual data efficiently and accessibly.
Learn more at: www.tall-app.com
Export a corpus for TALL
Select textual metadata:
Select additional metadata:
Select at least one textual field to export, click 'Play' to generate the dataset, then save and import it into TALL.
Settings
Configure global settings for plots, analysis reproducibility, and AI features.